Quality Assurance
The "qualityassurance" package contains steps for checking SIPs.
Quality Assurance: extract SIP into workfolder
Extracts a zipped SIP to `actions.workbench.work`. An optional second parameter can be used to specify a different destination folder.
java ch.docuteam.actions.qualityassurance.SIPExtractor \
[path/to/]SIP [targetdir]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property |
[targetdir] | target directory; absolute path of the directory where to unzip the SIP to. Optional, defaults to actions.workbench.work |
Quality Assurance: check fixity of SIP
Checks the files in a SIP for conformity with the checksums stored in the METS file. The results of the check are written to the METS file as PREMIS events.
java ch.docuteam.actions.qualityassurance.SIPFixityCheck \
[path/to/]SIP
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
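The comparison behind a fixity check can be sketched in shell as follows. This is only an illustration: it assumes SHA-256 digests and the `sha256sum` tool from GNU coreutils, whereas the algorithm recorded in a given METS file may differ. `verify_checksum` is a hypothetical helper and not part of the docuteam actions.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Assumption: the expected digest has already been read from the METS file.
# Usage: verify_checksum FILE EXPECTED_DIGEST
verify_checksum() {
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    _actual=$(sha256sum "$1" | cut -d' ' -f1)
    [ "$_actual" = "$2" ]
}
```

The function returns exit status 0 when the digests match and a non-zero status otherwise, so it can be used directly in an `if` statement.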
Quality Assurance: check file path length
Checks whether any absolute file path in a folder is longer than a specified value.
java ch.docuteam.actions.qualityassurance.FilePathLengthCheck \
/absolute/path/to/folder maxAllowedFilePathLength
Parameter | Description |
---|---|
/absolute/path/to/folder | absolute path of the folder that should be checked |
maxAllowedFilePathLength | the max allowed number of characters of the canonical file path |
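To preview which paths would exceed a limit before running the check, something like the following shell sketch may help. It assumes the folder is passed as an absolute path and counts the characters of each path as printed by `find`; it does not canonicalize paths the way the action does, and it does not handle file names containing newlines. `find_long_paths` is a hypothetical helper.

```shell
# Hypothetical helper: print every path below a folder whose length exceeds a limit.
# Usage: find_long_paths /absolute/path/to/folder maxAllowedFilePathLength
find_long_paths() {
    # awk prints only those lines (paths) that are longer than max characters.
    find "$1" | awk -v max="$2" 'length($0) > max'
}
```

An empty output means that no path below the folder is longer than the limit.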
Quality Assurance: check SIP path length
Checks the file path lengths within a SIP against a specified limit.
java ch.docuteam.actions.qualityassurance.SIPPathLengthCheck \
[path/to/]SIP maxAllowedFilePathLength
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
maxAllowedFilePathLength | the max allowed number of characters of the canonical file path |
Quality Assurance: get PID
Connects to a Fedora repository and retrieves a single PID. This PID later becomes the basis for storage in the repository. The value is stored in the `
java ch.docuteam.actions.qualityassurance.SIPConfirmation \
[path/to/]SIP [PIDNamespace[:###]]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP. If no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[PIDNamespace[:###]] | namespace for the new PID, or the complete PID to use for the object; if omitted, the standard namespace from the submission agreement will be used; if the submission agreement cannot be found, the default namespace of the Fedora repository will be used |
Quality Assurance: convert to safe filenames
Renames files with special characters. Safe file names contain only characters from A-Z, a-z, 0-9, and "_.-".
java ch.docuteam.actions.qualityassurance.SIPConvertToSafeFileNames \
[path/to/]SIP
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
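The character rule above can be illustrated with a small shell function. The sketch assumes that every disallowed character is replaced by an underscore; the replacement rule actually applied by the action is not documented here and may differ. `to_safe_name` is a hypothetical helper.

```shell
# Hypothetical helper: map a file name to a "safe" name.
# Assumption: each character outside A-Z, a-z, 0-9, "_", "." and "-" becomes "_".
# Usage: to_safe_name NAME
to_safe_name() {
    # LC_ALL=C makes the ranges A-Z, a-z and 0-9 cover ASCII characters only.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9_.-]/_/g'
}
```

For example, `to_safe_name 'my file (1).txt'` prints `my_file__1_.txt`, while a name such as `Report-2020_v1.pdf` is returned unchanged.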
Quality Assurance: check file extensions
Checks that the file extensions in a SIP correspond to the PRONOM unique identifiers (PUID). This step does not run the file format identification again, but uses the PUIDs present in the mets.xml.
If a file does not have an extension, the action adds it based on the PUID.
If a file has a wrong extension, the action either reports all files with wrong extensions (default) or overwrites the extension (if `--replaceExistingExtensions=true` is set).
If a file has no PUID, the action either reports this as an error or ignores files without PUID (if `--ignoreUnidentifiedFiles=true` is set).
java ch.docuteam.actions.qualityassurance.SIPFileExtensionCheck \
--sip=[path/to/]SIP [--replaceExistingExtensions=true|false] [--ignoreUnidentifiedFiles=true|false]
Parameter | Description |
---|---|
--sip | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
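--replaceExistingExtensions | optional, `true` or `false` (default: `false`); if `true`, wrong extensions are overwritten instead of being reported |
--ignoreUnidentifiedFiles | optional, `true` or `false`; if `true`, files without a PUID are ignored instead of being reported as an error |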
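The decision rules described above can be summarized in a short shell sketch. Everything in it is illustrative: the two-entry PUID lookup is a tiny stand-in for a real PRONOM mapping, extensions are compared case-sensitively as a simplifying assumption, and both function names are hypothetical rather than part of the action.

```shell
# Illustrative lookup from PUID to extension (assumption: a two-entry excerpt only).
extension_for_puid() {
    case $1 in
        fmt/18)    echo pdf ;;   # PDF 1.4
        x-fmt/111) echo txt ;;   # plain text file
        *)         return 1 ;;
    esac
}

# Print what the check would do for a single file.
# Usage: decide_extension_action NAME PUID [replaceExisting] [ignoreUnidentified]
decide_extension_action() {
    _name=$1
    _puid=$2
    _replace=${3:-false}
    _ignore=${4:-false}
    if [ -z "$_puid" ]; then
        # No PUID: error by default, ignored on request.
        if [ "$_ignore" = true ]; then echo "ignore"; else echo "error: no PUID"; fi
        return 0
    fi
    _expected=$(extension_for_puid "$_puid") || { echo "error: PUID not in lookup"; return 0; }
    case $_name in
        *.*) _actual=${_name##*.} ;;   # text after the last dot
        *)   _actual= ;;               # no extension at all
    esac
    if [ -z "$_actual" ]; then
        echo "add .$_expected"
    elif [ "$_actual" = "$_expected" ]; then
        echo "ok"
    elif [ "$_replace" = true ]; then
        echo "replace with .$_expected"
    else
        echo "report"
    fi
}
```

With this sketch, `decide_extension_action report.doc fmt/18` prints `report`, and the same call with `true` as third argument prints `replace with .pdf`.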
Quality Assurance: delete backup files
Deletes files from the SIP that match a specific name pattern.
java ch.docuteam.actions.qualityassurance.SIPDeleteBackupFiles \
[path/to/]SIP [filenamePattern filenamePattern ...]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[filenamePattern filenamePattern ...] | a list of filename patterns (not case-sensitive; '*' is a wildcard, but is only allowed at the beginning or end of the pattern). Files matching any one of these patterns will be deleted |
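The matching rule for filename patterns can be sketched in shell as follows. The sketch only decides whether a name matches; it does not delete anything. It treats `*` as a wildcard only when it is the first or last character of a pattern and compares everything else literally and case-insensitively. `matches_backup_pattern` is a hypothetical helper, and the way the action handles edge cases may differ.

```shell
# Hypothetical helper: does a file name match at least one of the given patterns?
# Usage: matches_backup_pattern NAME PATTERN [PATTERN ...]
matches_backup_pattern() {
    _name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for _pattern in "$@"; do
        _p=$(printf '%s' "$_pattern" | tr '[:upper:]' '[:lower:]')
        _lead=false
        _trail=false
        # Strip one leading and one trailing "*" and remember that they were there.
        case $_p in \**) _lead=true; _p=${_p#\*} ;; esac
        case $_p in *\*) _trail=true; _p=${_p%\*} ;; esac
        case "$_lead$_trail" in
            truetrue)   case $_name in *"$_p"*) return 0 ;; esac ;;
            truefalse)  case $_name in *"$_p")  return 0 ;; esac ;;
            falsetrue)  case $_name in "$_p"*)  return 0 ;; esac ;;
            falsefalse) [ "$_name" = "$_p" ] && return 0 ;;
        esac
    done
    return 1
}
```

For example, `matches_backup_pattern 'report.BAK' '*.bak'` succeeds, whereas `matches_backup_pattern 'a.bak.txt' '*.bak'` does not.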
Quality Assurance: delete empty files
Deletes empty files (files with a size of 0 bytes) from the SIP.
java ch.docuteam.actions.qualityassurance.SIPDeleteEmptyFiles \
--sip=[path/to/]SIP
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
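To see in advance which files count as empty, a dry run along the following lines may be useful. The sketch assumes the SIP has already been extracted to a folder; `list_empty_files` is a hypothetical helper that only lists files and does not touch the METS file.

```shell
# Hypothetical helper: list regular files of size 0 bytes below a folder (dry run).
# Usage: list_empty_files /path/to/extracted/SIP
list_empty_files() {
    # -type f restricts the search to regular files; -size 0 matches empty files.
    find "$1" -type f -size 0
}
```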
Quality Assurance: check SIP against submission agreement
Checks whether the file formats comply with the specifications in the submission agreement. There are two modes: in the first mode (`removeBadFiles` = false), every file that does not match the submission agreement is listed (as WARN log entries) and an error code is returned. In the second mode (`removeBadFiles` = true), every file that does not match the submission agreement is deleted from the SIP. The modified `mets.xml` is saved (the original SIP remains unchanged as a backup).
java ch.docuteam.actions.qualityassurance.SIPSubmissionAgreementCheck \
[path/to/]SIP [removeBadFiles] [operationSA] [operationDSS]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
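[removeBadFiles] | optional, `true` or `false`; if `true`, files that do not match the submission agreement are deleted from the SIP; if `false`, they are only listed |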
[operationSA] | optional, ID of an external submission agreement to be used for this action (instead of the agreement which is part of the SIP) |
[operationDSS] | optional, ID of an external data submission section to be used for this action (instead of the agreement which is part of the SIP) |
Quality Assurance: SIP virus check
Every file present in the SIP is scanned for viruses. The virus scanner of ClamAV (www.clamav.net) is used for virus checking.
This check requires a running ClamAV service. Depending on the second argument, infected files will be discarded or automatically deleted. With the optional third argument, a maximum file size can be defined. Only files with a size smaller than this limit will be scanned. The limit can be entered in one of the following formats: 1B, 1KB, 1MB, 1GB, 1TB. Only integers are allowed; fractions (e.g. 1.5GB) do not work.
java ch.docuteam.actions.qualityassurance.SIPVirusCheck \
[path/to/]SIP deleteInfected [maxSize]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
deleteInfected | if true, the operation automatically removes infected files from the SIP |
[maxSize] | maxSize in B, KB, MB, GB or TB, only files smaller than this limit will be scanned |
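The size format can be illustrated with a small parser. The sketch below assumes binary multiples (1KB = 1024 bytes); whether the action uses binary or decimal multiples is not stated here, so treat the factors as an assumption. `size_to_bytes` is a hypothetical helper.

```shell
# Hypothetical helper: convert a size such as 10MB into a number of bytes.
# Assumption: binary multiples (1KB = 1024 B); the action may interpret units differently.
# Usage: size_to_bytes SIZE
size_to_bytes() {
    _num=${1%%[!0-9]*}      # leading digits
    _unit=${1#"$_num"}      # everything after the leading digits
    [ -n "$_num" ] || return 1
    case $_unit in
        B)  _factor=1 ;;
        KB) _factor=1024 ;;
        MB) _factor=$((1024 * 1024)) ;;
        GB) _factor=$((1024 * 1024 * 1024)) ;;
        TB) _factor=$((1024 * 1024 * 1024 * 1024)) ;;
        *)  return 1 ;;     # unknown unit, or a fraction such as 1.5GB
    esac
    echo $((_num * _factor))
}
```

Under these assumptions, `size_to_bytes 10MB` prints `10485760`, and `size_to_bytes 1.5GB` fails because the remainder `.5GB` is not a known unit.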
Quality Assurance: remove by level of description
Removes a certain description level from a SIP.
java ch.docuteam.actions.qualityassurance.RemoveByLevelOfDescription \
[path/to/]folder levelOfDescription
Parameter | Description |
---|---|
[path/to/]folder | path of the folder to process; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
levelOfDescription | name of the level of description to be removed from the SIP |
Quality Assurance: add/update file format information
Runs the format identification for all files of the SIP and adds or updates the resulting information.
An optional parameter controls whether existing format information is replaced (`true`) or kept (`false`, the default).
java ch.docuteam.actions.qualityassurance.SIPFormatIdentificationCheck \
--sip=[path/to/]SIP [--replaceExistingFormatInfo=true|false]
Parameter | Description |
---|---|
--sip | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
--replaceExistingFormatInfo | optional, `true` or `false` (default: `false`); if `true`, existing format information is replaced |
Quality Assurance: remove files by format (PUID or MIME type)
Deletes all files of the SIP that match a given file format. Formats can be indicated either by MIME type or by PRONOM Unique Identifier (PUID).
java ch.docuteam.actions.qualityassurance.SIPDeleteFilesByFormat \
--sip=[path/to/]SIP [--mimetype=...] [--puid=...]
Parameter | Description |
---|---|
--sip | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
--mimetype | optional; comma-separated list of MIME types to be deleted from this package |
--puid | optional; comma-separated list of PRONOM identifiers (PUID) to be deleted from this package |
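How a file's format might be tested against such a comma-separated list can be sketched as follows. The sketch assumes that the list contains no whitespace around the commas and that values are compared exactly; `in_comma_list` is a hypothetical helper, not part of the action.

```shell
# Hypothetical helper: is a value contained in a comma-separated list?
# Usage: in_comma_list VALUE LIST
in_comma_list() {
    # Wrapping both sides in commas ensures that only whole entries match.
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `in_comma_list fmt/18 'fmt/17,fmt/18'` succeeds, while `in_comma_list fmt/1 'fmt/17,fmt/18'` does not, because `fmt/1` is only a prefix of the listed entries.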