Version: 6.10

Quality Assurance

The package "qualityassurance" contains steps for checking SIPs.

Quality Assurance: extract SIP into workfolder

Extracts a zipped SIP to actions.workbench.work. An optional second parameter can be used to specify a different destination folder.

java ch.docuteam.actions.qualityassurance.SIPExtractor \
[path/to/]SIP [targetdir]
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property |
| `[targetdir]` | target directory; absolute path of the directory to unzip the SIP into. Optional, defaults to actions.workbench.work |
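
The way the two parameters fall back to their defaults can be sketched as follows. This is a minimal illustration in Python, not the docuteam implementation: `resolve_sip` and `extract_sip` are hypothetical helpers, and the `inbox` and `work` arguments stand in for the actions.workbench.inbox and actions.workbench.work properties.

```python
import zipfile
from pathlib import Path
from typing import Optional


def resolve_sip(sip: str, inbox: str) -> Path:
    """Return the SIP path; a bare file name is looked up in the inbox folder."""
    path = Path(sip)
    # A name without any directory part is assumed to live in the inbox.
    return path if path.parent != Path(".") else Path(inbox) / path


def extract_sip(sip: str, inbox: str, work: str, targetdir: Optional[str] = None) -> Path:
    """Unzip the SIP into targetdir, falling back to the work folder."""
    destination = Path(targetdir) if targetdir else Path(work)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(resolve_sip(sip, inbox)) as archive:
        archive.extractall(destination)
    return destination
```

Calling `extract_sip("sip.zip", inbox, work)` unpacks into the work folder, while passing a fourth argument redirects the output to that directory.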

Quality Assurance: check fixity of SIP

Checks the files in a SIP for conformity with the checksums stored in the METS file. The results of the check are written to the METS file as PREMIS events.

java ch.docuteam.actions.qualityassurance.SIPFixityCheck \
[path/to/]SIP
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
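
Conceptually, a fixity check recomputes each file's checksum and compares it with the recorded value. The sketch below shows that comparison in Python under stated assumptions: `check_fixity` is a hypothetical helper, the recorded checksums are passed in as a plain dictionary rather than read from the METS file, and the algorithm names follow Python's `hashlib` naming. Writing the PREMIS events is not covered.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(root: Path, expected: dict) -> dict:
    """Map each relative path to True (checksum matches) or False.

    `expected` maps a relative path to an (algorithm, hex digest) pair;
    in a real SIP these values would come from the METS file (assumption).
    A missing file counts as a failed check.
    """
    results = {}
    for relative, (algorithm, recorded) in expected.items():
        target = root / relative
        results[relative] = (
            target.is_file() and file_digest(target, algorithm) == recorded.lower()
        )
    return results
```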

Quality Assurance: check file path length

Checks whether the absolute path of any file in the given folder is longer than a specified maximum.

java ch.docuteam.actions.qualityassurance.FilePathLengthCheck \
/absolute/path/to/folder maxAllowedFilePathLength
| Parameter | Description |
| --- | --- |
| `/absolute/path/to/folder` | absolute path of the folder that should be checked |
| `maxAllowedFilePathLength` | the maximum allowed number of characters of the canonical file path |
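
The kind of check described here can be approximated in a few lines. The following Python sketch is only an illustration: `paths_exceeding` is a hypothetical helper, `Path.resolve()` is used as a stand-in for Java's canonical path, and including directories in the check is an assumption.

```python
import os
from pathlib import Path


def paths_exceeding(folder: str, max_length: int) -> list:
    """Return the resolved paths under folder that are longer than max_length."""
    too_long = []
    for current, dirnames, filenames in os.walk(folder):
        # Assumption: directories are checked as well as files.
        for name in dirnames + filenames:
            # resolve() yields an absolute path with symlinks resolved.
            full = str(Path(current, name).resolve())
            if len(full) > max_length:
                too_long.append(full)
    return sorted(too_long)
```

An empty result means that every path stays within the limit.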

Quality Assurance: check SIP path length

Checks the file path lengths within a SIP against a specified limit.

java ch.docuteam.actions.qualityassurance.SIPPathLengthCheck \
[path/to/]SIP maxAllowedFilePathLength
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `maxAllowedFilePathLength` | the maximum allowed number of characters of the canonical file path |

Quality Assurance: get PID

Connects to a Fedora repository and retrieves a single PID. This PID later becomes the basis for storage in the repository. The value is stored in the <mets:OBJID/> element.

java ch.docuteam.actions.qualityassurance.SIPConfirmation \
[path/to/]SIP [PIDNamespace[:###]]
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `[PIDNamespace[:###]]` | namespace for a new PID, or the complete PID to use for the object; if omitted, the standard namespace from the submission agreement will be used; if the submission agreement cannot be found, the default namespace of the Fedora repository will be used |
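
The fallback order for the second parameter can be written out as a small decision function. This Python sketch only models the rules stated above and does not talk to Fedora; `resolve_pid_request` is a hypothetical name, and treating any value with a non-empty part after the colon as a complete PID is an assumption.

```python
from typing import Optional, Tuple


def resolve_pid_request(
    argument: Optional[str],
    agreement_namespace: Optional[str],
    repository_default: str,
) -> Tuple[str, Optional[str]]:
    """Return (namespace, complete_pid).

    complete_pid is None when a new PID has to be requested from the repository.
    """
    if argument:
        namespace, separator, local_id = argument.partition(":")
        if separator and local_id:
            # "namespace:###" was given: use it as the complete PID.
            return namespace, argument
        return namespace, None
    if agreement_namespace:
        # No argument: fall back to the submission agreement.
        return agreement_namespace, None
    # No submission agreement found: use the repository default.
    return repository_default, None
```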

Quality Assurance: convert to safe filenames

Renames files with special characters. Safe file names contain only characters from A-Z, a-z, 0-9, and "_.-".

java ch.docuteam.actions.qualityassurance.SIPConvertToSafeFileNames \
[path/to/]SIP
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
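
The definition of a safe file name translates directly into a regular expression. The Python sketch below illustrates it; replacing each unsafe character with an underscore is an assumption, since the replacement rule actually used by the action is not stated here, and the helper names are hypothetical.

```python
import re

# Anything outside A-Z, a-z, 0-9, "_", "." and "-" counts as unsafe.
UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")


def is_safe(name: str) -> bool:
    """True if the name contains only safe characters."""
    return UNSAFE.search(name) is None


def to_safe_name(name: str, replacement: str = "_") -> str:
    """Replace each unsafe character; the "_" default is an assumption."""
    return UNSAFE.sub(replacement, name)
```

For example, `to_safe_name("Jahresbericht 2020 (final).pdf")` yields `Jahresbericht_2020__final_.pdf`.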

Quality Assurance: check file extensions

Checks that the file extensions in a SIP correspond to the PRONOM unique identifiers (PUID). This step does not run the file format identification again, but uses the PUIDs present in the mets.xml.

If a file does not have an extension, the action adds it based on the PUID. If a file has a wrong extension, the action either reports all files with wrong extensions (default) or overwrites the extension (if --replaceExistingExtensions=true is set). If a file has no PUID, the action either reports this as an error or ignores files without PUID (if --ignoreUnidentifiedFiles=true is set).

java ch.docuteam.actions.qualityassurance.SIPFileExtensionCheck \
--sip=[path/to/]SIP [--replaceExistingExtensions={true|false}] [--ignoreUnidentifiedFiles={true|false}]
| Parameter | Description |
| --- | --- |
| `--sip` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `--replaceExistingExtensions` | optional; `true` or `false` |
| `--ignoreUnidentifiedFiles` | optional; `true` or `false` |
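
The interaction of the two options is easier to follow as a decision function for a single file. The Python sketch below is a simplified model, not the action's code: `check_extension` is hypothetical, the PUID-to-extension table is a tiny illustrative sample (a real check would rely on the PRONOM registry data, where one PUID can map to several extensions), and a PUID missing from the table raises a `KeyError`.

```python
from typing import Optional

# Illustrative sample only; not the mapping used by the action.
EXTENSION_BY_PUID = {"fmt/18": "pdf", "fmt/43": "jpg", "x-fmt/111": "txt"}


def check_extension(
    filename: str,
    puid: Optional[str],
    replace_existing: bool = False,
    ignore_unidentified: bool = False,
) -> tuple:
    """Return (action, resulting filename) for one file."""
    if puid is None:
        # No PUID: report an error unless unidentified files are ignored.
        return ("ignore" if ignore_unidentified else "error", filename)
    expected = EXTENSION_BY_PUID[puid]
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No extension at all: add the one derived from the PUID.
        return ("add", f"{filename}.{expected}")
    if extension.lower() == expected:
        return ("ok", filename)
    if replace_existing:
        return ("replace", f"{stem}.{expected}")
    # Default: only report the wrong extension.
    return ("report", filename)
```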

Quality Assurance: delete backup files

Deletes files from the SIP whose names match one of the given patterns.

java ch.docuteam.actions.qualityassurance.SIPDeleteBackupFiles \
[path/to/]SIP [filenamePattern filenamePattern ...]
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `[filenamePattern filenamePattern ...]` | a list of filename patterns (not case-sensitive; '*' is a wildcard, but is only allowed at the beginning or end of the pattern). Files matching any of these patterns will be deleted |
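
The pattern rules (case-insensitive, '*' only at the beginning or end) can be modelled as below. This Python sketch is an interpretation of those rules rather than the action's own matcher; `matches` and `files_to_delete` are hypothetical names, and rejecting a '*' in the middle of a pattern with an exception is an assumption.

```python
def matches(filename: str, pattern: str) -> bool:
    """Case-insensitive match; '*' is a wildcard only at the start or end."""
    name, pat = filename.lower(), pattern.lower()
    leading = pat.startswith("*")
    trailing = pat.endswith("*")
    core = pat.strip("*")
    if "*" in core:
        raise ValueError("'*' is only allowed at the beginning or end of a pattern")
    if leading and trailing:
        return core in name
    if leading:
        return name.endswith(core)
    if trailing:
        return name.startswith(core)
    return name == core


def files_to_delete(filenames: list, patterns: list) -> list:
    """Keep the names that match at least one pattern."""
    return [name for name in filenames if any(matches(name, p) for p in patterns)]
```

With the patterns `*~` and `.ds_store`, the names `a.txt~` and `.DS_Store` are selected, while `a.txt` is kept.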

Quality Assurance: check SIP against submission agreement

Checks whether the file formats in the SIP comply with the specifications in the submission agreement. There are two modes. In the first mode (removeBadFiles = false), every file that does not match the submission agreement is listed (as WARN log entries) and an error code is displayed. In the second mode (removeBadFiles = true), every file that does not match the submission agreement is deleted from the SIP and the modified mets.xml is saved; the original SIP remains unchanged as a backup.

java ch.docuteam.actions.qualityassurance.SIPSubmissionAgreementCheck \
[path/to/]SIP [removeBadFiles] [operationSA] [operationDSS]
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `[removeBadFiles]` | optional; `true` or `false` |
| `[operationSA]` | optional, ID of an external submission agreement to be used for this action (instead of the agreement which is part of the SIP) |
| `[operationDSS]` | optional, ID of an external data submission section to be used for this action (instead of the agreement which is part of the SIP) |
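
The difference between the two modes can be sketched as follows. This Python example is a simplified model under stated assumptions: the submission agreement is reduced to a set of allowed PUIDs, the SIP to a dictionary mapping file paths to PUIDs, and `check_against_agreement` is a hypothetical helper that works on a copy instead of modifying a SIP on disk.

```python
import logging

log = logging.getLogger("sa_check")


def check_against_agreement(files: dict, allowed_puids: set, remove_bad_files: bool = False):
    """Return (remaining files, offending paths).

    `files` maps a file path to its PUID (None if unidentified).
    """
    offending = sorted(path for path, puid in files.items() if puid not in allowed_puids)
    if not remove_bad_files:
        # Report mode: list every offending file as a warning, change nothing.
        for path in offending:
            log.warning("not covered by submission agreement: %s", path)
        return dict(files), offending
    # Removal mode: drop the offending files from the returned copy.
    kept = {path: puid for path, puid in files.items() if path not in offending}
    return kept, offending
```

In report mode a caller would treat a non-empty list of offending paths as the error case.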

Quality Assurance: SIP virus check

Scans every file in the SIP for viruses using the ClamAV virus scanner (www.clamav.net).

A running ClamAV service is a prerequisite for this check. The second argument determines whether infected files are automatically deleted from the SIP.

java ch.docuteam.actions.qualityassurance.SIPVirusCheck \
[path/to/]SIP deleteInfected
| Parameter | Description |
| --- | --- |
| `[path/to/]SIP` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `deleteInfected` | if true, the operation automatically removes infected files from the SIP |

Quality Assurance: remove by level of description

Removes a certain description level from a SIP.

java ch.docuteam.actions.qualityassurance.RemoveByLevelOfDescription \
[/path/to/]folder levelOfDescription
| Parameter | Description |
| --- | --- |
| `[path/to/]folder` | path of the folder to process; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `levelOfDescription` | name of the level of description to be removed from the SIP |

Quality Assurance: add/update file format information

Runs the format identification for all files of the SIP and adds or updates the resulting format information.

The optional --replaceExistingFormatInfo parameter controls whether existing format information is replaced or kept (default: false).

java ch.docuteam.actions.qualityassurance.SIPFormatIdentificationCheck \
--sip=[path/to/]SIP [--replaceExistingFormatInfo={true|false}]
| Parameter | Description |
| --- | --- |
| `--sip` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `--replaceExistingFormatInfo` | optional; `true` or `false` |

Quality Assurance: remove files by format (PUID or MIME type)

Deletes all files of the SIP that match a given file format. Formats can be specified either by MIME type or by PRONOM Unique Identifier (PUID).

java ch.docuteam.actions.qualityassurance.SIPDeleteFilesByFormat \
--sip=[path/to/]SIP [--mimetype=...] [--puid=...]
| Parameter | Description |
| --- | --- |
| `--sip` | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
| `--mimetype` | optional; comma-separated list of MIME types to be deleted from this package |
| `--puid` | optional; comma-separated list of PRONOM identifiers (PUID) to be deleted from this package |
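
Selecting files by format from the two comma-separated lists can be sketched as below. This Python example is illustrative only: `select_by_format` is a hypothetical helper, the SIP is reduced to a dictionary mapping each path to its (MIME type, PUID) pair, and treating the two options as alternatives (a file is selected if either one matches) is an assumption.

```python
from typing import Optional


def parse_list(value: Optional[str]) -> set:
    """Split a comma-separated option value into a set of trimmed entries."""
    if not value:
        return set()
    return {item.strip() for item in value.split(",") if item.strip()}


def select_by_format(files: dict, mimetype: Optional[str] = None, puid: Optional[str] = None) -> list:
    """Return the paths whose MIME type or PUID is in one of the lists.

    `files` maps a path to a (mime type, puid) pair.
    """
    mimetypes, puids = parse_list(mimetype), parse_list(puid)
    return sorted(
        path
        for path, (file_mime, file_puid) in files.items()
        if file_mime in mimetypes or file_puid in puids
    )
```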