Ingest
The package "ingest" contains steps for the ingest process.
Ingest: add DNB URN to the root node of the SIP
This action uses the URN suggestion service of the German National Library (DNB) to generate a URN in the selected namespace and store it as a metadata element (accessor "URN") of the root node. If a URN is already present in the metadata, no new URN is generated. Optionally, the URN ID (the part of the URN without its prefixes) can be stored in an additional element by setting the urnIdAccessor parameter. If this element already contains a value, no URN ID is written.
java ch.docuteam.actions.ingest.AddDnbUrnToRootNode \
--sip=[/path/to/]SIP \
--urnNamespace=urnNamespace \
[--urnIdAccessor=urnIdAccessor]
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
--urnNamespace=urnNamespace | namespace in which a URN will be generated |
[--urnIdAccessor=urnIdAccessor] | EAD accessor defining which EAD field of the root node is used to store the URN ID (URN without prefixes). If left empty, no URN ID is written. |
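The write rules above can be summarised in a short sketch. This is a hypothetical illustration and not the action's code. A Map stands in for the root node's EAD metadata, and a Supplier stands in for the DNB URN suggestion service. Two points are assumptions of this sketch. First, the URN ID is taken to be the URN with the namespace prefix removed. Second, an already existing URN is still used to fill an empty URN ID field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the write rules of AddDnbUrnToRootNode.
// "metadata" stands in for the root node's EAD metadata; "urnService"
// stands in for the DNB URN suggestion service call (not shown here).
class UrnRules {

    static void apply(Map<String, String> metadata, Supplier<String> urnService,
                      String urnNamespace, String urnIdAccessor) {
        String urn = metadata.get("URN");
        if (urn == null || urn.isEmpty()) {
            // No URN present yet: request one and store it under the "URN" accessor.
            urn = urnService.get();
            metadata.put("URN", urn);
        }
        // The URN ID is only written if an accessor was given and its field is still empty.
        if (urnIdAccessor != null && !urnIdAccessor.isEmpty()) {
            String existingId = metadata.get(urnIdAccessor);
            if (existingId == null || existingId.isEmpty()) {
                metadata.put(urnIdAccessor, urnId(urn, urnNamespace));
            }
        }
    }

    // ASSUMPTION: the URN ID is the URN with the namespace prefix removed.
    static String urnId(String urn, String urnNamespace) {
        return urn.startsWith(urnNamespace) ? urn.substring(urnNamespace.length()) : urn;
    }
}
```

A second call to apply leaves the metadata unchanged and never contacts the service. This matches the "no new URN is generated" rule.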
Ingest: convert BAR-SIP
Converts a BAR-SIP into a SIP that conforms to the Matterhorn profile.
java ch.docuteam.actions.ingest.BARSIPConverter \
[path/to/]SIP [targetFolder]
Parameter | Description |
---|---|
[path/to/]SIP | name of the BAR-SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property |
[targetFolder] | directory to which the created SIP is moved; if omitted, the SIP will be moved to the location defined by the actions.workbench.work property |
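Most actions in this section resolve their SIP argument in the same way. A bare name is looked up in a workbench folder, and an argument that includes a path is used as given. The following is a minimal sketch of that rule. It assumes that "no path is given" means the argument has no directory part. SipLocator is a hypothetical name, and the Properties object stands in for wherever the actions.workbench.* values are configured.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper illustrating the documented lookup rule for SIP arguments.
class SipLocator {

    // folderProperty is e.g. "actions.workbench.inbox" or "actions.workbench.work".
    static Path resolve(String sipArg, Properties props, String folderProperty) {
        Path given = Paths.get(sipArg);
        if (given.isAbsolute() || given.getParent() != null) {
            // A path was given: use it unchanged.
            return given;
        }
        String folder = props.getProperty(folderProperty);
        if (folder == null) {
            throw new IllegalArgumentException("Property not set: " + folderProperty);
        }
        // A bare name: look it up in the configured workbench folder.
        return Paths.get(folder).resolve(given);
    }
}
```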
Ingest: create SIP from eCH-0160 SIP
Creates a SIP that conforms to the Matterhorn profile from an eCH-0160 SIP.
java ch.docuteam.actions.ingest.CreateSIPFromECH0160SIP \
--sip=[path/to/]SIP \
--levelsFilePath=/path/to/levels.xml \
[--mappingFile=[path/to/]mappingFile] \
[--output-folder=/path/to/folder]
Parameter | Description |
---|---|
--sip=[path/to/]SIP | location of the SIP to convert; default lookup folder is actions.workbench.inbox |
--levelsFilePath=/path/to/levels.xml | path to the file levels.xml |
[--mappingFile=[path/to/]mappingFile] | file from which to read the mapping; if omitted, a default mapping file (defined by the mapping module) is used |
[--output-folder=/path/to/folder] | output folder; defaults to actions.workbench.work |
Ingest: check workbench space
Checks whether there is enough disk space for processing the SIP (i.e. for its working copies).
java ch.docuteam.actions.ingest.CheckWorkbenchSpace \
[path/to/]SIP [numberOfCopies]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP. If no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[numberOfCopies] | optional, number of copies to calculate with; defaults to 3 |
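The documentation does not give the exact formula. One plausible reading is sketched below. It assumes that the check compares the SIP's total size, multiplied by numberOfCopies, with the usable space on the volume that holds the work folder. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of a workbench space check.
// ASSUMPTION: required space = SIP size x numberOfCopies (default 3).
class SpaceCheck {
    static final int DEFAULT_COPIES = 3;

    // Total size in bytes of all regular files below the SIP (or of the SIP file itself).
    static long sipSize(Path sip) throws IOException {
        try (Stream<Path> paths = Files.walk(sip)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    static boolean hasEnoughSpace(long sipBytes, int copies, long usableBytes) {
        return sipBytes * copies <= usableBytes;
    }

    // copies may be null, in which case the documented default of 3 is used.
    static boolean check(Path sip, Path workFolder, Integer copies) throws IOException {
        int n = (copies == null) ? DEFAULT_COPIES : copies;
        long usable = Files.getFileStore(workFolder).getUsableSpace();
        return hasEnoughSpace(sipSize(sip), n, usable);
    }
}
```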
Ingest: cleanup working copies
Deletes existing SIPs in actions.workbench.work. Optionally, you can also delete SIPs with the same name in actions.workbench.preparation.
java ch.docuteam.actions.ingest.Cleanup \
[path/to/]SIP [prep]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP. If no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[prep] | if true, SIPs of the same name in actions.workbench.preparation will be removed as well; defaults to false |
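The cleanup behaviour can be sketched with java.nio.file alone. This sketch is hypothetical. The work and preparation paths stand in for the folders configured as actions.workbench.work and actions.workbench.preparation. It also assumes that a SIP may be a folder, so deletion removes children before their parent directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the cleanup step.
class CleanupSketch {

    static void cleanup(String sipName, Path work, Path preparation, boolean prep) throws IOException {
        deleteRecursively(work.resolve(sipName));
        if (prep) {
            // Only with prep=true is the same-named SIP also removed from the preparation folder.
            deleteRecursively(preparation.resolve(sipName));
        }
    }

    static void deleteRecursively(Path target) throws IOException {
        if (!Files.exists(target)) {
            return; // nothing to delete
        }
        List<Path> all;
        try (Stream<Path> paths = Files.walk(target)) {
            // Reverse lexicographic order puts children before their parent directories.
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }
}
```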
Ingest: create EAD file
Creates EAD data from individual nodes of a given SIP.
java ch.docuteam.actions.ingest.CreateEADFile \
[path/to/]SIP [targetFilename]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[targetFilename] | optional, name of the output file; defaults to EAD.xml within the SIP's subfolder in the location defined by the actions.workbench.output property |
Ingest: extent calculator
Sets the number of files in the "Extent" metadata field and the unit to the default value "File(s)".
java ch.docuteam.actions.ingest.ExtentCalculator \
[path/to/]SIP
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
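The counting part of this action can be sketched as follows. This sketch assumes that "number of files" means regular files anywhere below the given folder, with subfolders themselves not counted. The class is hypothetical, and the step that writes the value into the EAD "Extent" element is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch: count the files below a SIP folder for the "Extent" field.
class ExtentSketch {
    // Default unit as documented for the ExtentCalculator action.
    static final String DEFAULT_UNIT = "File(s)";

    static long countFiles(Path folder) throws IOException {
        try (Stream<Path> paths = Files.walk(folder)) {
            // Only regular files are counted; directories are skipped.
            return paths.filter(Files::isRegularFile).count();
        }
    }
}
```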
Ingest: migrate files
Migrates the files of a SIP according to the migration specifications in the configuration file migration-config.xml.
java ch.docuteam.actions.ingest.SIPFileMigrator \
[path/to/]SIP keepOriginals [path/to/migration-config.xml] [skipAlreadyMigratedFiles]
Parameter | Description |
---|---|
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
keepOriginals | true or false |
[path/to/migration-config.xml] | optional, path to a specific migration configuration file (defaults to ./config/migration-config.xml) |
[skipAlreadyMigratedFiles] | optional, true or false |
Ingest: remove SIP from inbox
Moves an existing SIP from actions.workbench.inbox to a specified folder or deletes it if no destination folder is specified.
java ch.docuteam.actions.ingest.SIPRemoveFromInbox \
[path/to/]SIP [targetFolder]
Parameter | Description |
---|---|
[path/to/]SIP | path of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property |
[targetFolder] | directory to which the SIP is moved; if omitted, the SIP will be deleted |
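The following is a hypothetical sketch of the move-or-delete behaviour, using only java.nio.file. It assumes the SIP may be either a folder or a single file. Deletion therefore walks the tree and removes children before their parents. A move keeps the SIP's own name inside the target folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: move the SIP out of the inbox, or delete it if no target is given.
class InboxRemoval {

    static void remove(Path sip, Path targetFolder) throws IOException {
        if (targetFolder != null) {
            // Move the SIP (file or folder) into the target folder, keeping its name.
            Files.createDirectories(targetFolder);
            Files.move(sip, targetFolder.resolve(sip.getFileName()));
            return;
        }
        List<Path> all;
        try (Stream<Path> paths = Files.walk(sip)) {
            // Children first, so that directories are empty when they are deleted.
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) {
            Files.delete(p);
        }
    }
}
```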
Ingest: replace file
Replaces a file in a SIP. The metadata is retained or added. Currently, only SIPs containing a single file can be processed with this step.
java ch.docuteam.actions.ingest.ReplaceFile \
[path/to/]SIP [targetFolder]
Parameter | Description |
---|---|
[path/to/]SIP | path of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
[targetFolder] | path to the file to be used as replacement of the current SIP content |
Ingest: get MARC from REST and add to SIP
For every object (file/folder) of a SIP, the process gets a MARC description from a REST webservice and adds it to the descriptive metadata.
The URL of the webservice is configured in the actions.properties with the "aleph.webservice.url" property.
The URL should contain the placeholder documentNumber, which is replaced by the document number of each object. The document number is extracted from the object's file or folder name:
- For the file name BAU_5_000000444.wav, the document number 000000444 is extracted.
- For the folder name DIRECTORY_X_000000555, the document number 000000555 is extracted.
If the HTTP request fails or a name is invalid, the operation stops and leaves the SIP unchanged. If the operation succeeds, existing MARC metadata is overwritten.
java ch.docuteam.actions.marc.AddMarcFromRestByIdFromNodeName \
--sip=[path/to/]SIP
Parameter | Description |
---|---|
--sip=[path/to/]SIP | location of the SIP to process; default lookup folder is actions.workbench.work |
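The extraction rule from the two examples can be sketched with a regular expression. This sketch is hypothetical. It assumes that the document number is the run of digits after the last underscore, optionally followed by a file extension. It also assumes the placeholder is written as {documentNumber} in aleph.webservice.url; check the exact placeholder syntax against your configuration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: derive a document number from a node name and build the request URL.
class DocumentNumber {
    // ASSUMPTION: digits after the last underscore, optionally followed by an extension.
    private static final Pattern NAME = Pattern.compile(".*_(\\d+)(\\.[^.]+)?");

    // ASSUMPTION: placeholder syntax used in the configured URL.
    static final String PLACEHOLDER = "{documentNumber}";

    static Optional<String> extract(String nodeName) {
        Matcher m = NAME.matcher(nodeName);
        // An empty result corresponds to an invalid name, in which case the action stops.
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static String buildUrl(String urlTemplate, String documentNumber) {
        return urlTemplate.replace(PLACEHOLDER, documentNumber);
    }
}
```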
Ingest: add OAIDC from REST by ID from filename
Takes a SIP and adds OAI DC information to its root folder.
The OAI DC information is requested from a web service defined by the property "oai.webservice.url".
The URL is expected to contain the placeholder identifier, which is replaced by a value derived from the root node's name, for example:
- "Abbreviation-CallNumberTIFF", e.g. "bbb-0027TIFF", becomes the identifier "bbb/0027"
If the root node has an invalid name or the request for the OAI DC information fails, the operation aborts and the SIP is not changed. If the operation is called on a SIP that already contains OAI DC information, an exception is thrown.
Additional file resources defined in the
java ch.docuteam.actions.oai_dc.AddOAIDCFromRESTByIDFromFilename \
--sip=[path/to/]SIP
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
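The name-to-identifier rule from the example "bbb-0027TIFF" → "bbb/0027" can be sketched as follows. This sketch is hypothetical. It assumes that the abbreviation contains no hyphen and that a valid root node name always ends in "TIFF". Building the request URL is left out because the exact placeholder syntax is not documented here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: "<abbreviation>-<call number>TIFF" becomes "<abbreviation>/<call number>".
class OaiIdentifier {
    // ASSUMPTION: hyphen-free abbreviation, then a hyphen, then the call number, then "TIFF".
    private static final Pattern ROOT_NAME = Pattern.compile("([^-]+)-(.+)TIFF");

    static Optional<String> fromRootName(String rootName) {
        Matcher m = ROOT_NAME.matcher(rootName);
        // No match corresponds to an invalid node name, in which case the action aborts.
        return m.matches() ? Optional.of(m.group(1) + "/" + m.group(2)) : Optional.empty();
    }
}
```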
Ingest: convert an EDIDOC/EDIAKT package into a Matterhorn METS SIP
Creates a SIP that conforms to the Matterhorn METS profile from an EDIDOC/EDIAKT package.
java ch.docuteam.actions.ingest.CreateSIPFromEdidocSIP \
--sip=[path/to/]SIP \
--levelsFilePath=path/to/levels.xml \
[--mappingFile=path/to/mappingFile] \
[--outputFolder=/path/to/folder] \
[--steuerXml=/path/to/file]
Parameter | Description |
---|---|
--sip=[path/to/]SIP | location of the package to convert; default lookup folder is actions.workbench.inbox |
--levelsFilePath=path/to/levels.xml | path to the level configuration file, to be found on the classpath |
[--mappingFile=path/to/mappingFile] | file from which to read the mapping; defaults to ./config/edidoc-mapping.xml, to be found on the classpath |
[--outputFolder=/path/to/folder] | output folder; defaults to actions.workbench.work |
[--steuerXml=/path/to/file] | path to the EDIDOC archives extension XML file |
Ingest: import or download MARCXML file to or from Alma
The ImportOrDownloadMarcXmlIntoAlma action imports a MARCXML file (expected to be present in the SIP) into Alma using the Alma REST API. After the import, Alma's response (the updated MARCXML record) is used to update the MARCXML file in the SIP. If the file is not present in the SIP, the action either fails or, if the optional downloadIfMissing flag is set, downloads the MARCXML file from the Alma API (assuming its MMS ID is stored in the root node element defined by the writeMmsIdRoot parameter). By defining an accessor in the writeMmsId and writeMmsIdRoot parameters, the MMS ID of the imported or downloaded record can also be written to the mets.xml metadata of the MARCXML file node (writeMmsId) and/or of the SIP root node (writeMmsIdRoot).
Connection information for the Alma REST API is expected in the actions.properties file.
java ch.docuteam.actions.ingest.alma.ImportOrDownloadMarcXmlIntoAlma \
--sip=[/path/to/]SIP \
--marcxml=path/to/marc.xml \
[--writeMmsId] \
[--writeMmsIdRoot] \
[--downloadIfMissing=false] \
[--checkMatch=false] \
[--fromCzMmsId] \
[--fromNzMmsId] \
[--importProfile] \
[--normalization] \
[--overrideWarning=true] \
[--validate=false]
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property. |
--marcxml=path/to/marc.xml | MARCXML metadata file path (relative to the SIP's root node). |
[--writeMmsId=refCode] | EAD accessor defining the metadata element of the MARCXML file node into which the MMS ID is written after import. If empty, no MMS ID will be written for the file node. |
[--writeMmsIdRoot=refCode] | EAD accessor defining the metadata element of the root node into which the MMS ID is written after import. If empty, no MMS ID will be written for the root node. |
[--downloadIfMissing=false] | Indicating whether to download a MARCXML file (based on the MMS ID stored in the element defined by the writeMmsIdRoot accessor) from Alma if the MARCXML metadata file path does not point to a file. Default: false. |
[--checkMatch=false] | Indicating whether to check for a match. Default: false (record will be saved despite possible match). |
[--fromCzMmsId] | The MMS_ID of the Community-Zone record. Leave empty when creating a regular local record. |
[--fromNzMmsId] | The MMS_ID of the Network-Zone record. Leave empty when creating a regular local record. |
[--importProfile] | The ID of the import profile to use when processing the input record. Note that according to the profile configuration, the API can update an existing record in some cases. |
[--normalization] | The ID of the normalization profile to run. |
[--overrideWarning=true] | Indicating whether to ignore warnings. Default: true (record will be saved and the warnings will be added to the API output). |
[--validate=false] | Indicating whether to validate the MARC XML file. Default: false. |
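The choice between import, download and failure can be sketched as a pure decision function. This is a hypothetical illustration, not the action's code, and it makes no Alma API calls. One point is an assumption: when downloadIfMissing is set but no MMS ID is stored in the writeMmsIdRoot element, the sketch treats this as a failure.

```java
// Hypothetical sketch of the import/download decision; no Alma API calls are made.
class AlmaPlan {
    enum Step { IMPORT, DOWNLOAD, FAIL }

    static Step decide(boolean marcXmlPresent, boolean downloadIfMissing, String rootMmsId) {
        if (marcXmlPresent) {
            // The MARCXML file exists in the SIP: import it into Alma.
            return Step.IMPORT;
        }
        if (downloadIfMissing && rootMmsId != null && !rootMmsId.isEmpty()) {
            // The file is missing, but an MMS ID is stored in the root node element
            // named by writeMmsIdRoot: download the record from Alma.
            return Step.DOWNLOAD;
        }
        // ASSUMPTION: a missing MMS ID also fails, even with downloadIfMissing=true.
        return Step.FAIL;
    }
}
```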
Ingest: rename root node based on EAD metadata
This action updates the file or folder name of the root node of a SIP based on the EAD metadata of the root node. Special characters are normalised. If the given metadata element occurs multiple times, the first instance is used.
java ch.docuteam.actions.ingest.RenameRootNodeFromEad \
--sip=[/path/to/]SIP \
--accessorName=accessorName
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
--accessorName=accessorName | EAD accessor defining which EAD field of the root node is used as the source of the new file or folder name. |
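The documentation does not define how special characters are normalised. The following is a hypothetical sketch of one common approach. It strips diacritics via Unicode decomposition and replaces every run of characters outside letters, digits, ".", "_" and "-" with an underscore. Only the "first instance wins" rule comes from the description above.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: derive a root node name from EAD metadata values.
class RootNodeName {

    static Optional<String> fromMetadata(List<String> values) {
        if (values == null || values.isEmpty()) {
            return Optional.empty();
        }
        // If the element occurs multiple times, the first instance is used.
        return Optional.of(normalise(values.get(0)));
    }

    // ASSUMPTION: one possible normalisation; the action's actual rules may differ.
    static String normalise(String raw) {
        // Decompose characters such as "Ü" into "U" plus a combining mark, then drop the marks.
        String noMarks = Normalizer.normalize(raw, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // Replace each run of disallowed characters with a single underscore.
        return noMarks.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
    }
}
```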
Ingest: update XML file in SIP using XSLT
This action updates an XML file within the SIP by applying an XSLT transformation to it. The action passes a stylesheet parameter (xsl:param) named "pathToMets" that contains the path to the SIP's mets.xml file, so the stylesheet can read the METS metadata during the transformation.
java ch.docuteam.actions.ingest.ModifyFileWithXSL \
--sip=[/path/to/]SIP \
--xml=path/to/file.xml \
--xsl=path/to/transformation.xsl
Parameter | Description |
---|---|
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property |
--xml=path/to/file.xml | path to the XML file within the SIP to be transformed (relative to the SIP's root node). The path accepts a wildcard (*) in place of the root folder. |
--xsl=path/to/transformation.xsl | path to the XSL script used for the transformation; if the path is relative, the script is assumed to reside in $ACTIONS_HOME/xslt |
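The sketch below uses the standard JAXP API (javax.xml.transform) to show how a parameter named pathToMets can be handed to a stylesheet. It is hypothetical. Writing the result to a temporary file and then replacing the original is an assumption about how the update is applied. Resolving the wildcard in the --xml path is not shown. The XSLT processor bundled with the JDK supports XSLT 1.0 only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: transform an XML file in place, passing the mets.xml location as "pathToMets".
class XslUpdate {

    static void transformInPlace(Path xml, Path xsl, Path mets) throws IOException, TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xsl.toFile()));
        // Readable in the stylesheet via <xsl:param name="pathToMets"/>.
        transformer.setParameter("pathToMets", mets.toAbsolutePath().toString());

        // Write to a temporary file next to the original, then replace the original.
        Path tmp = Files.createTempFile(xml.toAbsolutePath().getParent(), "xslt-", ".xml");
        try {
            try (InputStream in = Files.newInputStream(xml);
                 OutputStream out = Files.newOutputStream(tmp)) {
                transformer.transform(new StreamSource(in, xml.toUri().toString()), new StreamResult(out));
            }
            Files.move(tmp, xml, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```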