Version: 6.10

Ingest

The package "ingest" contains steps for the ingest process.

Ingest: add DNB URN to the root node of the SIP

This action uses the URN suggestion service of the German National Library (DNB) to generate a URN in the selected namespace and stores it as a metadata element (accessor "URN") of the root node. If a URN is already present in the metadata, no new URN is generated. Optionally, the URN ID (a substring of the URN without prefixes) can be stored in an additional element by setting the urnIdAccessor parameter. If that element already contains a value, no URN ID is written.

java ch.docuteam.actions.ingest.AddDnbUrnToRootNode \
--sip=[/path/to/]SIP \
--urnNamespace=urnNamespace \
[--urnIdAccessor=urnIdAccessor]
Parameter | Description
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
--urnNamespace=urnNamespace | namespace for which a URN will be generated
[--urnIdAccessor=urnIdAccessor] | EAD accessor defining which EAD field of the root node is used to store the URN ID (URN without prefixes). If left empty, no URN ID is written.
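Most actions on this page accept a SIP either as a bare name or with a path, and look up a bare name in a folder taken from a property such as actions.workbench.work. The following is a minimal sketch of that lookup rule, not the actual implementation: the class and method names are hypothetical, the folder value is made up, and it assumes that any argument containing a directory part is used as given.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class SipPathSketch {

    /**
     * A bare SIP name is resolved against the default folder;
     * an argument that already contains a path is returned unchanged (assumption).
     */
    static Path resolveSip(String sipArgument, Path defaultFolder) {
        Path sip = Paths.get(sipArgument);
        return sip.getParent() == null ? defaultFolder.resolve(sip) : sip;
    }

    public static void main(String[] args) {
        // Stand-in for actions.properties; the folder value is hypothetical.
        Properties props = new Properties();
        props.setProperty("actions.workbench.work", "/data/workbench/work");

        Path work = Paths.get(props.getProperty("actions.workbench.work"));
        System.out.println(resolveSip("mySip.zip", work));
        System.out.println(resolveSip("/tmp/other/mySip.zip", work));
    }
}
```

The same rule would apply with actions.workbench.inbox for the actions that name the inbox as their default lookup folder.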

Ingest: convert BAR-SIP

Converts a BAR-SIP into a SIP that conforms to the Matterhorn profile.

java ch.docuteam.actions.ingest.BARSIPConverter \
[path/to/]SIP [targetFolder]
Parameter | Description
[path/to/]BAR-SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property
[targetFolder] | directory where to move the created SIP to; if omitted, the SIP will be moved to the location defined by the actions.workbench.work property

Ingest: create SIP from eCH-0160 SIP

Creates a SIP that conforms to the Matterhorn profile from an eCH-0160 SIP.

java ch.docuteam.actions.ingest.CreateSIPFromECH0160SIP \
--sip=[path/to/]SIP \
--levelsFilePath=/path/to/levels.xml \
[--mappingFile=[path/to/]mappingFile] \
[--output-folder=/path/to/folder]
Parameter | Description
--sip=[path/to/]SIP | location of the SIP to convert; default lookup folder is actions.workbench.inbox
--levelsFilePath=/path/to/levels.xml | path to the file levels.xml
[--mappingFile=[path/to/]mappingFile] | file from which to read the mapping; defaults to a default mapping file (defined by the mapping module)
[--output-folder=/path/to/folder] | output folder; defaults to actions.workbench.work

Ingest: check workbench space

Checks if there is enough space for SIP processing (i.e. for working copies).

java ch.docuteam.actions.ingest.CheckWorkbenchSpace \
[path/to/]SIP [numberOfCopies]
Parameter | Description
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
[numberOfCopies] | optional, number of copies to calculate with; defaults to 3
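One plausible reading of this check is that the size of the SIP, multiplied by the number of working copies, must fit into the free space of the workbench. The sketch below illustrates that arithmetic only; the class and method names are hypothetical and the real action may measure size or free space differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class WorkbenchSpaceSketch {

    /** Sums the sizes of all regular files below root (or of root itself if it is a file). */
    static long sizeOf(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(WorkbenchSpaceSketch::sizeOfFile)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOfFile(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the given number of copies of the SIP fits into the usable space. */
    static boolean hasEnoughSpace(long sipBytes, int numberOfCopies, long usableBytes) {
        return Math.multiplyExact(sipBytes, (long) numberOfCopies) <= usableBytes;
    }

    public static void main(String[] args) throws IOException {
        Path sip = Paths.get(args.length > 0 ? args[0] : ".");
        // Simplification: free space is read from the file store holding the SIP,
        // not from the configured workbench folder.
        long usable = Files.getFileStore(sip).getUsableSpace();
        System.out.println(hasEnoughSpace(sizeOf(sip), 3, usable)); // 3 is the documented default
    }
}
```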

Ingest: cleanup working copies

Deletes existing SIPs in actions.workbench.work. Optionally, you can also delete SIPs with the same name in actions.workbench.preparation.

java ch.docuteam.actions.ingest.Cleanup \
[path/to/]SIP [prep]
Parameter | Description
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
[prep] | if true, SIPs of the same name in actions.workbench.preparation will be removed as well; defaults to false

Ingest: create EAD file

Creates EAD data from individual nodes of a given SIP.

java ch.docuteam.actions.ingest.CreateEADFile \
[path/to/]SIP [targetFilename]
Parameter | Description
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
[targetFilename] | optional, name of the output file; defaults to EAD.xml within the SIP's subfolder in the location defined by the actions.workbench.output property

Ingest: extent calculator

Sets the number of files in the "Extent" metadata field and the unit to the default value "File(s)".

java ch.docuteam.actions.ingest.ExtentCalculator \
[path/to/]SIP
Parameter | Description
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property

Ingest: migrate files

Migrates the files of a SIP according to the migration specifications in the configuration file migration-config.xml.

java ch.docuteam.actions.ingest.SIPFileMigrator \
[path/to/]SIP keepOriginals [path/to/migration-config.xml] [skipAlreadyMigratedFiles]
Parameter | Description
[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
keepOriginals | `true
[path/to/migration-config.xml] | optional, path to a specific migration configuration file (defaults to ./config/migration-config.xml)
[skipAlreadyMigratedFiles] | optional, `true

Ingest: remove SIP from inbox

Moves an existing SIP from actions.workbench.inbox to a specified folder or deletes it if no destination folder is specified.

java ch.docuteam.actions.ingest.SIPRemoveFromInbox \
[path/to/]SIP [targetFolder]
Parameter | Description
[path/to/]SIP | path of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.inbox property
[targetFolder] | directory where to move the SIP to; if omitted, the SIP will be deleted

Ingest: replace file

Replaces a file in a SIP. The metadata is retained or added. Currently, only SIPs containing a single file can be processed with this step.

java ch.docuteam.actions.ingest.ReplaceFile \
[path/to/]SIP [targetFolder]
Parameter | Description
[path/to/]SIP | path of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
[targetFolder] | path to the file to be used as replacement of the current SIP content

Ingest: get MARC from REST and add to SIP

For every object (file/folder) of a SIP, the process gets a MARC description from a REST webservice and adds it to the descriptive metadata.

The URL of the web service is configured in actions.properties with the "aleph.webservice.url" property. The URL should contain a placeholder documentNumber, which is replaced by the specific document number. The document number is extracted for each object from its file or folder name:

  • For a filename of BAU_5_000000444.wav the document number 000000444 will be extracted.
  • For a foldername of DIRECTORY_X_000000555 the document number 000000555 will be extracted.

If the HTTP request fails or the filename is invalid, the operation stops and leaves the SIP unchanged. If the operation succeeds, existing MARC metadata is overwritten.
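The extraction rule can be sketched as follows. This is an illustration inferred from the two examples above, not the actual implementation: it assumes the document number is the trailing run of digits after the last underscore, with an optional single file extension, and that the placeholder is the literal text documentNumber. The class name, method names and the URL are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DocumentNumberSketch {

    // Digits after the last underscore, optionally followed by one file extension (assumed rule).
    private static final Pattern TRAILING_NUMBER =
            Pattern.compile("^.*_(\\d+)(?:\\.[^.]+)?$");

    /** Returns the document number, or empty if the name does not follow the assumed pattern. */
    static Optional<String> extract(String nodeName) {
        Matcher m = TRAILING_NUMBER.matcher(nodeName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Substitutes the placeholder in the configured URL template. */
    static String buildUrl(String template, String placeholder, String documentNumber) {
        return template.replace(placeholder, documentNumber);
    }

    public static void main(String[] args) {
        // Hypothetical value for the aleph.webservice.url property.
        String template = "https://aleph.example.org/record/documentNumber";
        for (String name : new String[] {"BAU_5_000000444.wav", "DIRECTORY_X_000000555", "readme.txt"}) {
            System.out.println(name + " -> "
                    + extract(name).map(n -> buildUrl(template, "documentNumber", n))
                                   .orElse("invalid name, SIP left unchanged"));
        }
    }
}
```

Returning an empty Optional for a non-matching name mirrors the documented behaviour of stopping without modifying the SIP.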

java ch.docuteam.actions.marc.AddMarcFromRestByIdFromNodeName \
--sip=[path/to/]SIP
Parameter | Description
--sip=[path/to/]SIP | location of the SIP to convert; default lookup folder is actions.workbench.work

Ingest: add OAIDC from REST by ID from filename

Takes a SIP and adds OAI DC information to its root folder.

The OAI DC information is requested from a web service, defined by the property "oai.webservice.url". The URL is expected to have a placeholder {identifier} which is replaced with the root node’s name, for example:

  • A name of the form “abbreviation-shelfmarkTIFF”, e.g. “bbb-0027TIFF”, becomes the {identifier} "bbb/0027".

If the node within the SIP has an invalid name or the request for the OAI DC information fails, the operation aborts and the SIP is not changed. Calling the operation on a SIP that already contains OAI DC information throws an exception.
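The name-to-identifier mapping can be illustrated with a short sketch. The source gives only one example, so the rule used here is an assumption: the trailing "TIFF" is dropped and the first hyphen becomes a slash. The class name, method names and the URL are hypothetical, and whether the real action URL-encodes the identifier is not known.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OaiIdentifierSketch {

    // "<abbreviation>-<shelfmark>TIFF" (assumed shape of a valid root node name).
    private static final Pattern ROOT_NAME = Pattern.compile("^([^-]+)-(.+)TIFF$");

    /** Maps a root node name to the identifier, or empty if the name is invalid. */
    static Optional<String> toIdentifier(String rootNodeName) {
        Matcher m = ROOT_NAME.matcher(rootNodeName);
        return m.matches() ? Optional.of(m.group(1) + "/" + m.group(2)) : Optional.empty();
    }

    /** Replaces the {identifier} placeholder in the configured URL. */
    static String buildUrl(String template, String identifier) {
        return template.replace("{identifier}", identifier);
    }

    public static void main(String[] args) {
        // Hypothetical value for the oai.webservice.url property.
        String template = "https://oai.example.org/records/{identifier}";
        System.out.println(toIdentifier("bbb-0027TIFF")
                .map(id -> buildUrl(template, id))
                .orElse("invalid name, operation aborts"));
    }
}
```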

Additional file resources defined in the metadata are downloaded and appended to the SIP under a new subfolder labeled "TEI-Handschriftenbeschreibungen".

java ch.docuteam.actions.oai_dc.AddOAIDCFromRESTByIDFromFilename \
--sip=[path/to/]SIP
Parameter | Description
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property

Ingest: convert an EDIDOC/EDIAKT package into a Matterhorn METS SIP

Creates a SIP that conforms to the Matterhorn METS profile from an EDIDOC/EDIAKT package.

java ch.docuteam.actions.ingest.CreateSIPFromEdidocSIP \
--sip=[path/to/]SIP \
--levelsFilePath=path/to/levels.xml \
[--mappingFile=path/to/mappingFile] \
[--outputFolder=/path/to/folder] \
[--steuerXml=/path/to/file]
Parameter | Description
--sip=[path/to/]SIP | location of the package to convert; default lookup folder is actions.workbench.inbox
--levelsFilePath=path/to/levels.xml | path to the level configuration file, to be found on the classpath
[--mappingFile=path/to/mappingFile] | file from which to read the mapping; defaults to ./config/edidoc-mapping.xml, to be found on the classpath
[--outputFolder=/path/to/folder] | output folder; defaults to actions.workbench.work
[--steuerXml=/path/to/file] | path to the EDIDOC archives extension XML file

Ingest: import or download MARCXML file to or from Alma

The ImportOrDownloadMarcXmlIntoAlma action imports a MARCXML file (which must be present in the SIP) into Alma using the Alma REST API. After the import, the response from Alma (the updated MARCXML file) is used to update the MARCXML file in the SIP. If the file is not present in the SIP, the action either fails or, if the optional downloadIfMissing flag is set, downloads the MARCXML file from the Alma API (assuming its MMS ID is stored in the root node element defined by the writeMmsIdRoot parameter). By defining an accessor in the writeMmsId and writeMmsIdRoot parameters, the MMS ID of the imported or downloaded file can also be written to the mets.xml metadata of the MARCXML file node (writeMmsId) and/or of the SIP root node (writeMmsIdRoot), optionally with a prefix given by the mmsIdPrefix parameter.

Connection information for the Alma REST API is expected in the actions.properties file.

java ch.docuteam.actions.ingest.alma.ImportOrDownloadMarcXmlIntoAlma \
--sip=[/path/to/]SIP \
--marcxml=path/to/marc.xml \
[--writeMmsId] \
[--writeMmsIdRoot] \
[--mmsIdPrefix] \
[--downloadIfMissing=false] \
[--checkMatch=false] \
[--fromCzMmsId] \
[--fromNzMmsId] \
[--importProfile] \
[--normalization] \
[--overrideWarning=true] \
[--validate=false]
Parameter | Description
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property.
--marcxml=path/to/marc.xml | MARCXML metadata file path (relative to the SIP's root node).
[--writeMmsId=refCode] | EAD accessor defining the metadata element of the MARCXML file node into which the MMS ID is written after import. If empty, no MMS ID will be written for the file node.
[--writeMmsIdRoot=refCode] | EAD accessor defining the metadata element of the root node into which the MMS ID is written after import. If empty, no MMS ID will be written for the root node.
[--mmsIdPrefix=alma] | Prefix to be added when storing the MMS ID in a metadata element.
[--downloadIfMissing=false] | Indicates whether to download a MARCXML file (based on the MMS ID stored in the element defined by the writeMmsIdRoot accessor) from Alma if the MARCXML metadata file path does not point to a file. Default: false.
[--checkMatch=false] | Indicates whether to check for a match. Default: false (record will be saved despite possible match).
[--fromCzMmsId] | The MMS_ID of the Community-Zone record. Leave empty when creating a regular local record.
[--fromNzMmsId] | The MMS_ID of the Network-Zone record. Leave empty when creating a regular local record.
[--importProfile] | The ID of the import profile to use when processing the input record. Note that according to the profile configuration, the API can update an existing record in some cases.
[--normalization] | The ID of the normalization profile to run.
[--overrideWarning=true] | Indicates whether to ignore warnings. Default: true (record will be saved and the warnings will be added to the API output).
[--validate=false] | Indicates whether to validate the MARC XML file. Default: false.
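The import-or-download decision described for this action can be summarised in a small sketch. It only restates the documented branching; the class, enum and method names are hypothetical, the MMS ID in the example is made up, and the assumption that the prefix is simply prepended to the MMS ID is not confirmed by the source.

```java
final class AlmaFlowSketch {

    enum Step { IMPORT_INTO_ALMA, DOWNLOAD_FROM_ALMA, FAIL }

    /** Chooses what to do depending on whether the MARCXML file exists in the SIP. */
    static Step decide(boolean marcXmlPresentInSip, boolean downloadIfMissing) {
        if (marcXmlPresentInSip) {
            return Step.IMPORT_INTO_ALMA;
        }
        return downloadIfMissing ? Step.DOWNLOAD_FROM_ALMA : Step.FAIL;
    }

    /** Value written to the element named by writeMmsId / writeMmsIdRoot (plain concatenation assumed). */
    static String valueToStore(String mmsIdPrefix, String mmsId) {
        return (mmsIdPrefix == null ? "" : mmsIdPrefix) + mmsId;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(false, true));
        System.out.println(decide(false, false));
        System.out.println(valueToStore("alma", "9912345")); // hypothetical MMS ID
    }
}
```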

Ingest: rename root node based on EAD metadata

This action updates the file or folder name of the root node of a SIP based on EAD metadata of the root node. Special characters are normalised. If the given metadata element exists multiple times, the first instance is used.

java ch.docuteam.actions.ingest.RenameRootNodeFromEad \
--sip=[/path/to/]SIP \
--accessorName=accessorName
Parameter | Description
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
--accessorName=accessorName | EAD accessor defining which EAD field of the root node is used as the source of the new file or folder name.
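The documentation does not say how special characters are normalised. Purely as an illustration of what such a step could look like, the sketch below strips diacritics and replaces every run of characters outside a conservative set with an underscore. The class name, the method name and the whole rule are assumptions; the real action may, for example, transliterate umlauts differently.

```java
import java.text.Normalizer;

final class RootNameSketch {

    /** Turns a metadata value into a file-system-friendly name (assumed rule, see lead-in). */
    static String normalise(String metadataValue) {
        // Decompose accented characters, then drop the combining marks.
        String decomposed = Normalizer.normalize(metadataValue, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Replace each run of other characters with a single underscore.
        String safe = withoutMarks.replaceAll("[^A-Za-z0-9._-]+", "_");
        // Trim underscores left over at either end.
        return safe.replaceAll("^_+|_+$", "");
    }

    public static void main(String[] args) {
        // "Müller & Söhne", written with escapes to stay independent of the source file encoding.
        System.out.println(normalise("M\u00fcller & S\u00f6hne"));
    }
}
```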

Ingest: update xml file in SIP using xslt

This action updates an XML file within the SIP by applying an XSLT transformation to it. The action sets an xsl:param called "pathToMets" containing the path to the mets.xml file, so that the stylesheet can read it during the transformation.

java ch.docuteam.actions.ingest.ModifyFileWithXSL \
--sip=[/path/to/]SIP \
--xml=path/to/file.xml \
--xsl=path/to/transformation.xsl
Parameter | Description
--sip=[path/to/]SIP | name of the SIP; if no path is given, it will be expected to be in the location defined by the actions.workbench.work property
--xml=path/to/file.xml | path to the XML file within the SIP to be transformed (relative to the SIP's root node). The path accepts a wildcard (*) in place of the root folder.
--xsl=path/to/transformation.xsl | path to the XSL script to be used in the transformation (if relative, the XSL file is assumed to reside in $ACTIONS_HOME/xslt)
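How a stylesheet receives the "pathToMets" parameter can be shown with the standard javax.xml.transform API. This is a minimal sketch, not the action's own code: the class name, the demo stylesheet and the example path are hypothetical, and the stylesheet merely echoes the parameter, where a real one would more likely pass it to document() to read values from mets.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslParameterSketch {

    // Demo stylesheet: declares the parameter and writes its value as plain text.
    static final String DEMO_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:param name=\"pathToMets\"/>"
          + "<xsl:template match=\"/\"><xsl:value-of select=\"$pathToMets\"/></xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML input with pathToMets set as a top-level parameter. */
    static String transform(String xml, String xsl, String pathToMets) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            transformer.setParameter("pathToMets", pathToMets);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<record/>", DEMO_XSL, "/work/mySip/mets.xml"));
    }
}
```

A stylesheet must declare the parameter with a top-level xsl:param element of the same name; otherwise the value set by the caller is not visible to the templates.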