Ingest
The "ingest" group contains the operations used during the ingest process.
Harvest identifiers from OAI-PMH
This operation harvests OAI-PMH identifiers from the specified endpoint, set and date range passed as parameters. Optionally, the metadata prefix can also be given as a parameter (defaults to "oai_dc"). For every identifier retrieved, a feeder event is generated, which can be used to launch a follow-up workflow for each identifier.
The endpoint parameter should contain both base URL (e.g. https://docuteam.ch) and path (oai/request). The verb statement (?verb=ListIdentifiers) is added automatically by the action. fromDate and toDate should be given in ISO format UTC date time (e.g. 2000-01-01T00:00:00Z).
The created event has the event_type "harvestOaiPmhDate" and contains the endpoint, set and identifier of the harvested record.
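As an illustration of how the endpoint, the automatically added verb and the date range fit together, here is a minimal Python sketch that builds a ListIdentifiers request URL. It is not the action's actual implementation: the function names and the set name "exampleSet" are assumptions made for this example. The query parameter names (metadataPrefix, set, from, until) are those defined by the OAI-PMH protocol.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_utc_iso(moment):
    """Format a timezone-aware datetime as an ISO UTC date time, e.g. 2000-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_identifiers_url(endpoint, oai_set, from_date, to_date, prefix="oai_dc"):
    """Build a ListIdentifiers URL (hypothetical helper, not part of docuteam-actions)."""
    params = {
        "verb": "ListIdentifiers",  # added automatically by the action
        "metadataPrefix": prefix,   # defaults to "oai_dc"
        "set": oai_set,
        "from": from_date,
        "until": to_date,
    }
    return f"{endpoint}?{urlencode(params)}"


from_date = to_utc_iso(datetime(2000, 1, 1, tzinfo=timezone.utc))
url = build_list_identifiers_url(
    "https://docuteam.ch/oai/request",  # base URL plus path
    "exampleSet",                       # assumed set name
    from_date,
    "2000-12-31T23:59:59Z",
)
print(url)
```

The colons in the date values are percent-encoded by urlencode, which is a valid way to transmit them in a query string.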
docuteam-actions harvestOaiPmhData -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-e, --endpoint OAI-PMH endpoint used to harvest [string] [required]
-s, --set OAI-PMH set to harvest [string] [required]
-m, --prefix OAI-PMH metadata format [string]
-f, --fromDate From date time in ISO format [string] [required]
-t, --toDate To date time in ISO format [string] [required]
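A complete invocation can be assembled from the options listed above. The following Python sketch only builds and prints the command line, it does not run it. The config file name and the set name are assumptions for illustration.

```python
import shlex


def harvest_identifiers_command(config, endpoint, oai_set, from_date, to_date, prefix=None):
    """Return the argument list for harvestOaiPmhData (hypothetical helper)."""
    args = [
        "docuteam-actions", "harvestOaiPmhData",
        "-c", config,
        "-e", endpoint,
        "-s", oai_set,
        "-f", from_date,
        "-t", to_date,
    ]
    if prefix is not None:
        # Optional: the action falls back to "oai_dc" when no prefix is given.
        args += ["-m", prefix]
    return args


cmd = harvest_identifiers_command(
    "config.json",                      # assumed config file name
    "https://docuteam.ch/oai/request",
    "exampleSet",                       # assumed set name
    "2000-01-01T00:00:00Z",
    "2000-12-31T23:59:59Z",
)
command_line = shlex.join(cmd)
print(command_line)
```

Keeping the arguments as a list makes it straightforward to pass them to subprocess.run later without worrying about shell quoting.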
Harvest records from OAI-PMH
This operation harvests a single OAI-PMH record from the specified endpoint. Identifier and metadata prefix are passed as parameters. The record is then stored as a file named oai.xml in the folder specified by the "path" parameter.
The "path" parameter can be an absolute path or a path relative to the folder where the action is executed. If the specified path does not exist, it will be created.
The endpoint parameter should contain both base URL (e.g. https://docuteam.ch) and path (oai/request). The verb statement (?verb=GetRecord) is added automatically by the action.
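The behaviour described above (a GetRecord request, a target folder that is created on demand, and a file named oai.xml) can be sketched in Python as follows. This is an illustration under assumptions, not the action's code: the helper names and the identifier "oai:example:1" are hypothetical, and the network request itself is left out so the example runs offline.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlencode


def build_get_record_url(endpoint, identifier, prefix):
    """Build a GetRecord URL; the verb is what the action adds automatically."""
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
    return f"{endpoint}?{urlencode(params)}"


def store_record(xml_bytes, path):
    """Write the response to <path>/oai.xml, creating the folder if it does not exist."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "oai.xml"
    target.write_bytes(xml_bytes)
    return target


record_url = build_get_record_url("https://docuteam.ch/oai/request", "oai:example:1", "oai_dc")
print(record_url)

# Demonstrate the storage step with a placeholder response body.
with tempfile.TemporaryDirectory() as tmp:
    stored = store_record(b"<record/>", Path(tmp) / "harvest" / "record-1")
    stored_name = stored.name
    stored_content = stored.read_bytes()
```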
docuteam-actions harvestOaiPmhRecord -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-e, --endpoint OAI-PMH endpoint used to harvest [string] [required]
-i, --identifier OAI-PMH identifier [string] [required]
-m, --prefix OAI-PMH metadata format [string] [required]
-p, --path Path where to store the oai.xml response [string] [required]
Register URN with the German National Library
This operation registers a URN and associated URLs with the German National Library. Both the URN (accessor URN) and the URLs (accessor registrationURL) must already be present in the mets.xml file (stored in the root node of the SIP).
The URN to be registered can only be present once in the root node, while it is possible to have multiple URLs which need to be registered (though there must be at least one URL).
The operation first checks whether the URN is already registered. If this is the case, it will try to add the URLs to the already registered URN with URL priority 2. If the URN is not yet registered, it will register both the URN and the associated URLs. The first URL will be the primary URL (priority 1). All other URLs will have priority 2.
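The priority rules can be summarised in a few lines of Python. The function name and the example URLs are assumptions for this sketch. It only models the decision described above and does not call the registration API.

```python
def plan_url_priorities(urn_already_registered, urls):
    """Return (url, priority) pairs following the rules described above."""
    if not urls:
        raise ValueError("at least one URL is required")
    if urn_already_registered:
        # URLs added to an existing URN all get priority 2.
        return [(url, 2) for url in urls]
    # New URN: the first URL is the primary URL, all others get priority 2.
    return [(url, 1 if index == 0 else 2) for index, url in enumerate(urls)]


urls = ["https://example.org/a", "https://example.org/b"]  # assumed example URLs
print(plan_url_priorities(False, urls))
print(plan_url_priorities(True, urls))
```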
The operation reads the URN namespace from the URN provided in the mets.xml and tries to find the corresponding credentials (together with the base URL of the API) in the config.json file. If it doesn't find the credentials, it uses the namespace given in the namespace parameter.
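The lookup with its fallback might look roughly like the Python sketch below. The layout of config.json is not documented here, so the dictionary, its keys and the prefix-matching rule are purely hypothetical and serve only to show the order of the two lookups.

```python
def resolve_namespace(urn, configured, namespace_param):
    """Pick the namespace whose credentials should be used (hypothetical logic).

    Assumption: a configured namespace matches when the URN starts with it.
    If nothing matches, the value of the namespace parameter is used instead.
    """
    matches = [namespace for namespace in configured if urn.startswith(namespace)]
    namespace = max(matches, key=len) if matches else namespace_param
    return namespace, configured.get(namespace)


# Hypothetical structure; the real config.json may look different.
configured = {
    "urn:nbn:de:example": {"baseUrl": "https://api.example.org", "username": "user"},
}

found = resolve_namespace("urn:nbn:de:example-12345", configured, "urn:nbn:de:other")
fallback = resolve_namespace("urn:nbn:de:unknown-1", configured, "urn:nbn:de:other")
print(found[0], fallback[0])
```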
docuteam-actions registerDnbUrnForRootNode -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-n, --namespace URN namespace [string] [required]
-d, --data Data file path [string] [required]
Rename file or folder
This operation can be used to rename a file or folder in the feeder workbench by specifying an old and new path to a file or folder.
Please note that while it is possible to rename files or folders inside a Matterhorn METS SIP, doing so will corrupt the SIP, because the changes are not registered in the structural metadata of the SIP (inside the mets.xml file).
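To avoid corrupting a SIP by accident, a calling script could check for a mets.xml in the parent folders before renaming. The following Python sketch shows one possible guard. It is not part of docuteam-actions, both helper names are hypothetical, and it assumes that a SIP can be recognised by a mets.xml file in its root folder.

```python
import tempfile
from pathlib import Path


def is_inside_mets_sip(path):
    """Return True if any parent folder of the path contains a mets.xml file."""
    resolved = Path(path).resolve()
    return any((parent / "mets.xml").is_file() for parent in resolved.parents)


def guarded_rename(old_path, new_path):
    """Rename a file or folder unless it lies inside a (presumed) METS SIP."""
    if is_inside_mets_sip(old_path):
        raise ValueError(f"{old_path} is inside a SIP; renaming would not update mets.xml")
    Path(old_path).rename(new_path)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sip" / "data").mkdir(parents=True)
    (root / "sip" / "mets.xml").write_text("<mets/>")
    (root / "loose").mkdir()

    inside = is_inside_mets_sip(root / "sip" / "data")
    outside = is_inside_mets_sip(root / "loose")
    guarded_rename(root / "loose", root / "renamed")
    renamed_exists = (root / "renamed").is_dir()
```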
docuteam-actions renameFileOrFolder -c [/path/to/]config.json -o /path/to/old/folder -n /path/to/new/folder
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-o, --oldPath Source file or folder [string] [required]
-n, --newPath New file or folder [string] [required]
Transform CMI eCH-0160 package
This operation modifies a zipped eCH-0160 SIP package exported from CMI. In CMI it is possible to create nested folders inside dossiers and store documents in these folders.
This dossier substructure is lost when converting the eCH-0160 package into a Matterhorn METS SIP. This action modifies the metadata.xml file of the eCH-0160 package, so that nested dossiers are created inside the existing dossiers (based on the folder structure), which are kept when converting to Matterhorn METS. The documents are then assigned to the newly created dossiers.
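The idea of deriving nested dossiers from the folder structure can be illustrated with a small Python sketch. It works on plain relative paths rather than on the real metadata.xml, and the function name, the dictionary layout and the example paths are assumptions made for this illustration only.

```python
def build_dossier_tree(document_paths):
    """Group documents into nested dossiers based on their folder path."""
    tree = {"dossiers": {}, "documents": []}
    for path in document_paths:
        *folders, name = path.split("/")
        node = tree
        for folder in folders:
            # Each folder level becomes a nested dossier, created on first use.
            node = node["dossiers"].setdefault(folder, {"dossiers": {}, "documents": []})
        # The document is assigned to the innermost dossier.
        node["documents"].append(name)
    return tree


tree = build_dossier_tree([
    "a.pdf",
    "Contracts/c.pdf",
    "Contracts/2020/b.pdf",
])
print(tree)
```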
With the optional parameter renameExistingDossiersAndDocuments (defaults to false), it is possible to modify the metadata.xml so that the titles of dossiers and documents are used for naming the Matterhorn METS nodes instead of CMI IDs, resulting in a more human-readable package.
Additionally, the optional parameter maximumFolderNameLength can be used to shorten newly created or renamed metadata elements, which are later used to create the folder structure in the Matterhorn METS package.
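How exactly the action shortens names is not specified here. As a rough illustration, the Python sketch below assumes a plain cut at the maximum length followed by stripping trailing whitespace, and uses None to stand in for "no limit set". The function name is hypothetical.

```python
def shorten_name(name, maximum_length=None):
    """Shorten a name to at most maximum_length characters (assumed strategy)."""
    if maximum_length is None:
        # No limit given: leave the name untouched.
        return name
    return name[:maximum_length].rstrip()


print(shorten_name("Annual report 2020", 7))
print(shorten_name("Annual report 2020"))
```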
docuteam-actions transformCMIECH0160Package -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-d, --data Data file path, pointing to a zipped eCH-0160 package [string] [required]
--renameExistingDossiersAndDocuments Rename existing dossiers and documents (default: false) [boolean]
--maximumFolderNameLength Shorten metadata elements used to create folder (default: NaN) [integer]
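The optional flags can be combined with the required ones as sketched below in Python. The sketch only assembles the command line. The file names and the length value of 60 are assumptions, and passing the boolean flag without a value is assumed to enable it.

```python
import shlex


def transform_command(config, data, rename=False, maximum_folder_name_length=None):
    """Return the argument list for transformCMIECH0160Package (hypothetical helper)."""
    args = ["docuteam-actions", "transformCMIECH0160Package", "-c", config, "-d", data]
    if rename:
        args.append("--renameExistingDossiersAndDocuments")
    if maximum_folder_name_length is not None:
        args += ["--maximumFolderNameLength", str(maximum_folder_name_length)]
    return args


cmd = transform_command(
    "config.json",                 # assumed config file name
    "/data/export from CMI.zip",   # assumed path; the space shows why quoting matters
    rename=True,
    maximum_folder_name_length=60,
)
command_line = shlex.join(cmd)
print(command_line)
```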