Ingest
The group of operations "ingest" contains steps for the ingest process.
Download package from pre-ingest component (customer specific action/2118)
This operation uses an API to download a package from a pre-ingest component. Based on the connection details in config.json, an API request to preIngestComponent.baseUrl/api/sips/package_id/download is executed, and the package (in zip format) is downloaded and unzipped to the location defined with the --path parameter.
docuteam-actions downloadPackageFromPreIngestComponent -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-p, --path Path where to store the extracted zip file [string] [required]
-i, --packageId ID of the package to download [string] [required]
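The download step described above can be sketched in Python as follows. This is only an illustration, not the action's actual implementation: the function names are hypothetical, the host in the usage note is a placeholder, and creating the target folder when it is missing is an assumption.

```python
import io
import zipfile
from pathlib import Path


def build_download_url(base_url: str, package_id: str) -> str:
    """Join the configured base URL with the download route for one package."""
    return f"{base_url.rstrip('/')}/api/sips/{package_id}/download"


def extract_package(zip_bytes: bytes, target: str) -> list[str]:
    """Unzip a downloaded archive into the target folder and list its members.

    Assumption: the target folder is created if it does not exist yet.
    """
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

With a base URL of https://preingest.example.org and the package ID 42, build_download_url yields https://preingest.example.org/api/sips/42/download.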
Harvest identifiers from OAI-PMH
This operation harvests OAI-PMH identifiers from the specified endpoint, set and date range passed as parameters. Optionally, the metadata prefix can also be given as a parameter (defaults to "oai_dc"). For every identifier retrieved, a feeder event is generated, which can be used to launch a follow-up workflow for each identifier.
The endpoint parameter should contain both base URL (e.g. https://docuteam.ch) and path (oai/request). The verb statement (?verb=ListIdentifiers) is added automatically by the action. fromDate and toDate should be given in ISO format UTC date time (e.g. 2000-01-01T00:00:00Z).
The created event has the event_type "harvestOaiPmhDate" and contains the endpoint, set and identifier of the harvested record.
docuteam-actions harvestOaiPmhData -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-e, --endpoint OAI-PMH endpoint used to harvest [string] [required]
-s, --set OAI-PMH set to harvest [string] [required]
-m, --prefix OAI-PMH metadata format [string]
-f, --fromDate From date time in ISO format [string] [required]
-t, --toDate To date time in ISO format [string] [required]
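To illustrate how the parameters map onto an OAI-PMH request, the following sketch builds a ListIdentifiers URL. The query argument names (verb, metadataPrefix, set, from, until) come from the OAI-PMH protocol. The function name is hypothetical, and both the strict date check and the mapping of fromDate/toDate to from/until are assumptions about the action's behavior.

```python
from datetime import datetime
from urllib.parse import urlencode


def build_list_identifiers_url(endpoint, oai_set, from_date, to_date, prefix="oai_dc"):
    """Build a ListIdentifiers request URL for the given endpoint, set and date range."""
    # Reject dates that are not in ISO UTC form, e.g. 2000-01-01T00:00:00Z.
    for value in (from_date, to_date):
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": prefix,
        "set": oai_set,
        "from": from_date,
        "until": to_date,
    })
    return f"{endpoint}?{query}"
```

The endpoint argument is expected to already contain base URL and path, for example https://docuteam.ch/oai/request.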
Harvest records from OAI-PMH
This operation harvests a single OAI-PMH record from the specified endpoint. Identifier and metadata prefix are passed as parameters. The record is then stored as a file named oai.xml in the folder specified by the "path" parameter.
The "path" parameter can be an absolute path or a path relative to the folder where the action is executed. If the specified path does not exist, it will be created.
The endpoint parameter should contain both base URL (e.g. https://docuteam.ch) and path (oai/request). The verb statement (?verb=GetRecord) is added automatically by the action.
docuteam-actions harvestOaiPmhRecord -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-e, --endpoint OAI-PMH endpoint used to harvest [string] [required]
-i, --identifier OAI-PMH identifier [string] [required]
-m, --prefix OAI-PMH metadata format [string] [required]
-p, --path Path where to store the oai.xml response [string] [required]
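A minimal sketch of the two steps described above (building the GetRecord request and storing the response as oai.xml) could look like this. The function names are hypothetical; the query argument names follow the OAI-PMH protocol, and UTF-8 as the file encoding is an assumption.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_get_record_url(endpoint, identifier, prefix):
    """Build a GetRecord request URL for one identifier and metadata prefix."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": prefix,
    })
    return f"{endpoint}?{query}"


def store_record(xml_text, path):
    """Write the response to <path>/oai.xml, creating the folder if it is missing."""
    folder = Path(path)  # absolute, or relative to the current working directory
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "oai.xml"
    target.write_text(xml_text, encoding="utf-8")
    return target
```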
List available packages from pre-ingest component (customer specific action/2118)
This operation uses an API to list packages ready for ingest which are prepared by a pre-ingest component. Based on the connection details in config.json, an API request to preIngestComponent.baseUrl/api/sips is executed, which should return JSON data about packages ready for ingest. For each package, the action creates a feeder event with the event type NewPreIngestPackage that carries the package type, package id and package channel id.
docuteam-actions listAvailablePackagesFromPreIngestComponent -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
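The mapping from the API response to feeder events can be sketched as follows. The response format is not documented here, so the JSON keys (type, id, channelId), the assumption that the response is a JSON array, and the field names of the resulting event are all hypothetical; only the event type NewPreIngestPackage is taken from the description above.

```python
import json


def packages_to_events(response_body: str) -> list[dict]:
    """Turn a JSON array of packages into one feeder event per package.

    All key names below are assumptions made for illustration.
    """
    packages = json.loads(response_body)
    return [
        {
            "event_type": "NewPreIngestPackage",
            "package_type": package["type"],
            "package_id": package["id"],
            "package_channel_id": package["channelId"],
        }
        for package in packages
    ]
```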
Register URN with the German National Library
This operation registers a URN and associated URLs with the German National Library. Both the URN (accessor URN) and the URLs (accessor registrationURL) need to be already present in the mets.xml file (stored in the root node of the SIP).
The URN to be registered can only be present once in the root node, while it is possible to have multiple URLs which need to be registered (though there must be at least one URL).
The operation first checks whether the URN is already registered. If this is the case, it will try to add the URLs to the already registered URN with URL priority 2. If the URN is not yet registered, it will register both the URN and the associated URLs. The first URL will be the primary URL (priority 1). All other URLs will have priority 2.
The operation reads the URN namespace from the URN provided in the mets.xml and tries to find the corresponding credentials in the config.json file (together with the base URL of the API). If it does not find the credentials, it uses the namespace passed with the namespace parameter.
docuteam-actions registerDnbUrnForRootNode -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-n, --namespace URN namespace [string] [required]
-d, --data Data file path [string] [required]
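The URL priority rules described above can be expressed as a small function. This is a sketch of the rule only, with a hypothetical function name; it does not call the registration API.

```python
def assign_url_priorities(urls, urn_already_registered):
    """Return (url, priority) pairs following the rules described above."""
    if not urls:
        raise ValueError("at least one URL is required")
    if urn_already_registered:
        # URLs added to an existing URN all get priority 2.
        return [(url, 2) for url in urls]
    # New URN: the first URL is the primary one, all others get priority 2.
    return [(url, 1 if index == 0 else 2) for index, url in enumerate(urls)]
```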
Rename file or folder
This operation can be used to rename a file or folder in the feeder workbench by specifying an old and new path to a file or folder.
Please note that while it is possible to rename files or folders inside a Matterhorn METS SIP, doing so will corrupt the SIP, as the changes will not be registered in the structural metadata of the SIP (inside the mets.xml file).
docuteam-actions renameFileOrFolder -c [/path/to/]config.json -o /path/to/old/folder -n /path/to/new/folder
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-o, --oldPath Source file or folder [string] [required]
-n, --newPath New file or folder [string] [required]
Report package status to pre-ingest component (customer specific action/2118)
This operation uses an API to update the status of a package downloaded from a pre-ingest component. Based on the connection details in config.json, an API request using PATCH to preIngestComponent.baseUrl/api/sips/package_id is executed in order to update the status of the package.
The following statuses can be set using this action:
transmitted
error: When using this status, an error message must be supplied with the errorMessage parameter.
archived: When using this status, the PID, refCode and refCodeAdmin are read from the root node of the package defined by the path parameter and sent back to the pre-ingest component. If refCodeAdmin does not exist, filePlanPosition is read instead.
docuteam-actions reportPackageStatusToPreIngestComponent -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-p, --path Path pointing to a folder containing a package [string] [required]
-i, --packageId ID of the package whose status will be updated [string] [required]
-s, --packageStatus New package status [string] [required]
-e, --errorMessage Optional error message [string]
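The status rules above can be sketched as a payload builder. The body format of the PATCH request is not documented here, so the payload keys are assumptions, as is sending the filePlanPosition fallback under the refCodeAdmin key. The function name is hypothetical.

```python
ALLOWED_STATUSES = {"transmitted", "error", "archived"}


def build_status_payload(status, error_message=None, root_metadata=None):
    """Validate the status and collect the values that go with it."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unsupported status: {status}")
    payload = {"status": status}
    if status == "error":
        if not error_message:
            raise ValueError("status 'error' requires an error message")
        payload["errorMessage"] = error_message
    elif status == "archived":
        meta = root_metadata or {}
        payload["PID"] = meta.get("PID")
        payload["refCode"] = meta.get("refCode")
        # Fall back to filePlanPosition when refCodeAdmin is missing.
        payload["refCodeAdmin"] = meta.get("refCodeAdmin") or meta.get("filePlanPosition")
    return payload
```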
Send ingests to access component (customer specific action/2118)
This operation creates and sends a DIP to an access component. Based on a Matterhorn METS SIP stored in the workbench (parameter inputData), it parses the METS and decides, based on the accessPolicy metadata element, whether a DIP needs to be assembled.
If the root folder of the package has accessPolicy 01, 02 or 03, the DIP is assembled and sent to the access component. However, file or folder nodes which have a policy other than 01, 02 or 03 are not copied to the temporary folder where the DIP is assembled (defined with the data parameter). File and folder nodes without a policy inherit the policy of their ancestors. Nodes with an invalid policy count as blocked.
The assembled DIP (including the full mets.xml file) is then zipped and uploaded to an S3 storage defined in config.json.
Additionally, an addDip message is sent to the message queue of the access component defined in config.json. The mets.xml is attached to the message.
If the root folder of the package has a policy 04 or 05 (metadata only), no package is uploaded to the S3 storage, but a message with type addDip is still sent.
If the root folder of the package has a policy 06, 07 or 08 (blocked), no package is uploaded to the S3 storage, but a message with type deleteDip is still sent.
docuteam-actions sendIngestsToNbAccessComponent -c [/path/to/]config.json -i /path/to/workbench/folder -d /path/to/temporary/folder
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-i, --inputData Path to folder with Matterhorn METS SIP [string] [required]
-d, --data Path to folder where the DIP will be assembled [string] [required]
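The access policy rules above can be summarized in a short sketch. The grouping of policies and the resulting message types are taken from the description; the function names and the group labels are hypothetical, and treating a node with no policy anywhere on its path as blocked is an assumption.

```python
ALLOWED = {"01", "02", "03"}
METADATA_ONLY = {"04", "05"}


def policy_group(policy):
    """Map an accessPolicy value to its group; invalid values count as blocked."""
    if policy in ALLOWED:
        return "allowed"
    if policy in METADATA_ONLY:
        return "metadata_only"
    return "blocked"  # 06, 07, 08 and anything invalid


def effective_policy(path_policies):
    """Resolve inheritance: path_policies lists the policies from the root
    down to the node, with None for nodes that have no policy of their own."""
    for policy in reversed(path_policies):
        if policy is not None:
            return policy
    return None


def include_node(path_policies):
    """Only nodes whose effective policy is 01, 02 or 03 are copied into the DIP."""
    return policy_group(effective_policy(path_policies)) == "allowed"


def plan_for_root(root_policy):
    """Decide whether a package is uploaded and which message type is sent."""
    group = policy_group(root_policy)
    if group == "allowed":
        return {"upload": True, "message": "addDip"}
    if group == "metadata_only":
        return {"upload": False, "message": "addDip"}
    return {"upload": False, "message": "deleteDip"}
```

For example, a file without its own policy below a folder with policy 06 inside a root with policy 01 resolves to 06 and is left out of the DIP.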
Send updates to access component (customer specific action/2118)
This operation downloads data from box and sends a DIP to an access component. In contrast to the sendIngestsToNbAccessComponent action, it does not require a Matterhorn METS SIP. Instead, it first downloads the mets.xml and then the files which need to be delivered to the access component from box (excluding blocked and metadata-only files or folders). It also supports sending messages for data and metadata updates from box (which can be triggered by webhooks).
The exact behavior of the action is defined by the operation parameter. It supports the following values: insert, replace, append, changeMetadata.
The insert operation duplicates the behavior of the sendIngestsToNbAccessComponent action. It needs to be called with the PID of the root folder of the package.
The replace operation needs to be called with the PID of a file which was replaced in box. Based on this PID, it reads the root folder PID from box and uses it to assemble the full package (excluding blocked and metadata-only nodes). When sending the message to the access component, it uses the type update_file or delete_dip and sends both the root PID and the PID of the replaced file.
The append operation needs to be called with the PID of a file or folder which was appended to an existing AIP. Based on this PID, it reads the root folder PID from box and uses it to assemble the full package (excluding blocked and metadata-only nodes). When sending the message to the access component, it uses the type append_file, append_folder or delete_dip and sends both the root PID and the PID of the appended file or folder.
The changeMetadata operation needs to be called with the PID of a root folder, folder or file whose metadata changed in box. When using this operation, it is also necessary to pass the oldAccessPolicy parameter to tell the action the old access policy of the node in question (before the data was changed).
If the policy changed and switched from one of the three groups (allowed, metadata only or blocked) to another, a DIP is assembled and a message of type change_access_rights_dip_new, change_access_rights_folder_new, change_access_rights_file_new or delete_dip is sent. If the policy did not change (or changed only within the same group), no package is assembled and a message of type change_ead_meta_dip, change_ead_meta_folder or change_ead_meta_file is sent.
docuteam-actions sendUpdatesToNbAccessComponent -c [/path/to/]config.json --pid PID -d /path/to/temporary/folder --operation insert
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
--pid PID for which a DIP will be assembled [string] [required]
-d, --data Path to folder where the DIP will be assembled [string] [required]
--operation PID for which a DIP will be assembled [string] [required]
--oldAccessPolicy PID for which a DIP will be assembled [string] [required]
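The message selection for the changeMetadata operation can be sketched as follows. The message type names are taken from the description above. Everything else is an assumption made for illustration: the function and label names, the mapping of node kinds to the dip, folder and file suffixes, and in particular the rule that delete_dip is chosen when the new policy falls into the blocked group (the description does not state when delete_dip is used).

```python
ALLOWED = {"01", "02", "03"}
METADATA_ONLY = {"04", "05"}
SUFFIX = {"root": "dip", "folder": "folder", "file": "file"}


def policy_group(policy):
    """Map an accessPolicy value to its group; invalid values count as blocked."""
    if policy in ALLOWED:
        return "allowed"
    if policy in METADATA_ONLY:
        return "metadata_only"
    return "blocked"


def metadata_change_message(node_kind, old_policy, new_policy):
    """Pick the message type for a metadata change on a root, folder or file node."""
    suffix = SUFFIX[node_kind]
    if policy_group(old_policy) == policy_group(new_policy):
        # Same group: only the metadata changed, no package is assembled.
        return {"group_changed": False, "message": f"change_ead_meta_{suffix}"}
    if policy_group(new_policy) == "blocked":
        # Assumption: a switch into the blocked group results in delete_dip.
        return {"group_changed": True, "message": "delete_dip"}
    return {"group_changed": True, "message": f"change_access_rights_{suffix}_new"}
```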
Transform CMI eCH-0160 package
This operation modifies a zipped eCH-0160 SIP package exported from CMI. In CMI it is possible to create nested folders inside dossiers and store documents in these folders.
This dossier substructure is lost when converting the eCH-0160 package into a Matterhorn METS SIP. This action modifies the metadata.xml file of the eCH-0160 package, so that nested dossiers are created inside the existing dossiers (based on the folder structure), which are kept when converting to Matterhorn METS. The documents are then assigned to the newly created dossiers.
With the optional parameter renameExistingDossiersAndDocuments (defaults to false), it is possible to modify the metadata.xml so that the titles of dossiers and documents are used for naming the Matterhorn METS nodes instead of CMI IDs, resulting in a more human-readable package.
Additionally, the optional parameter maximumFolderNameLength can be used to shorten newly created or renamed metadata elements which are later used to create the folder structure in the Matterhorn METS package.
docuteam-actions transformCMIECH0160Package -c [/path/to/]config.json
Options:
--version Show version number [boolean]
--debug Set log level to debug [boolean]
-c, --config Configuration file path [string] [required]
--help Show help [boolean]
-d, --data Data file path, pointing to a zipped eCH-0160 package [string] [required]
--renameExistingDossiersAndDocuments Rename existing dossiers and documents (default: false) [boolean]
--maximumFolderNameLength Shorten metadata elements used to create folder (default: NaN) [integer]