Version: 2.2

Ingest

The "ingest" group contains the operations used in the ingest process.

Harvest identifiers from OAI-PMH

This operation harvests OAI-PMH identifiers from the specified endpoint, set and date range passed as parameters. Optionally, the metadata prefix can also be given as a parameter (defaults to "oai_dc"). For every identifier retrieved, a feeder event is generated, which can be used to launch a follow-up workflow for each identifier.

The endpoint parameter should contain both the base URL (e.g. https://docuteam.ch) and the path (e.g. oai/request). The verb statement (?verb=ListIdentifiers) is added automatically by the action. fromDate and toDate should be given as UTC date-times in ISO format (e.g. 2000-01-01T00:00:00Z).
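As a rough illustration of how the parameters above could end up in a request, the following sketch builds a ListIdentifiers URL. It assumes the action maps fromDate and toDate to the standard OAI-PMH `from` and `until` arguments; the function names and the set name used in the usage note are hypothetical and not part of docuteam-actions.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_utc_iso(moment: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2000-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_list_identifiers_url(endpoint, oai_set, from_date, to_date, prefix="oai_dc"):
    """Append the ListIdentifiers verb and its arguments to the endpoint.

    Assumption: fromDate/toDate correspond to the OAI-PMH arguments
    'from' and 'until'; the real action may build the request differently.
    """
    query = urlencode({
        "verb": "ListIdentifiers",
        "metadataPrefix": prefix,  # defaults to oai_dc, as in the description above
        "set": oai_set,
        "from": from_date,
        "until": to_date,
    })
    return f"{endpoint}?{query}"
```

Calling `build_list_identifiers_url("https://docuteam.ch/oai/request", "demo", "2000-01-01T00:00:00Z", "2000-12-31T23:59:59Z")` yields a URL that starts with the endpoint followed by `?verb=ListIdentifiers`; the colons in the date values are percent-encoded.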

The created event has the event_type "harvestOaiPmhDate" and contains the endpoint, set and identifier of the harvested record.
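To make the shape of such events more concrete, here is a minimal sketch that extracts identifiers from a ListIdentifiers response and builds one event per identifier. Only the event_type value comes from the description above; the key names "endpoint", "set" and "identifier" and the function name are assumptions, and the actual event format may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def feeder_events(response_xml, endpoint, oai_set):
    """Return one event dictionary per identifier found in the response."""
    root = ET.fromstring(response_xml)
    events = []
    for identifier in root.iterfind(".//oai:header/oai:identifier", OAI_NS):
        events.append({
            "event_type": "harvestOaiPmhDate",  # value as stated in the text
            "endpoint": endpoint,               # hypothetical key name
            "set": oai_set,                     # hypothetical key name
            "identifier": (identifier.text or "").strip(),  # hypothetical key name
        })
    return events
```

A follow-up workflow could then be started for each returned event, for example one that harvests the full record for the identifier.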

docuteam-actions harvestOaiPmhData -c [/path/to/]config.json

Options:
      --version   Show version number               [boolean]
      --debug     Set log level to debug            [boolean]
  -c, --config    Configuration file path           [string] [required]
      --help      Show help                         [boolean]
  -e, --endpoint  OAI-PMH endpoint used to harvest  [string] [required]
  -s, --set       OAI-PMH set to harvest            [string] [required]
  -m, --prefix    OAI-PMH metadata format           [string]
  -f, --fromDate  From date time in ISO format      [string] [required]
  -t, --toDate    To date time in ISO format        [string] [required]

Harvest records from OAI-PMH

This operation harvests a single OAI-PMH record from the specified endpoint. The identifier and the metadata prefix are passed as parameters. The record is then stored as a file named oai.xml in the folder specified by the "path" parameter. The "path" parameter can be an absolute path or a path relative to the folder where the action is executed. If the specified path does not exist, it is created.

The endpoint parameter should contain both the base URL (e.g. https://docuteam.ch) and the path (e.g. oai/request). The verb statement (?verb=GetRecord) is added automatically by the action.
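The behaviour described above can be sketched roughly as follows: build the GetRecord request, create the target folder if needed, and write the response to oai.xml. The function names are hypothetical, and the HTTP request itself is left out; this is an illustration under those assumptions, not the action's actual implementation.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_get_record_url(endpoint, identifier, prefix):
    """Append the GetRecord verb and its two required arguments to the endpoint."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": prefix,
    })
    return f"{endpoint}?{query}"


def store_record(record_xml, path):
    """Write the record to <path>/oai.xml, creating missing folders first."""
    target_dir = Path(path)  # absolute, or relative to the current working folder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "oai.xml"
    target.write_text(record_xml, encoding="utf-8")
    return target
```

Using `mkdir(parents=True, exist_ok=True)` mirrors the documented behaviour that a missing path is created rather than treated as an error.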

docuteam-actions harvestOaiPmhRecord -c [/path/to/]config.json

Options:
      --version     Show version number                       [boolean]
      --debug       Set log level to debug                    [boolean]
  -c, --config      Configuration file path                   [string] [required]
      --help        Show help                                 [boolean]
  -e, --endpoint    OAI-PMH endpoint used to harvest          [string] [required]
  -i, --identifier  OAI-PMH identifier                        [string] [required]
  -m, --prefix      OAI-PMH metadata format                   [string] [required]
  -p, --path        Path where to store the oai.xml response  [string] [required]

Register URN with the German National Library

This operation registers a URN and its associated URLs with the German National Library. Both the URN (accessor URN) and the URLs (accessor registrationURL) must already be present in the mets.xml file, stored in the root node of the SIP. The URN may be present only once in the root node, whereas several URLs may be registered; at least one URL is required.

The operation first checks whether the URN is already registered. If it is, the operation tries to add the URLs to the registered URN with URL priority 2. If it is not, the operation registers both the URN and the associated URLs: the first URL becomes the primary URL (priority 1) and all other URLs receive priority 2.

The namespace for which the URN is registered must be passed as a parameter when calling the operation. The credentials for the different namespaces, together with the base URL of the API offered by the German National Library, must be stored in the config.json file.
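The priority rules described above can be summarised in a small sketch. The function name and the list-of-tuples return format are hypothetical; the sketch only models how priorities are assigned and does not call the API of the German National Library.

```python
def assign_url_priorities(urls, urn_already_registered):
    """Pair each URL with the priority it would be registered with.

    - URN already registered: every URL is added with priority 2.
    - URN not yet registered: the first URL gets priority 1, the rest 2.
    """
    if not urls:
        # At least one URL must be present for the registration.
        raise ValueError("at least one URL is required")
    if urn_already_registered:
        return [(url, 2) for url in urls]
    return [(url, 1 if index == 0 else 2) for index, url in enumerate(urls)]
```

For a new URN with two URLs, the first URL is paired with priority 1 and the second with priority 2; for an existing URN, both are paired with priority 2.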

docuteam-actions registerDnbUrnForRootNode -c [/path/to/]config.json

Options:
      --version    Show version number      [boolean]
      --debug      Set log level to debug   [boolean]
  -c, --config     Configuration file path  [string] [required]
      --help       Show help                [boolean]
  -n, --namespace  URN namespace            [string] [required]
  -d, --data       Data file path           [string] [required]

Transform CMI eCH-0160 package

This operation modifies a zipped eCH-0160 SIP package exported from CMI. In CMI it is possible to create nested folders inside dossiers and store documents in these folders. This dossier substructure is lost when the eCH-0160 package is converted into a Matterhorn METS SIP. The action therefore modifies the metadata.xml file of the eCH-0160 package so that nested dossiers are created inside the existing dossiers, based on the folder structure. These nested dossiers are kept when converting to Matterhorn METS, and the documents are assigned to the newly created dossiers. With the optional parameter renameExistingDossiersAndDocuments (defaults to false), the metadata.xml can also be modified so that the titles of dossiers and documents, instead of the CMI IDs, are used to name the Matterhorn METS nodes, resulting in a more human-readable package.
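The idea of deriving nested dossiers from a folder structure can be illustrated with a short sketch. It works on plain document paths relative to a dossier and returns a nested dictionary; the input format, the dictionary keys and the function name are assumptions chosen for illustration, and the real action operates on the metadata.xml of the eCH-0160 package instead.

```python
from pathlib import PurePosixPath


def nest_documents(doc_paths):
    """Group documents into nested dossiers that mirror their folders.

    Each folder in a document path becomes a nested dossier; the document
    is assigned to the innermost dossier on its path.
    """
    root = {"dossiers": {}, "documents": []}
    for raw_path in doc_paths:
        path = PurePosixPath(raw_path)
        node = root
        for folder in path.parts[:-1]:
            # Create the nested dossier on first use, then descend into it.
            node = node["dossiers"].setdefault(
                folder, {"dossiers": {}, "documents": []}
            )
        node["documents"].append(path.name)
    return root
```

A document at `contracts/2020/offer.pdf` (a made-up path) would end up in a dossier "2020" nested inside a dossier "contracts", while a document without any folder stays directly in the existing dossier.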

docuteam-actions transformCMIECH0160Package -c [/path/to/]config.json

Options:
      --version                             Show version number                                      [boolean]
      --debug                               Set log level to debug                                   [boolean]
  -c, --config                              Configuration file path                                  [string] [required]
      --help                                Show help                                                [boolean]
  -d, --data                                Data file path, pointing to a zipped eCH-0160 package    [string] [required]
      --renameExistingDossiersAndDocuments  Rename existing dossiers and documents (default: false)  [boolean]