Docuteam Dublin Core CSV 1.0
Docuteam Dublin Core CSV 1.0 is a package format that can be processed by docuteam feeder.
Definition
- The package is a .zip file containing a CSV file next to any number of other files and folders.
- docuteam feeder will create a Matterhorn METS SIP according to the metadata and the paths to the binaries found in the CSV file.
CSV file
- text encoding: UTF-8
- delimiter: ;
- quote character: "
- the CSV file includes the following columns:
Label | required | repeatable | Description |
---|---|---|---|
ID | yes | no | The CSV file must contain an ID and a ParentID column that reflect the structure of the SIP. The IDs can be assigned arbitrarily and are only used to map the hierarchy. |
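ParentID | yes | no | ID of the parent entry; see ID. In the examples the top-level entry has an empty ParentID. |
File | yes | no | Path to a file in the package, either absolute or relative to the CSV file. Relative and absolute paths must not be mixed within one CSV file. The column is required, but the examples leave the value empty for entries without a file (e.g. Fonds, Series). |
Checksum | no | no | SHA-512 checksum of the file |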
DescriptionLevel | yes | no | Level of Archival Description |
Title | yes | no | Dublin Core Title |
Identifier | no | yes | Dublin Core Identifier |
Creator | no | yes | Dublin Core Creator |
Subject | no | yes | Dublin Core Subject |
Description | no | yes | Dublin Core Description |
Publisher | no | yes | Dublin Core Publisher |
Contributor | no | yes | Dublin Core Contributor |
Date | no | yes | Dublin Core Date |
Type | no | yes | Dublin Core Type |
Format | no | yes | Dublin Core Format |
Source | no | yes | Dublin Core Source |
Language | no | yes | Dublin Core Language |
Relation | no | yes | Dublin Core Relation |
Coverage | no | yes | Dublin Core Coverage |
Rights | no | yes | Dublin Core Rights |
- Repeated fields are recorded in square brackets and separated by commas:
[Topic1,Topic2]
- If the field content itself contains square brackets or commas, each value must be enclosed in double quotes. Because the double quote is also the CSV quote character, the whole field is then quoted and the inner quotes are doubled:
"[""Topic with ,"",""Topic with []""]"
- If a Checksum column is present, docuteam feeder compares these checksums with the checksums generated for the Matterhorn METS SIP, thus ensuring the integrity of the files.
- The naming of the .zip file is irrelevant.
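The CSV conventions above can be tried out with Python's standard csv module. The following is a minimal sketch, not docuteam feeder's own parser: the names read_rows, split_repeated and SAMPLE are made up for illustration, and it assumes that padding spaces around values are insignificant and that a value wrapped in [ ] is always a repeated field.

```python
import csv
import io

# Illustrative data only; the column subset and values are assumptions.
SAMPLE = (
    "ID;ParentID;File;DescriptionLevel;Title;Subject\n"
    "1;;;Fonds;Transaction;[Topic1,Topic2]\n"
    '2;1;fileA.ext;File;fileA.ext;"[""Topic with ,"",""Topic with []""]"\n'
)


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text using ';' as delimiter and '"' as quote character."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    header = [name.strip() for name in next(reader)]
    return [
        dict(zip(header, (cell.strip() for cell in row)))
        for row in reader
        if any(cell.strip() for cell in row)  # skip blank lines
    ]


def split_repeated(cell: str) -> list[str]:
    """Split a '[a,b]' cell into its values; plain cells yield one value."""
    cell = cell.strip()
    if not cell:
        return []
    if not (cell.startswith("[") and cell.endswith("]")):
        return [cell]
    inner = cell[1:-1]
    if not inner:
        return []
    # The inner list is itself comma-separated with '"' as quote character.
    return next(csv.reader([inner], delimiter=",", quotechar='"'))


if __name__ == "__main__":
    for row in read_rows(SAMPLE):
        print(row["ID"], row["Title"], split_repeated(row["Subject"]))
```

Decoding happens in two passes: the outer pass removes the CSV quoting of the whole field, the inner pass splits the bracketed list and removes the quotes around individual values.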
Simple Example
ZIP-File
SomeName.zip
├── metadata.csv
├── fileA.ext
├── fileB.ext
metadata.csv
ID ;ParentID ;File ;Checksum ;DescriptionLevel ;Title ;Creator ;Subject ;Coverage ;Date ;Identifier ;Description ;Publisher ;Contributor ;Type ;Format ;Source ;Language ;Relation ;Rights
1 ; ; ; ;Fonds ;Transaction ;Department A ;[Topic1,Topic2] ;2020-2022 ; ; ; ; ; ; ; ; ; ; ;
2 ;1 ;fileA.ext ;6bf6b8... ;File ;fileA.ext ; ; ; ;2020-10-12 ; ; ; ; ; ; ; ; ; ;
3 ;1 ;fileB.ext ;987654... ;File ;fileB.ext ; ; ; ;2022-03-01 ; ; ; ; ; ; ; ; ; ;
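Before zipping a package such as this one, the column requirements from the Definition section can be checked with a few lines of Python. This is only a sketch of such a pre-check, not something docuteam feeder provides: the names REQUIRED, is_absolute and validate are hypothetical, rows are assumed to be dictionaries keyed by column label, and a path counts as absolute if either POSIX or Windows rules say so.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Columns marked "required" in the column table.
REQUIRED = ("ID", "ParentID", "File", "DescriptionLevel", "Title")


def is_absolute(path: str) -> bool:
    """Treat a path as absolute under either POSIX or Windows rules."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def validate(header: list[str], rows: list[dict[str, str]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = [
        f"missing required column: {name}" for name in REQUIRED if name not in header
    ]
    # Entries without a file (e.g. Fonds) have an empty File value and are ignored.
    files = [row["File"] for row in rows if row.get("File")]
    if {is_absolute(path) for path in files} == {True, False}:
        problems.append("File column mixes absolute and relative paths")
    return problems


if __name__ == "__main__":
    header = ["ID", "ParentID", "File", "DescriptionLevel", "Title"]
    rows = [{"File": ""}, {"File": "fileA.ext"}, {"File": "fileB.ext"}]
    print(validate(header, rows))
```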
resulting structure of Matterhorn METS SIP
Transaction
├── Transaction
| ├── fileA.ext
| ├── fileB.ext
├── mets.xml
- Transaction has level Fonds and includes metadata for Title, Creator, Subject and Coverage.
- fileA.ext and fileB.ext have level File and include metadata for Title and Date.
- Checksums in the Matterhorn METS were recalculated and compared to the values in the CSV file.
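The values for the Checksum column can be produced, or checked locally before submission, with Python's hashlib. This is a minimal sketch under the assumption that the column holds the lowercase or uppercase hexadecimal SHA-512 digest of the file content; sha512_of and checksum_matches are illustrative names and not part of docuteam feeder.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected: str) -> bool:
    """Compare a file's digest with the value taken from the CSV file."""
    # Assumption: hex digests are compared case-insensitively.
    return sha512_of(path) == expected.strip().lower()
```

Reading in chunks keeps memory use constant, which matters for large binaries in a package.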
Extended Example
ZIP-File
SomeName.zip
├── someOtherName.csv
├── fileA.ext
├── fileB.ext
├── FolderA
| ├── SubfolderA
| | ├── fileC.ext
| | ├── fileD.ext
| ├── fileE.ext
someOtherName.csv
ID ;ParentID ;File ;Checksum ;DescriptionLevel ;Title ;Creator ;Subject ;Coverage ;Language ;Type ;Identifier ;Description ;Publisher ;Contributor ;Date ;Format ;Source ;Relation ;Rights
1 ; ; ; ;Fonds ;SomeTheme ;Department A ; ;2020-2022 ;EN ; ; ; ; ; ; ; ; ; ;
2 ;1 ; ; ;Series ;Transaction1 ; ;[Topic1,Topic2] ;2020 ;EN ; ; ; ; ; ; ; ; ; ;
3 ;2 ;fileA.ext ;6bf6b8... ;Document ;fileA.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ;
4 ;2 ;fileB.ext ;987654... ;Document ;fileB.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ;
5 ;1 ; ; ;Series ;Transaction2 ; ;[Topic1,Topic3] ;2022 ;EN ; ; ; ; ; ; ; ; ; ;
6 ;5 ;FolderA\SubfolderA\fileC.ext ;77453b... ;Document ;fileC.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ;
7 ;5 ;FolderA\SubfolderA\fileD.ext ;836247... ;Document ;fileD.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ;
8 ;5 ;FolderA\fileE.ext ;9db428... ;Document ;fileE.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ;
resulting structure of Matterhorn METS SIP
SomeTheme
├── SomeTheme
| ├── Transaction1
| | ├── fileA.ext
| | ├── fileB.ext
| ├── Transaction2
| | ├── fileC.ext
| | ├── fileD.ext
| | ├── fileE.ext
├── mets.xml
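The resulting structure follows the ID and ParentID columns, not the folder layout inside the .zip file: fileC.ext, fileD.ext and fileE.ext all end up directly under Transaction2 although they sit in different folders. The sketch below rebuilds that hierarchy from the (ID, ParentID, Title) values of the extended example. It is an illustration only; ROWS and render are hypothetical names, and it assumes that an empty ParentID marks the top-level entry.

```python
from collections import defaultdict

# (ID, ParentID, Title) taken from the extended example.
ROWS = [
    ("1", "", "SomeTheme"),
    ("2", "1", "Transaction1"),
    ("3", "2", "fileA.ext"),
    ("4", "2", "fileB.ext"),
    ("5", "1", "Transaction2"),
    ("6", "5", "fileC.ext"),
    ("7", "5", "fileD.ext"),
    ("8", "5", "fileE.ext"),
]


def render(rows: list[tuple[str, str, str]]) -> list[str]:
    """Return the hierarchy as indented lines, two spaces per level."""
    titles = {row_id: title for row_id, _parent, title in rows}
    children = defaultdict(list)
    for row_id, parent_id, _title in rows:
        children[parent_id].append(row_id)

    lines: list[str] = []

    def walk(node_id: str, depth: int) -> None:
        lines.append("  " * depth + titles[node_id])
        for child_id in children.get(node_id, []):
            walk(child_id, depth + 1)

    # Assumption: entries with an empty ParentID are the roots.
    for root_id in children.get("", []):
        walk(root_id, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(ROWS)))
```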
Customization
A package submitted to docuteam feeder in the Docuteam Dublin Core CSV 1.0 format is converted to a Matterhorn METS SIP by the step Submission: create SIP from CSV. This step uses a mapping file that offers the following customization options:
- text encoding
- delimiter
- quote character
- custom CSV column headers and custom field mappings between the CSV columns and the Matterhorn METS metadata elements
- specification of whether or not a checksum check should be carried out
Adjustments to this mapping file thus allow the package format to be adapted to the specific use case.
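The layout of the actual mapping file is not described here. Purely to illustrate what the listed options change, the following sketch models them as a hypothetical Python structure: ReaderOptions, its field names and read_with_options are assumptions made for this example and do not reflect docuteam feeder's mapping file syntax.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ReaderOptions:
    """Hypothetical stand-in for the options a mapping file can adjust."""
    encoding: str = "utf-8"
    delimiter: str = ";"
    quotechar: str = '"'
    # Maps custom CSV headers to the standard column labels.
    header_map: dict[str, str] = field(default_factory=dict)
    # Not used by read_with_options; a checksum step would consult it.
    verify_checksums: bool = True


def read_with_options(data: bytes, options: ReaderOptions) -> list[dict[str, str]]:
    """Decode and parse CSV bytes according to the given options."""
    text = data.decode(options.encoding)
    reader = csv.reader(
        io.StringIO(text),
        delimiter=options.delimiter,
        quotechar=options.quotechar,
    )
    header = [
        options.header_map.get(name.strip(), name.strip()) for name in next(reader)
    ]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader]


if __name__ == "__main__":
    custom = ReaderOptions(
        encoding="latin-1",
        delimiter=",",
        header_map={"Nr": "ID", "Eltern": "ParentID", "Titel": "Title"},
    )
    data = "Nr,Eltern,Titel\n1,,Übersicht\n".encode("latin-1")
    print(read_with_options(data, custom))
```

With the defaults, the reader expects the format as defined above; changing the options corresponds to adapting the package format to a specific use case.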