Docuteam Dublin Core CSV 1.0

Docuteam Dublin Core CSV 1.0 is a package format that can be processed by docuteam feeder.

Definition

  • The package is a .zip file containing a CSV file next to any number of other files and folders.
  • docuteam feeder will create a Matterhorn METS SIP according to the metadata and the paths to the binaries found in the CSV file.
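As an illustration of the package layout described above, the following is a minimal sketch that zips a prepared folder (CSV file plus binaries) into a package. The function name `build_package` and its arguments are hypothetical and not part of docuteam feeder; the sketch only assumes that the CSV file sits at the top level of the source folder.

```python
import zipfile
from pathlib import Path


def build_package(source_dir: Path, zip_path: Path) -> list[str]:
    """Zip every file below source_dir, keeping paths relative to it.

    Hypothetical helper: zip_path should lie outside source_dir,
    otherwise the archive would try to include itself.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store the entry relative to the package root.
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

Calling `build_package(Path("package"), Path("SomeName.zip"))` would produce a .zip file whose entries mirror the folder contents; the name of the .zip file itself can be chosen freely.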

CSV file

  • text encoding: UTF-8
  • delimiter: ;
  • quote character: "
  • the CSV file includes the following columns:
Label | required | repeatable | Description
ID | yes | no | The CSV file must contain an ID and a ParentID column that reflect the structure of the SIP. The IDs can be assigned arbitrarily and are only used to map the hierarchy.
ParentID | yes | no | see above
File | yes | no | Path to the file in the package. Paths can be specified either as absolute paths or relative to the CSV file. Mixing relative and absolute paths is not allowed.
Checksum | no | no | Algorithm: SHA-512
DescriptionLevel | yes | no | Level of Archival Description
Title | yes | no | Dublin Core Title
Identifier | no | yes | Dublin Core Identifier
Creator | no | yes | Dublin Core Creator
Subject | no | yes | Dublin Core Subject
Description | no | yes | Dublin Core Description
Publisher | no | yes | Dublin Core Publisher
Contributor | no | yes | Dublin Core Contributor
Date | no | yes | Dublin Core Date
Type | no | yes | Dublin Core Type
Format | no | yes | Dublin Core Format
Source | no | yes | Dublin Core Source
Language | no | yes | Dublin Core Language
Relation | no | yes | Dublin Core Relation
Coverage | no | yes | Dublin Core Coverage
Rights | no | yes | Dublin Core Rights
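The column rules can be checked before a package is submitted. The following is a minimal sketch, not docuteam feeder's own validation: it interprets "required" as "the column must be present" (an assumption, since the examples leave File and ParentID empty on some rows) and flags a File column that mixes absolute and relative paths. The names `REQUIRED_COLUMNS`, `is_absolute` and `validate_rows` are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Columns marked "required: yes" in the column table.
REQUIRED_COLUMNS = ("ID", "ParentID", "File", "DescriptionLevel", "Title")


def is_absolute(path: str) -> bool:
    """Treat a path as absolute if it is absolute in POSIX or Windows notation."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def validate_rows(header: list[str], rows: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    errors = []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        errors.append("missing required columns: " + ", ".join(missing))
    # Only rows that actually reference a file take part in the path check.
    paths = [(row.get("File") or "").strip() for row in rows]
    kinds = {is_absolute(path) for path in paths if path}
    if len(kinds) > 1:
        errors.append("File column mixes absolute and relative paths")
    return errors
```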

  • Multiple values of a repeatable field are enclosed in square brackets and separated by commas, e.g. [Topic1,Topic2].
  • If a value itself contains square brackets or commas, the values must be quoted and the whole field encoded as follows: "[""Topic with ,"",""Topic with []""]".
  • If a Checksum column is present, docuteam feeder compares these checksums with the checksums generated for the Matterhorn METS SIP, thus ensuring the integrity of the files.
  • The naming of the .zip file is irrelevant.
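The bracket notation for repeatable fields can be decoded in two steps: the CSV reader first removes the outer quoting, then the text between the brackets is read as a comma-separated list with " as quote character. The sketch below follows that reading of the rule; it is an assumption about the encoding, not docuteam feeder's actual parser, and the name `parse_repeatable` is hypothetical.

```python
import csv


def parse_repeatable(value: str) -> list[str]:
    """Split a cell such as '[Topic1,Topic2]' into its individual values.

    The cell is expected to be already unquoted by the CSV reader.
    """
    text = value.strip()
    if not text:
        return []
    if not (text.startswith("[") and text.endswith("]")):
        # A cell without brackets holds a single value.
        return [text]
    inner = text[1:-1]
    if not inner:
        return []
    # Quoted values may contain commas and square brackets.
    return next(csv.reader([inner], delimiter=",", quotechar='"'))


# The outer CSV layer turns "[""Topic with ,"",""Topic with []""]"
# into ["Topic with ,","Topic with []"] before parse_repeatable sees it.
line = '1;"[""Topic with ,"",""Topic with []""]"'
cell = next(csv.reader([line], delimiter=";", quotechar='"'))[1]
topics = parse_repeatable(cell)
```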

Simple Example

ZIP-File

SomeName.zip
├── metadata.csv
├── fileA.ext
├── fileB.ext

metadata.csv

ID ;ParentID  ;File       ;Checksum   ;DescriptionLevel  ;Title        ;Creator       ;Subject         ;Coverage  ;Date       ;Identifier ;Description ;Publisher ;Contributor ;Type ;Format ;Source ;Language ;Relation ;Rights
1 ; ; ; ;Fonds ;Transaction ;Department A ;[Topic1,Topic2] ;2020-2022 ; ; ; ; ; ; ; ; ; ; ;
2 ;1 ;fileA.ext ;6bf6b8... ;File ;fileA.ext ; ; ; ;2020-10-12 ; ; ; ; ; ; ; ; ; ;
3 ;1 ;fileB.ext ;987654... ;File ;fileB.ext ; ; ; ;2022-03-01 ; ; ; ; ; ; ; ; ; ;
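A file like the one above can be read with any CSV library configured for the ; delimiter and the " quote character. The following is a minimal sketch using Python's `csv` module. Stripping the padding spaces around the values is an assumption made here so that the aligned example parses cleanly; whether docuteam feeder trims values is not stated. The name `read_metadata` is hypothetical.

```python
import csv
import io


def read_metadata(text: str) -> list[dict[str, str]]:
    """Parse the CSV text into one dict per row, keyed by column label."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    header = [name.strip() for name in next(reader)]
    rows = []
    for record in reader:
        cells = [cell.strip() for cell in record]
        if not any(cells):
            continue  # skip blank lines
        # Note: if a column label occurs twice, the later column wins.
        rows.append(dict(zip(header, cells)))
    return rows


sample = (
    "ID ;ParentID ;File ;Checksum ;DescriptionLevel ;Title ;Creator ;Subject\n"
    "1 ; ; ; ;Fonds ;Transaction ;Department A ;[Topic1,Topic2]\n"
    "2 ;1 ;fileA.ext ;6bf6b8... ;File ;fileA.ext ; ;\n"
)
rows = read_metadata(sample)
```

For a file on disk, the text would be read as UTF-8 first, for example with `Path("metadata.csv").read_text(encoding="utf-8")`.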

resulting structure of Matterhorn METS SIP

Transaction
├── Transaction
| ├── fileA.ext
| ├── fileB.ext
├── mets.xml
  • Transaction has level Fonds and includes metadata for Title, Creator, Subject and Coverage.
  • fileA.ext and fileB.ext have level File and include metadata for Title and Date.
  • The checksums were recalculated for the Matterhorn METS SIP and compared to the values in the CSV file.
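The values for the Checksum column can be produced with any SHA-512 implementation. The sketch below uses Python's `hashlib` and assumes that the checksum is recorded as a hexadecimal string, as the abbreviated values in the example suggest. The helper names `sha512_of` and `checksum_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-512 digest of a file as a lowercase hex string."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read in chunks so large binaries do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected: str) -> bool:
    """Compare a file's digest with the value from the CSV (assumed hex)."""
    return sha512_of(path) == expected.strip().lower()
```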

Extended Example

ZIP-File

SomeName.zip
├── someOtherName.csv
├── fileA.ext
├── fileB.ext
├── FolderA
| ├── SubfolderA
| | ├── fileC.ext
| | ├── fileD.ext
| ├── fileE.ext

someOtherName.csv

ID ;ParentID  ;File                          ;Checksum   ;DescriptionLevel    ;Title        ;Creator       ;Subject         ;Coverage  ;Language ;Type ;Identifier ;Description ;Publisher ;Contributor ;Date ;Format ;Source ;Relation ;Coverage ;Rights
1 ; ; ; ;Fonds ;SomeTheme ;Department A ; ;2020-2022 ;EN ; ; ; ; ; ; ; ; ; ; ;
2 ;1 ; ; ;Series ;Transaction1 ; ;[Topic1,Topic2] ;2020 ;EN ; ; ; ; ; ; ; ; ; ; ;
3 ;2 ;fileA.ext ;6bf6b8... ;Document ;fileA.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ; ;
4 ;2 ;fileB.ext ;987654... ;Document ;fileB.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ; ;
5 ;1 ; ; ;Series ;Transaction2 ; ;[Topic1,Topic3] ;2022 ;EN ; ; ; ; ; ; ; ; ; ; ;
6 ;5 ;FolderA\SubfolderA\fileC.ext ;77453b... ;Document ;fileC.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ; ;
7 ;5 ;FolderA\SubfolderA\fileD.ext ;836247... ;Document ;fileD.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ; ;
8 ;5 ;FolderA\fileE.ext ;9db428... ;Document ;fileE.ext ; ; ; ; ;Text ; ; ; ; ; ; ; ; ; ;

resulting structure of Matterhorn METS SIP

SomeTheme
├── SomeTheme
| ├── Transaction1
| | ├── fileA.ext
| | ├── fileB.ext
| ├── Transaction2
| | ├── fileC.ext
| | ├── fileD.ext
| | ├── fileE.ext
├── mets.xml
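How the ID and ParentID columns map to the resulting hierarchy can be sketched as follows. The code groups the rows by ParentID and walks the tree starting at the row with an empty ParentID, as in the examples. It only illustrates the mapping and is not docuteam feeder's implementation; `build_tree` and `render` are hypothetical names.

```python
from collections import defaultdict


def build_tree(rows: list[dict[str, str]]):
    """Index titles by ID and collect child IDs per ParentID."""
    titles = {}
    children = defaultdict(list)
    for row in rows:
        titles[row["ID"]] = row["Title"]
        children[row["ParentID"]].append(row["ID"])
    return titles, children


def render(titles, children, parent: str = "", depth: int = 0) -> list[str]:
    """Return the hierarchy as indented lines; the root has an empty ParentID."""
    lines = []
    for node in children.get(parent, []):
        lines.append("  " * depth + titles[node])
        lines.extend(render(titles, children, node, depth + 1))
    return lines


rows = [
    {"ID": "1", "ParentID": "", "Title": "SomeTheme"},
    {"ID": "2", "ParentID": "1", "Title": "Transaction1"},
    {"ID": "3", "ParentID": "2", "Title": "fileA.ext"},
    {"ID": "4", "ParentID": "2", "Title": "fileB.ext"},
    {"ID": "5", "ParentID": "1", "Title": "Transaction2"},
    {"ID": "6", "ParentID": "5", "Title": "fileC.ext"},
    {"ID": "7", "ParentID": "5", "Title": "fileD.ext"},
    {"ID": "8", "ParentID": "5", "Title": "fileE.ext"},
]
titles, children = build_tree(rows)
outline = render(titles, children)
```

Note that the folder layout inside the .zip file (FolderA, SubfolderA) plays no role here: only the ID and ParentID values decide where an item ends up.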

Customization

A package submitted to docuteam feeder in the Docuteam Dublin Core CSV 1.0 format is converted to a Matterhorn METS SIP by the step "Submission: create SIP from CSV". This step uses a mapping file that offers many customization options:

  • text encoding
  • delimiter
  • quote character
  • different CSV column headers and different field mappings between the CSV columns and the Matterhorn METS metadata elements
  • specification of whether or not a checksum check should be carried out

Adjusting this mapping file thus allows the package format to be adapted to the specific use case.