Metadata and documentation

Research data has to be documented to be of any value. Findable data has at least some of the documentation in a structured form. This basic information is called descriptive metadata and is usually stored in a data catalog.

Metadata is information about the data: for example, where, when, why, and how the data were collected, processed, and interpreted. Metadata may also contain details about experiments, analytical methods, and research context.

Metadata elements can be descriptive, technical, or rights-related. Descriptive metadata (e.g. keywords) enables indexing, discovery and retrieval of datasets. Technical metadata describes how datasets were produced and structured, and how they should be used (e.g. file naming). Rights metadata defines who owns the data, who can access it, and who has the right to manage it.

Consider how the data will be organized during the project. Describe, for example, your file naming conventions, version control, and folder structure.
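A file naming convention is easier to follow when it can be generated and checked automatically. The sketch below assumes a purely hypothetical convention, <project>_<description>_<YYYYMMDD>_v<NN>.<ext>; the pattern and the function names build_name and is_valid_name are illustrative and not part of any standard. Adapt them to whatever convention your project agrees on.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not a standard):
#   <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, description, day, version, ext):
    """Build a file name that follows the hypothetical convention above."""
    # Lower-case the description and collapse anything that is not a
    # letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.lower()}_{slug}_{day:%Y%m%d}_v{version:02d}.{ext.lower()}"


def is_valid_name(filename):
    """Return True if the file name matches the convention."""
    return NAME_PATTERN.match(filename) is not None
```

For example, build_name("Soil", "pH readings, site A", date(2024, 3, 5), 2, "CSV") gives soil_ph-readings-site-a_20240305_v02.csv, which is_valid_name accepts, whereas an ad hoc name such as "final data (2).csv" is rejected.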

Repositories often require the use of a specific metadata standard. Check whether a discipline-, community-, or repository-specific metadata schema or standard (i.e., a preferred set of metadata elements) exists that you can adopt. Discipline-specific standards are listed on the Digital Curation Centre website (http://www.dcc.ac.uk/resources/metadata-standards).

The metadata of a dataset usually contains:

  • a persistent identifier to be used when citing the dataset or reporting re-use
  • general information about the dataset (title, field of science, keywords, content coverage, variables)
  • information about agents (creators, contributors, publisher, distributor)
  • information about access (download link or access information, rights statements and licenses)
  • information about lifecycle events and related entities
  • technical information such as checksum, size, file format, and media type
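
The elements listed above can be collected into a simple machine-readable record. The following is a minimal sketch using only the Python standard library; the field names and the functions technical_metadata, build_record and to_json are assumptions made for illustration and do not follow any particular metadata standard. A repository will normally prescribe its own schema.

```python
import hashlib
import json
import mimetypes
from pathlib import Path


def technical_metadata(path):
    """Collect checksum, size, file format and media type for one file."""
    path = Path(path)
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not fill memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # The media type is guessed from the file extension only.
    media_type, _ = mimetypes.guess_type(path.name)
    return {
        "checksum": {"algorithm": "SHA-256", "value": digest.hexdigest()},
        "size_bytes": path.stat().st_size,
        "file_format": path.suffix.lstrip(".").lower(),
        "media_type": media_type or "application/octet-stream",
    }


def build_record(identifier, title, keywords, creators, license_name, data_file):
    """Assemble an illustrative metadata record (field names are assumptions)."""
    return {
        "identifier": identifier,      # persistent identifier, e.g. a DOI
        "title": title,
        "keywords": list(keywords),
        "creators": list(creators),
        "license": license_name,
        "files": [{"name": Path(data_file).name, **technical_metadata(data_file)}],
    }


def to_json(record):
    """Serialise the record so it can be stored next to the data."""
    return json.dumps(record, indent=2, ensure_ascii=False)
```

A record built this way could be saved as a separate file alongside the data, for instance with Path("metadata.json").write_text(to_json(record), encoding="utf-8"). The identifier passed in should be the one actually assigned by your repository; any value used while testing is only a placeholder.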

Creating relevant metadata is important for enabling re-use and ensuring future credit. Additional metadata and documentation can be included either as separate files within the dataset or inside the data files themselves.

Other important documentation may include separate or embedded metadata files, configuration files, license deeds, code books, and any other data or information needed to replicate and re-use the data.