Data Documentation

Data documentation means describing the data or, more precisely, creating metadata. Metadata is the contextual information about the data and its provenance that is necessary for interpreting it. Providing comprehensive metadata according to your discipline's conventions makes your data understandable, discoverable, and reusable.

Descriptive metadata, administrative metadata and structural metadata

Metadata is information regarding the data, for example, where, when, why, and how the data were collected, processed and interpreted. Metadata may also contain details about experiments, analytical methods, and research context.

Metadata is a broad term that covers several kinds of information about a dataset:

  1. Descriptive metadata enables discovery, reuse and correct interpretation.
  2. Administrative metadata defines who owns the data, who can access it, who has the right to manage it and how it can be used.
  3. Structural metadata describes how datasets were produced and structured and, for instance, which software is needed.

Figure: Types of metadata.

  • Descriptive: who created the data, what it contains, how to interpret it, its persistent identifier, when it was published.
  • Administrative: who owns the data, restrictions and embargo times, who gives access to it, how the data can be used.
  • Structural: how the dataset is organized, relations to other datasets (e.g. version).

License: CC BY 4.0


Descriptive metadata

Descriptive metadata of a dataset can be divided into two subcategories: core metadata (for discovery and identification, that is, for search and citation) and detailed descriptive metadata (variables, configurations, etc., for enabling assessment and reuse).

Core metadata includes

  • a persistent identifier to be used when citing the dataset or reporting re-use
  • general information about the dataset (title, field of science, keywords, content coverage, variables)
  • information about agents (creators, contributors, publisher, distributor)
  • information about access (download link or access information, rights statements and licenses)
  • information about lifecycle events and related entities (provenance)
  • technical information like checksum, size, file format, media type
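
The technical items in the last bullet can usually be derived from the files themselves. The following is a minimal Python sketch of that idea, not the method used by any particular catalogue: the function name `technical_metadata` and the keys of the returned dictionary are hypothetical, and SHA-256 is assumed as the checksum algorithm.

```python
import hashlib
import mimetypes
from pathlib import Path


def technical_metadata(path: Path) -> dict:
    """Collect checksum, size, file format and media type for one file.

    The key names are illustrative only; a real catalogue defines its own.
    """
    data = path.read_bytes()  # fine for small files; stream large ones instead
    media_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        "size_bytes": len(data),
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
        "file_format": path.suffix.lstrip(".").lower(),
        # Fall back to a generic media type when the extension is unknown.
        "media_type": media_type or "application/octet-stream",
    }
```

Calling `technical_metadata(Path("notes.txt"))` on an existing file returns a dictionary that can be stored next to the other core metadata fields, such as title, creators and license.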

You can use Qvain - Research Dataset Description Tool to create core metadata for your dataset. The resulting metadata will be published in Etsin - Research Data Finder.

Detailed descriptive metadata

It is important that you create relevant metadata for reuse and future credit. If you have additional metadata that does not fit in the data catalogue, you can add it, along with documentation such as code books or configuration files, to the dataset as separate files. Metadata can also be embedded in the data files themselves. Remember that this can make the data harder to find. If you add extra metadata:

  1. Use metadata standards if possible: Repositories often require the use of a specific metadata standard, that is, a structured format that uses specific vocabularies or ontologies to describe the data. Check whether a discipline-, community- or repository-based metadata schema or standard (i.e., a preferred set of metadata elements) exists that can be adopted. Discipline-specific standards can be found on the Digital Curation Centre website.
    • Some research instruments create metadata in standardised formats automatically. Choose a standard that is compatible with other software, if possible.
  2. Use separate metadata files or metadata embedded in the data files, and include configuration files, license deeds, code books and any other data or information that is important for replication and reuse of the data, for example:
    • Readme file(s) that provide information about the data files to ensure correct interpretation.
    • A data dictionary or code book that explains the variables in the data and lists the codes used in the dataset.
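
As an illustration of the data dictionary mentioned above, the skeleton of one can be generated from a tabular data file and then completed by hand. This is a minimal Python sketch under stated assumptions: the data are in CSV format with a header row, and the function name `build_data_dictionary` and the entry keys are hypothetical rather than part of any standard.

```python
import csv
import io


def build_data_dictionary(csv_text: str, descriptions: dict) -> list:
    """Return one entry per column of a CSV file with a header row.

    `descriptions` maps variable names to human-written explanations;
    variables without an explanation are flagged for follow-up.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Treat empty cells and short rows as missing values.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        entries.append({
            "variable": name,
            "description": descriptions.get(name, "TODO: describe"),
            "n_missing": len(rows) - len(values),
            "example": values[0] if values else "",
        })
    return entries


# Hypothetical example data: two observations, one missing temperature.
sample = "id,temp_c\n1,20.5\n2,\n"
for entry in build_data_dictionary(sample, {"temp_c": "Air temperature in degrees Celsius"}):
    print(entry)
```

The generated entries only cover what can be read from the file; units, code lists and collection methods still have to be written by someone who knows the data.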

Also think about your file naming conventions, directory structure and version control. Read more on the Files and file formats page.

Administrative metadata

Administrative metadata includes information about the rights to the dataset: the license, the type of restriction and the reason for it (ethical, legal, etc.), the embargo time, the owner of the rights, a contact for reuse, and how to apply for a use permit and access.

Other categories of administrative metadata include technical metadata (file types and other information needed for rendering files) and preservation metadata.
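
The rights information listed in this section can be kept as a small structured record. The following is a minimal Python sketch, assuming a hypothetical `RightsMetadata` class with illustrative field names and access types ("open", "embargo", "restricted"); it is not the schema of any particular repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RightsMetadata:
    """Illustrative rights record; field names are assumptions."""
    license_id: str                            # e.g. "CC-BY-4.0"
    rights_holder: str
    access_type: str                           # "open", "embargo" or "restricted"
    restriction_reason: Optional[str] = None   # e.g. "ethical", "legal"
    embargo_until: Optional[date] = None
    contact: Optional[str] = None              # whom to ask about reuse or permits

    def is_open_on(self, day: date) -> bool:
        """Return True if the data can be accessed without a permit on `day`."""
        if self.access_type == "open":
            return True
        if self.access_type == "embargo" and self.embargo_until is not None:
            # The embargo ends on the stated date itself.
            return day >= self.embargo_until
        return False
```

For example, a record with `access_type="embargo"` and `embargo_until=date(2030, 1, 1)` reports the data as closed on 31 December 2029 and open from 1 January 2030 onwards.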

Structural metadata

Structural metadata describes how the dataset is organised internally and how it relates to other datasets (managing versions etc.).
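
One simple way to capture both aspects is a manifest that lists the files in the dataset directory and records its relation to an earlier version. The following is a minimal Python sketch; the function name `structural_metadata`, the key names and the relation type `is_new_version_of` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def structural_metadata(root: Path, version: str,
                        previous_version_id: Optional[str] = None) -> dict:
    """Describe the internal layout of a dataset directory and its version link."""
    # Relative POSIX-style paths keep the manifest portable across systems.
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    relations = []
    if previous_version_id is not None:
        # Hypothetical relation type linking this dataset to its predecessor.
        relations.append({
            "type": "is_new_version_of",
            "identifier": previous_version_id,
        })
    return {"version": version, "files": files, "relations": relations}
```

The returned dictionary can be saved alongside the data, for example as a JSON file, so that reusers can see the directory structure and version history without opening every file.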