Data Documentation - Services for Research
Data documentation means describing the data, more precisely creating metadata. Metadata is the contextual information about the data and its provenance, necessary for interpreting it. Providing comprehensive metadata according to your discipline's conventions makes your data understandable, discoverable, and reusable.
Descriptive metadata, administrative metadata and structural metadata
Metadata is information regarding the data, for example, where, when, why, and how the data were collected, processed and interpreted. Metadata may also contain details about experiments, analytical methods, and research context.
Metadata is a broad term that covers several kinds of information about a dataset:
- Descriptive metadata enables discovery, reuse and correct interpretation.
- Administrative metadata defines who owns the data, who can access and manage it, and how it can be used.
- Structural metadata describes how datasets were produced and structured and, for instance, what software is needed to use them.
License: CC BY 4.0
Descriptive metadata of a dataset can be divided into two subcategories: core metadata (for discovery and identification, that is, for search and citation) and detailed descriptive metadata (variables, configurations, etc., for enabling assessment and reuse).
Core metadata includes
- a persistent identifier to be used when citing the dataset or reporting reuse
- general information about the dataset (title, field of science, keywords, content coverage, variables)
- information about agents (creators, contributors, publisher, distributor)
- information about access (download link or access information, rights statements and licenses)
- information about lifecycle events and related entities (provenance)
- technical information like checksum, size, file format, media type
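The technical items in the list above (checksum, size, file format, media type) can usually be generated rather than typed by hand. The following is a minimal sketch in Python using only the standard library. The function name `technical_metadata` and the keys of the returned dictionary are illustrative assumptions, not fields required by any particular data catalogue, and SHA-256 is just one common checksum choice.

```python
import hashlib
import mimetypes
from pathlib import Path


def technical_metadata(path):
    """Return checksum, size, file format and media type for one file.

    The key names are hypothetical; map them to the fields your
    repository or data catalogue actually uses.
    """
    file_path = Path(path)
    digest = hashlib.sha256()
    with file_path.open("rb") as handle:
        # Read in chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # guess_type works from the file name only and may return None.
    media_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "filename": file_path.name,
        "size_bytes": file_path.stat().st_size,
        "checksum_sha256": digest.hexdigest(),
        "file_format": file_path.suffix.lstrip(".").lower(),
        "media_type": media_type or "application/octet-stream",
    }
```

Calling `technical_metadata("results.csv")` on an existing file returns a dictionary that can be copied into the catalogue record or saved next to the data. The media type is guessed from the file extension, so it is worth checking for unusual formats.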
Detailed descriptive metadata
It is important that you create relevant metadata for reuse and future credit. If you have additional metadata that does not fit in the data catalogue, you can add it to the dataset as separate files, such as code books or configuration files. Metadata can also be embedded in the data files themselves. Remember that this can make the data harder to find. If you add extra metadata:
- Use metadata standards if possible. Repositories often require a specific metadata standard, that is, a structured format that uses specific vocabularies or ontologies to describe the data. Check whether a discipline-, community- or repository-based metadata schema or standard (i.e., a preferred set of metadata elements) exists that you can adopt. Discipline-specific standards can be found on the Digital Curation Centre website.
- Some research instruments create metadata in standardised formats automatically. Choose a standard that is compatible with other software, if possible.
- Use separate metadata files or metadata included in the data files, configuration files, license deeds, code books and other data or information that is important for replication and reuse of the data.
- Provide readme file(s) with information about the data files to ensure correct interpretation.
- Provide a data dictionary or code book that explains the variables in the data and lists the codes used in the dataset.
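A data dictionary or code book can be as simple as a small table saved next to the data. The sketch below, in Python with the standard library only, writes such a table as CSV and checks that every column in a data file is documented. The variable names, the column layout and the code -9 for missing values are hypothetical examples, not a required standard.

```python
import csv
import io

# Hypothetical variables for illustration; replace them with your own.
VARIABLES = [
    {"variable": "participant_id", "description": "Pseudonymous participant identifier",
     "type": "string", "unit": "", "codes": ""},
    {"variable": "age", "description": "Age at time of interview",
     "type": "integer", "unit": "years", "codes": "-9 = missing"},
    {"variable": "smoker", "description": "Current smoking status",
     "type": "integer", "unit": "", "codes": "0 = no; 1 = yes; -9 = missing"},
]

FIELDNAMES = ["variable", "description", "type", "unit", "codes"]


def write_codebook(variables, handle):
    """Write one row per variable to an open text file or buffer."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(variables)


def undocumented_columns(data_columns, variables):
    """Return the data columns that have no entry in the code book."""
    documented = {entry["variable"] for entry in variables}
    return [name for name in data_columns if name not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(VARIABLES, buffer)
    print(buffer.getvalue())
```

To save the code book to disk, open the target file with `newline=""` as the `csv` module documentation recommends, for example `open("codebook.csv", "w", newline="", encoding="utf-8")`, and pass the handle to `write_codebook`.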
Also think about your file naming conventions, directory structure and version control. Read more on the Files and file formats page.
Administrative metadata includes information about the rights to the dataset: the license, the type of restriction and the reason for it (ethical, legal, etc.), the embargo period, the owner of the rights, a contact for reuse, and how to apply for a use permit and access.
Other categories of administrative metadata include technical metadata (file types and other information needed for rendering files) and preservation metadata.
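The rights information described above can be kept as a small structured record so that gaps are easy to spot. The following Python sketch is one possible layout under stated assumptions: the field names, the example values and the two helper functions are hypothetical and do not follow any specific repository schema.

```python
from datetime import date

# Hypothetical rights record; every value here is an illustrative example.
RIGHTS = {
    "license": "CC BY 4.0",
    "restriction_type": "embargo",
    "restriction_reason": "legal",
    "embargo_until": date(2030, 1, 1),
    "rights_owner": "Example University",
    "reuse_contact": "data-contact@example.org",
    "access_procedure": "Apply for a use permit from the reuse contact.",
}

# Fields assumed to be mandatory in this sketch.
REQUIRED_FIELDS = ("license", "rights_owner", "reuse_contact", "access_procedure")


def missing_fields(record):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def is_under_embargo(record, today):
    """Return True while the embargo end date has not yet been reached."""
    end = record.get("embargo_until")
    return end is not None and today < end
```

For example, `missing_fields(RIGHTS)` returns an empty list for the record above, and `is_under_embargo(RIGHTS, date.today())` tells whether the dataset can already be opened.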
Structural metadata describes how the dataset is organised internally and how it relates to other datasets (for example, how versions are managed).