Identifiers and dataset versioning - Services for Research
Persistent identifiers are used when citing and managing research data and information. Persistence in a digital context always means good lifecycle management. Identification implies that the object or content the identifier represents doesn't change over time. A persistent identifier is globally unique and documented. It should also be machine actionable, so that it resolves, for instance, to a web page that (re)presents the content.
The FAIR data principles require that sustainable and trustworthy research data management and citation be enabled by persistent identifiers. The Finnish research information ecosystem relies on linked data, which also requires persistent identifiers. Linked data often relies on so-called cool URIs, which calls for additional attention when choosing domain names, so that they stay persistent. Neither link rot nor content drift should be allowed. A persistent identifier should never be reused.
For research to be replicable, it is essential that references and links are accurate. If a new dataset version is created, it should have a new unique identifier. Citations should be clear, and references between dataset versions should be unambiguous and machine readable.
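The versioning rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `VersionRegistry`, `VersionRecord` and the `example-pid` prefix are hypothetical names, and the identifiers are random UUIDs rather than real DOIs or URNs. The relation names `IsNewVersionOf` and `IsPreviousVersionOf` are borrowed from the relation types of the DataCite metadata schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class VersionRecord:
    identifier: str          # unique and never reused
    version: int
    # Machine-readable links to other versions: (relation_type, identifier)
    relations: list = field(default_factory=list)


class VersionRegistry:
    """Hypothetical in-memory registry; a real service would mint DOIs or URNs."""

    def __init__(self, prefix="example-pid"):
        self.prefix = prefix
        self.records = {}

    def _mint(self):
        # Opaque identifier with no organizational semantics in it.
        identifier = f"{self.prefix}:{uuid.uuid4()}"
        if identifier in self.records:
            raise ValueError("identifier already in use")
        return identifier

    def publish(self, previous=None):
        """Register a new version; every version gets its own identifier."""
        identifier = self._mint()
        if previous is None:
            record = VersionRecord(identifier, version=1)
        else:
            old = self.records[previous]
            record = VersionRecord(identifier, version=old.version + 1)
            # Link the two versions in both directions.
            record.relations.append(("IsNewVersionOf", old.identifier))
            old.relations.append(("IsPreviousVersionOf", record.identifier))
        self.records[identifier] = record
        return record
```

The old identifier keeps pointing at the old version, so existing citations remain accurate, while the relation entries let a machine navigate from one version to the next.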
- More information about persistent identifiers for researchers
- More information about persistent identifiers for organizations
- More reading about persistent identifiers
Persistent identifiers offer managed ways to link and tag digital information. When you use identifiers such as DOI or ORCID to publish or cite data, the links stay intact even if names or organisations change. Identifiers are globally unique, which means that you can be sure you have the correct dataset in your hands and that you get credit for your publications.
You can read more about the researcher and contributor ID ORCID and about guidelines for data citation. Do not hesitate to contact the research data services or library in your own organisation for further help. The more persistent identifiers you can include in your workflows, the better and easier your information management becomes.
Two-tier identifiers are protected by an extra layer of resolving. This way a link can be kept as opaque and stable as possible. When the identifier is independent of organizational and administrative semantics and does not contain natural languages, it is free from problems that arise when technology or administration changes. It is also not bound to any one language, but neutral.
A certain amount of semantics is useful, since it allows for namespace management and more explicit scopes, but it should be tied to the type of content, not ownership. For scientific citation of research, the DOI is often used, and researchers are identified with the help of ORCID.
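The extra layer of resolving behind two-tier identifiers can be illustrated with a small lookup table. This is a minimal sketch, not how any particular PID service is implemented: the `Resolver` class, the `pid:7f3a9c` identifier and the example.org addresses are all made up for illustration.

```python
class Resolver:
    """Hypothetical resolver: maps an opaque identifier to its current location."""

    def __init__(self):
        self._targets = {}

    def register(self, pid, url):
        # A persistent identifier must never be reused for other content.
        if pid in self._targets:
            raise ValueError(f"{pid} is already registered and must not be reused")
        self._targets[pid] = url

    def update_target(self, pid, new_url):
        # Only the location changes; the identifier that people cite does not.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid):
        return self._targets[pid]


resolver = Resolver()
resolver.register("pid:7f3a9c", "https://old-host.example.org/data/42")
# The organization later moves its web site; the cited identifier stays the same.
resolver.update_target("pid:7f3a9c", "https://new-host.example.org/datasets/42")
print(resolver.resolve("pid:7f3a9c"))
```

Because the identifier carries no host name, organization name or natural-language words, nothing in it becomes outdated when the content is moved.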
Persistent identifiers are minted and allocated by services and research organizations. CSC coordinates the Finnish ORCID consortium and a DataCite service for DOIs. The National Library is responsible for the URN service, which is suitable, for instance, for web publications. You can read more about the URN service here.
Whichever service for persistent identifiers you want to offer your customers, the need for trustworthy quality management is the same. It is the responsibility of the organization that manages the information to keep the links working and to monitor the data quality and life cycle of the data. A policy for persistent identifiers is an integral part of a data policy and all information management.
Organizations need to manage persistent identifiers in order to implement good researcher services and an efficient service architecture. Implementing national architectures also requires organizations to pay attention to semantic interoperability and to enable the linking of information, which in turn requires managing identifiers and their persistence. Ideally, clicking on an online identifier always gives you access to the original, uniquely identified information, and a machine can interpret the link and understand what type of content or subject it concerns.
Both internal solutions for organizations and external PID services are available. There are different levels of guidance and identification services for spatial data, publications and digital resources, researchers, and education. Finto is a Finnish thesaurus and ontology service, which offers identifiers. It also contains the administrative sector's vocabularies and discipline classification. The national Name Information Service is also being developed. The use of common identifiers is recommended, as it generally makes operations considerably easier.
If you have your own systems with online identifiers, make sure that they at least meet the EU Guidelines and the W3C Working Group Note. If you use an ordinary URI, its stability must be taken care of. When choosing an external service, check that the system is technically reliable, authoritative, flexible in terms of metadata presentation, and interoperable with your own and national systems. Also consider whether resolving is needed.
PID systems are used more and more often, since simple URIs may not be sufficiently stable when web addresses, sites, or organization structures change. When choosing a domain, prefer a domain name that describes the data source rather than one that contains the name of the organization. URIs can nevertheless be permanent as long as the organization managing them owns the right to that network address. PIDs, in turn, are persistent as long as the PID service exists, and they are not affected by, for example, a change of website address. Maintaining a PID system, such as Handle, requires continuous technical maintenance and expertise. With the content negotiation mechanism, the system can also adapt its response to the requesting agent, so that a web browser, for example, receives an HTML document and an RDF reader receives an RDF file. This also allows new formats to be added to the system later.
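The content negotiation mechanism mentioned above can be sketched as follows. This is a simplified illustration, not the behaviour of Handle or any other specific PID system: the function names, the `REPRESENTATIONS` table and the file names are assumptions, and only the basic `q` weighting of the HTTP `Accept` header is handled.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    choices = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            choices.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(choices)]


# Hypothetical representations of one identified resource.
REPRESENTATIONS = {
    "text/html": "dataset.html",
    "application/rdf+xml": "dataset.rdf",
}


def negotiate(accept_header, available=REPRESENTATIONS, default="text/html"):
    """Pick the representation that best matches the client's Accept header."""
    for media_type in parse_accept(accept_header or "*/*"):
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*":
            return default, available[default]
    # Simplification: fall back to the default instead of answering 406.
    return default, available[default]
```

A browser sending `text/html,application/xhtml+xml;q=0.9,*/*;q=0.8` gets the HTML page, while an RDF reader sending `application/rdf+xml` gets the RDF file. Supporting a new format later only means adding an entry to the table, for example `"text/turtle": "dataset.ttl"`, without changing the identifier.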
Research organizations have a great responsibility for managing identifiers and their persistence. So that researchers can safely refer to publications and data, and bring visibility and impact to themselves and to their organization, organizations must support and guide researchers in using identifiers. The controlled use of identifiers also greatly facilitates research data management and bibliometrics.
CSC provides guidance and services for organizations for allocating and minting persistent identifiers. For more information and support, contact CSC PID services at email@example.com
Webinar recording in Finnish: Tukea pysyvien tunnisteiden hyödyntämiseen - CSC:n PID-palvelut (Support for using persistent identifiers: CSC's PID services)
Support for Persistent Identifiers (at CSC)
Digital Preservation Handbook by Digital Preservation Coalition