Persistent Identifiers (PIDs)

Persistent identifiers (PIDs) consist of a defined combination of numbers and letters that make it possible to uniquely and sustainably reference objects, persons and organizations. Their use is growing steadily and is being extended to more and more areas, for example through the development of PIDs for software, research tools, data management plans, repositories or scientific conferences.

TIB supports membership-based initiatives for the establishment and widespread use of open, non-commercial PID systems that are developed according to the needs of their scientific users. These initiatives include the non-profit organizations DataCite for DOI registration, ORCID (Open Researcher and Contributor Identifier) for personal identification and ROR (Research Organization Registry) for the identification of research institutions.

 

PIDs for research objects

The Digital Object Identifier (DOI) is one of the established digital PIDs for objects in the scientific field and allows sustainable and unambiguous access to textual and non-textual research objects on the Internet. Similar to an ISBN, a DOI identifies an object; in addition, it locates the object through the URL stored in the metadata. In this way, research objects can be cited reliably and sustainably in a standardized form. For research institutions that would like to register DOIs for their research objects, TIB offers the TIB DOI Consortium. Further information can be found on the subpage: Digital Object Identifier (DOI).
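As a minimal sketch of how a DOI leads to an object, the following Python snippet turns a DOI name into a resolver URL. It assumes the public resolver at https://doi.org/ (the one used by the example DOI in the overview of PID systems); the function names and the list of accepted prefixes are illustrative choices, not part of any official client.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Spellings that are commonly put in front of a DOI name (illustrative list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI name."""
    doi = value.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {value!r}")
    # DOI names are case-insensitive, so lower-casing gives one canonical spelling.
    return doi.lower()


def doi_to_url(value: str) -> str:
    """Build the resolver URL; the resolver redirects to the URL stored for the DOI."""
    # Keep "/" readable and percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(normalize_doi(value), safe="/")


print(doi_to_url("doi:10.5438/1DGK-1M22"))
```

Because the citation uses the resolver URL rather than the landing page address, the reference stays valid even if the repository later moves the object.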

 

PIDs for researchers

The Open Researcher and Contributor Identifier (ORCID) is used to identify people and connects a researcher with the affiliated research organization, with a project, and with their published research output. In particular, it prevents research achievements from being attributed to the wrong person when authors have the same or similar names. Both researchers and research institutions benefit, because the visibility and citability of research objects increase. TIB is involved as a project partner in the DFG-funded ORCID DE project and is actively contributing to the dissemination of ORCID in Germany. Further information on ORCID can be found here: Open Researcher and Contributor Identifier (ORCID).
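One reason an ORCID iD can be handled reliably by machines is its fixed structure: 16 characters in four blocks, the last of which is a check character computed with ISO 7064 MOD 11-2. This detail comes from ORCID's public documentation rather than from the text above. The sketch below shows how a system might reject mistyped iDs before storing them; the function names are illustrative, and the sample iD is the one ORCID publishes for its fictitious test researcher.

```python
import re

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")
ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the expected layout and a matching check character."""
    orcid = orcid.strip()
    if orcid.startswith(ORCID_URL_PREFIX):
        orcid = orcid[len(ORCID_URL_PREFIX):]
    if not ORCID_PATTERN.match(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD of ORCID's fictitious test researcher
```

A passing check only shows that the iD is well formed; whether it belongs to the right person still has to be confirmed by the researcher, for example by signing in to ORCID.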

 

PIDs for research organizations

The organization identifier of the Research Organization Registry (ROR) has complemented the landscape of open PID systems since 2019. The initiative integrates existing open as well as commercial organization identifiers into the metadata and is supported by numerous research institutions worldwide. TIB is involved in the further development and dissemination of the ROR ID. Further information: Research Organization Registry (ROR).

 

PIDs and their metadata promote Open Science

 

PIDs offer added value

  • For researchers, the main benefits of PIDs are improved discoverability of research objects and greater visibility of their own work, including any underlying research data. Discoverability and contextualisation through PIDs and their metadata create greater transparency in the research process and facilitate the quality assessment of these resources. In addition, researchers benefit from the possibility of creating biographies across institutions and employers via ORCID, including all achievements such as publication lists, peer review activities, committee work and memberships. Integrations with institutional research information systems, PID systems, publisher systems and other bibliographic databases (e.g. Scopus) enable the automatic updating of this data.
  • For librarians and repository managers, the longevity and trustworthiness of PIDs are a great advantage, as is the ability to evaluate research objects and their links to researchers and research institutions. PIDs improve and simplify processes such as cataloging and reporting.
  • For developers, PIDs facilitate cross-system collaboration. They also facilitate interoperability through standardized metadata requirements and APIs of the system providers.
  • For research funding organizations, the traceability and reportability of funded research improves. By clearly identifying the researchers involved and the research objects, and linking them to a funder and project ID, the transparency of the research output and its subsequent use increases. A network of PID systems makes this possible. This network can be queried via a GraphQL API provided by DataCite or via the DataCite Commons web service.
  • For publishers, PIDs improve the visibility and trustworthiness of their resources and provide an interoperable framework to improve the re-use of published research.

 

PID systems

A PID must fulfill several requirements: It must remove potential ambiguities of the resource to be described, it should be readable by humans and machines, it should offer reliable retrievability and it must be persistent, i.e. permanently available. In addition, a globally successful PID system depends on the popularity and acceptance of the system, a sustainable infrastructure, a business and management model, quality of documentation, compliance with standards, interoperability with other systems, and openness and willingness to cooperate, harmonize and integrate with existing services within the broad research landscape.

Most PID systems meet these requirements to varying degrees, although it should be noted that it is not possible to assess the characteristics of a PID system without considering its use cases. These may limit use for certain object types, granularity options or publication levels.

 

Overview of PID systems

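URN:NBN:DE

  • Popularity and acceptance: 45 million (as of end of 2020)
  • Example: https://nbn-resolving.org/urn:nbn:de:bvb:91-diss20060308-1417541491
  • Introduced: 2003
  • Handle System: n.s.
  • Standardized metadata: yes
  • Independently resolvable: yes
  • Coverage: German-speaking countries
  • Centralized management: yes
  • Global communication and documentation: no
  • Interoperability with other PID systems (ORCID, ROR): partly

B2HANDLE

  • Popularity and acceptance: n.s.
  • Example: http://hdl.handle.net/11858/00-001M-0000-002D-6078-A
  • Introduced: n.s.
  • Handle System: yes
  • Standardized metadata: n.s.
  • Independently resolvable: n.s.
  • Coverage: Europe
  • Centralized management: yes
  • Global communication and documentation: no
  • Interoperability with other PID systems (ORCID, ROR): n.s.

ePIC

  • Popularity and acceptance: n.s.
  • Example: http://hdl.handle.net/21.012/xyz-123
  • Introduced: 2009
  • Handle System: yes
  • Standardized metadata: n.s.
  • Independently resolvable: n.s.
  • Coverage: Europe
  • Centralized management: yes
  • Global communication and documentation: no
  • Interoperability with other PID systems (ORCID, ROR): n.s.

Archival Resource Key (ARK)

  • Popularity and acceptance: 8.2 billion (as of September 2020)
  • Example: http://myrepo.example.org/ark:/12345/bcd987
  • Introduced: 2001
  • Handle System: no
  • Standardized metadata: no
  • Independently resolvable: yes
  • Coverage: global
  • Centralized management: no
  • Global communication and documentation: yes
  • Interoperability with other PID systems (ORCID, ROR): no

DOI

  • Popularity and acceptance: 230 million (as of January 2021)
  • Example: https://doi.org/10.5438/1dgk-1m22
  • Introduced: 2000
  • Handle System: yes
  • Standardized metadata: yes
  • Independently resolvable: yes
  • Coverage: global
  • Centralized management: yes
  • Global communication and documentation: yes
  • Interoperability with other PID systems (ORCID, ROR): yes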

 

The further development of PID systems is widely supported by national and international funding organizations. For example, the EU-funded Technical and Human Infrastructure for Open Research (THOR) project worked extensively on harmonizing the metadata of DOIs for journal articles (Crossref), DOIs for datasets (DataCite) and ORCID for individuals. The follow-up project FREYA expanded the infrastructure for PIDs as a core component of open research by extending the use of identifiers to organizations (ROR) and new types of content (software, devices, physical samples, workflows, etc.). In addition, FREYA aimed to integrate other established PID systems within research communities, such as the accession number in the life sciences.

Key research funders and councils recommend the use of open standards for PIDs rather than proprietary models. For example, the German Council for Science and Humanities advocates the use of open standard PIDs in its recommendations on the Research Core Dataset (Kerndatensatz Forschung, KDSF). The DOI with its ISO standard (ISO 26324) is the most widely used PID for publications and datasets and serves as the de facto standard for the global research community. The DataCite DOI and its associated metadata play a key role in the FAIRness of research data and are an integral part of the European and global PID landscape.

PID services are rooted in and developed by the global research community. This ensures that the community is at the center of activities to guide development, standards, policies and best practices, drive adoption, promote advocacy and participate in the governance of all research-related issues. These activities take place, for example, in the Research Data Alliance (RDA) groups. The RDA is building the social and technical networks to support Open Science. Other international initiatives such as GO FAIR focus particularly on FAIR data and services. Their output is embedded in EOSCpilot, EOSC-hub, OpenAIRE, FREYA and all upcoming projects in the European Commission's INFRAEOSC programme. This includes the range of disciplines consulted through the ESFRI ERICs and the Community Data Services Clusters currently being established, for example the EOSC-Life project under the EU-OPENSCREEN ERIC.

Widespread dissemination of PIDs increases the added value that PID services can create. Initiatives and projects such as RDA, GO FAIR, DICE, Make Data Count, FAIRsFAIR, ORCID DE and re3data COREF help to promote this process and support FAIR data exchange. German research organizations are currently implementing social and technical infrastructures to use the PID services. Overall, the increasing dissemination and acceptance of DOIs and ORCIDs are noticeable. For example, the number of DataCite DOIs in the TIB DOI Consortium has increased from over 25,000 DOIs in 2012 to over 1.5 million DOIs in 2022. The German ORCID consortium has grown from five members in 2016 to 78 members in 2022. However, there is still a lot of untapped potential due to a lack of IT infrastructure, knowledge gaps and limited resources at research institutions, among other things. On the researchers' side, wider application of PIDs could support data sharing and open, transparent science to a much greater extent than at present.

 

PIDs provide quality

Human- and machine-readable, interoperable metadata standards are a key element for implementing quality management in repositories. The integration of PIDs is key to the establishment of such standards and is addressed by several quality management concepts. For example, the CoreTrustSeal (CTS), which merged the former Data Seal of Approval (DSA) and the ICSU World Data System (WDS) Member Certification, the nestor seal for trusted digital archives and the ISO 16363 standard form a three-tier global framework for repository certification. All three concepts require the reliable accessibility of the metadata describing the scientific object.

Generic metadata are widely used in research communities across all disciplines. Their use facilitates interoperability between systems and disciplines, enables standards-based quality management and in this way provides reliability for the reuse of data. A general problem with generic metadata standards is the limitation to generic categories and the relative vagueness of the categories used to enable application across the widest possible range. As a result, researchers and data curators may need guidance on how and in what detail they should describe data.

In contrast, subject-specific metadata is tailored to the specifics of the data and standards used in the relevant discipline and provides very specific information that is difficult to integrate into a generic schema. For consistent and high quality metadata, it is therefore essential that PID consortia provide guidance and best practice advice on how to integrate subject-specific metadata into a generic schema.

The DataCite metadata schema is a list of core metadata properties selected for accurate and consistent identification of an object for citation and retrieval purposes. Developed and continually refined by the research community, the metadata schema supports openness and future extensibility of the schema by working with the Dublin Core Metadata Initiative (DCMI) Science and Metadata Community (SAM) to maintain a Dublin Core Application Profile for the schema.

Although the DataCite metadata schema has been expanded with each new version, each time incorporating subject-specific needs for custom-fit metadata fields, it is still designed to be generic and discipline-independent for a wide range of research objects. DataCite metadata primarily supports citation and retrieval of data; it is not intended to replace or supersede the discipline-specific metadata that fully describe the data and are essential for understanding the context and enabling reuse. However, the DataCite metadata schema provides a reliable method for linking the discipline-specific metadata to the metadata of the research object.
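To make the idea of a generic core record concrete, here is a hedged sketch of a DataCite-style metadata record as a Python dictionary. The six mandatory properties (identifier, creator, title, publisher, publication year, resource type) follow the DataCite schema; the JSON key names are modelled on DataCite's JSON representation and should be checked against the current schema documentation. All values, including the DOI, the ROR ID, the repository and the discipline schema name, are placeholders; the ORCID iD is the one ORCID publishes for its fictitious test researcher. The `HasMetadata` relation shows one way to link to discipline-specific metadata held elsewhere.

```python
# Mandatory properties of the DataCite schema (key names are assumptions
# modelled on DataCite's JSON representation).
MANDATORY_PROPERTIES = ("doi", "creators", "titles", "publisher", "publicationYear", "types")

record = {
    "doi": "10.5072/example-dataset",  # placeholder DOI
    "creators": [{
        "name": "Carberry, Josiah",  # ORCID's fictitious test researcher
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
            "nameIdentifierScheme": "ORCID",
        }],
        "affiliation": [{
            "name": "Example University",  # placeholder organization
            "affiliationIdentifier": "https://ror.org/placeholder",  # placeholder, not a real ROR ID
            "affiliationIdentifierScheme": "ROR",
        }],
    }],
    "titles": [{"title": "Example dataset"}],
    "publisher": "Example Repository",
    "publicationYear": 2022,
    "types": {"resourceTypeGeneral": "Dataset"},
    # Link from the generic record to the discipline-specific metadata.
    "relatedIdentifiers": [{
        "relatedIdentifier": "https://repository.example.org/metadata/123.xml",
        "relatedIdentifierType": "URL",
        "relationType": "HasMetadata",
        "relatedMetadataScheme": "hypothetical-discipline-schema",
    }],
}


def missing_properties(metadata: dict) -> list:
    """Return the mandatory properties that are absent or empty."""
    return [key for key in MANDATORY_PROPERTIES if not metadata.get(key)]


print(missing_properties(record))
```

A repository could run a check like this before registration, so that incomplete records are returned to the data curator instead of being published with gaps.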

 

PIDs make research data FAIR

The FAIR principles, developed by the international FORCE11 initiative, are guidelines that promote the findability, accessibility, interoperability and re-usability of research data. These principles are an internationally recognised framework of minimum requirements for research data including metadata and protocols that support effective research data management.

An essential component for the implementation of the FAIR principles is the use of PIDs, which enable the unique identification of all scientific outputs, for example research data, as well as funders, research organizations, researchers and research projects. The mandatory and standardized metadata associated with PIDs make research data findable, accessible and citable.

  • Findable: Standardized PID metadata supports the findability of research data.
  • Accessible: A DOI, for example, can be resolved worldwide with any internet browser. The associated URL can be updated while the DOI remains unchanged.
  • Interoperable: The metadata of a PID uses standard vocabularies and links to other PIDs, for example software DOIs, research equipment DOIs or ORCIDs.
  • Reusable: Citability, reputation, and high-quality, up-to-date metadata with links to other PIDs generate trust.
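The point that the URL behind a DOI can change while the DOI itself stays the same is easiest to see in a toy model. The sketch below is not how any real resolver is implemented; `ToyResolver` and the identifiers and URLs used with it are invented purely to illustrate the indirection.

```python
class ToyResolver:
    """In-memory stand-in for a PID resolver: identifier -> current landing URL."""

    def __init__(self):
        self._targets = {}  # maps each PID to the URL it currently points to

    def register(self, pid, url):
        """Create a new identifier; an identifier is never registered twice."""
        if pid in self._targets:
            raise ValueError(f"{pid} is already registered")
        self._targets[pid] = url

    def update_url(self, pid, new_url):
        """Change where an existing identifier points; the identifier itself is untouched."""
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid):
        """Return the URL currently stored for the identifier."""
        return self._targets[pid]


resolver = ToyResolver()
resolver.register("10.5072/example", "https://old-repo.example.org/data/1")
before = resolver.resolve("10.5072/example")

# The repository migrates; only the stored URL changes.
resolver.update_url("10.5072/example", "https://new-repo.example.org/records/1")
after = resolver.resolve("10.5072/example")

print(before, "->", after)
```

Every citation that uses the identifier keeps working after the migration, which is what makes the identifier persistent even though the location is not.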

FAIR data management does not imply that research data are automatically published in the sense of Open Data. Not all data can be published, for example for legal reasons. As long as the conditions and ways of access are evident, the FAIR principles are respected.

Interoperability is supported by standard vocabularies and links to other PIDs in the PID metadata. This descriptive metadata is provided by the publishing research organization in a human- and machine-readable format to facilitate PID networking for cross-cutting value-added services, for example knowledge graphs. The hosting and publishing repositories play an important role in this by implementing a comprehensive quality strategy that includes rich and up-to-date metadata, best practice PID services, licensing information, policies, quality control methods and support. In this way, repositories build confidence in the reusability of research data.

Re3data, the world's most comprehensive reference source for research data infrastructures, has taken a first step as a building block for certification of trusted FAIR repositories. Several research projects aim to improve PID services to optimize support for FAIR principles for research data. In recent years, international standards for referencing have been established, including the open standards Digital Object Identifier (DOI) and Open Researcher and Contributor Identifier (ORCID).

 

PIDs enable networks

National and European projects and initiatives build on PID services to support Open Science and knowledge aggregation through the use and linking of metadata. For example, OpenAIRE aggregates metadata from PIDs to provide information about research outputs as part of the open science platform EOSC. This includes information on provenance, data use and citation. Another core element of EOSC is the PID graph, which was developed by the EU-funded FREYA project with the aim of improving discoverability, transparency, reproducibility and quality assurance of research.

Provenance: PID services provide information about the original source of scientific objects and track changes, enhancing transparency and trust. One example is ORCID's trust principles, which provide for transparent identification of the provenance of the scientific object in each ORCID record. Through ORCID's integration with DOI service providers, the researcher's publication list in the ORCID record is updated and verified automatically. This service simplifies record maintenance and is a quality control mechanism for the data entered into ORCID.

Data use and citation: Scholix is a recommendation framework for linking digital objects. It enables interoperability between existing PID metadata schemas and provides the means, for example, to track data citations. The Make Data Count joint project between the California Digital Library, DataCite and DataONE adapted this framework and developed data use and citation services for the research community. Another component of the project was the modification of the COUNTER Code of Practice to establish standardized processes, policies and metrics for the use of research data, in order to increase recognition of data publications and incentivise data sharing and reuse. Citation metadata is extracted from dataset and article metadata and made available in DataCite's Event Data Service via a public REST API.
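How citation links can be extracted from metadata may be sketched as follows. The snippet reads the related identifiers of a DataCite-style record and keeps those that point to another DOI with a citation-like relation. The key names follow DataCite's JSON representation as far as known; which relation types count as a citation is a choice made here for illustration, and the record itself is invented.

```python
# Relation types treated as citation-like in this sketch (an illustrative choice).
CITATION_RELATION_TYPES = {
    "IsCitedBy", "Cites",
    "IsSupplementTo", "IsSupplementedBy",
    "References", "IsReferencedBy",
}


def citation_links(record: dict) -> list:
    """Return (source DOI, relation type, target DOI) triples for citation-like relations."""
    links = []
    for related in record.get("relatedIdentifiers", []):
        if related.get("relatedIdentifierType") != "DOI":
            continue  # only DOI-to-DOI links are considered here
        if related.get("relationType") not in CITATION_RELATION_TYPES:
            continue
        links.append((record["doi"], related["relationType"], related["relatedIdentifier"].lower()))
    return links


# Invented example record with placeholder DOIs.
dataset = {
    "doi": "10.5072/dataset-1",
    "relatedIdentifiers": [
        {"relatedIdentifier": "10.5072/ARTICLE-9", "relatedIdentifierType": "DOI", "relationType": "IsCitedBy"},
        {"relatedIdentifier": "https://example.org/docs", "relatedIdentifierType": "URL", "relationType": "IsDocumentedBy"},
        {"relatedIdentifier": "10.5072/software-3", "relatedIdentifierType": "DOI", "relationType": "IsDerivedFrom"},
    ],
}

for link in citation_links(dataset):
    print(link)
```

Each triple has the shape of a link between two identified objects, which is the kind of information a Scholix-style service exchanges.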

PID Graph: The PID Graph is a model that uses existing links between PIDs described in the corresponding metadata records, including text publications that cite a research dataset or publications linked to a research institution or research funder. For this purpose, the metadata in the databases of DataCite, Crossref, ORCID, ROR and re3data are queried. The PID Graph aggregates publications and citations at the level of the researcher, the scientific institution or the research funder. This data can be retrieved via a GraphQL API or the DataCite Commons web service. The PID Graph was developed in the EU-funded FREYA project.
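A query against the PID Graph could look roughly like the sketch below, which only builds the HTTP request and does not send it. The endpoint URL and the field names in the query (`person`, `name`, `works`, `totalCount`) are assumptions that must be verified against DataCite's published GraphQL schema before use; the ORCID iD is the one ORCID publishes for its fictitious test researcher.

```python
import json
import urllib.request

# Assumed endpoint of DataCite's GraphQL API; verify against the official documentation.
GRAPHQL_ENDPOINT = "https://api.datacite.org/graphql"

# Assumed query shape: look up a researcher by ORCID iD and count linked works.
PERSON_QUERY = """
query ($id: ID!) {
  person(id: $id) {
    name
    works {
      totalCount
    }
  }
}
"""


def build_person_request(orcid_url: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for one researcher."""
    payload = {"query": PERSON_QUERY, "variables": {"id": orcid_url}}
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_person_request("https://orcid.org/0000-0002-1825-0097")
print(request.get_method(), request.full_url)

# To actually run the query (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the payload first. The same pattern could be used for organizations or funders once the corresponding field names are confirmed.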

 

TIB's role: PID and metadata services

As the first DOI registration agency for research data, TIB has built up expertise in publishing, identifying and citing research data. In 2009, TIB was the initiator and co-founder of the non-profit association DataCite, whose office is run by TIB. With more than 240 member organizations, DataCite is the leading registration agency of DOIs for research data. More than 2,000 repositories managed by research organizations worldwide use DataCite's services. TIB serves more than 140 data centers managed by German universities and research institutions with DOIs in the TIB DOI Consortium. Together, they have registered over 1.5 million DOIs since 2005. The focus of DOI registration is currently on research data (21 %). The services offered by the TIB include:

  • administrative services,
  • initial introduction,
  • provision of information and advice,
  • 1st level technical service and
  • metadata services.

TIB is the consortium leader of the German ORCID consortium with currently 78 member organizations. Together with the project members of the DFG-funded ORCID DE project, it supports the introduction of ORCID at German research institutions. As consortium leader, TIB has the following tasks:

  • membership recruitment,
  • administrative services,
  • initial onboarding,
  • provision of information and advice, and
  • 1st level technical service.

TIB is involved in several national and international initiatives and projects with the common goal of making research data FAIR.

The Research Organization Registry (ROR) is another open, interoperable (standard API) identifier platform driven by the research community that plays an important role within the PID landscape. ROR solves the use case of affiliation by uniquely and permanently identifying the research organization and linking it to the researcher and the scientific objects. For this purpose, the ROR ID can be entered into the DataCite metadata. TIB will integrate ROR into its PID services. The integration of persistent identifiers such as DOI, ORCID and ROR is crucial for a sustainable publication culture as part of research data management. TIB is involved in current PID developments for new content or object types such as software, research tools, data management plans, repositories and conferences that will improve the visibility and re-usability of research results and the overall trustworthiness of research.