Workshop on PIDs for physical objects
The use of persistent identifiers (PIDs) is an essential component of an open scientific landscape in the digital age. They improve the findability, accessibility, interoperability and reusability (FAIR) of various research-related objects and allow the scientific process to become more transparent and interconnected.
The adoption of PIDs for physical objects such as different types of samples or artefacts has so far been common practice mostly in the earth and life sciences. Broader application could improve the discoverability and accessibility of these resources by placing data in a broader context. Interoperable metadata standards and standardised forms of documentation can promote collaboration between disciplines and increase the reusability of data.
The “PID Network Germany” project held a workshop on “PIDs for physical objects” at the Helmholtz Centre for Geosciences on February 20, 2025. The presentations were also streamed online. Thirty participants came to Potsdam's Telegrafenberg (despite the rail strike), and 70 others took part online.
The workshop, which took place after the RDA DE 2025 conference, provided a platform for discussing areas of application, challenges and solutions for the introduction of PIDs, and aimed to promote a lively exchange on best practices. A particular focus was on the exchange of ideas for a national PID roadmap and on the active participation of the research community.
The results of the exchange formats are summarised next; the presentations are documented further below.
Exchange with the participants
The presentations were followed by two exchange formats. First, a World Café was held on three topics: (1) challenges in implementing PIDs, (2) success stories and best practices and (3) metadata standards. This was followed by a group discussion on the PID roadmap.
World Café
Challenges in the Implementation of PIDs
This World Café session explored the multifaceted challenges and opportunities surrounding the adoption of PIDs in general and PIDs for physical objects in particular. The discussion covered both technical hurdles and the need for cultural change within the research community.
Aspects that were discussed:
- Infrastructure & Interoperability: A central concern was the difficulty of aligning diverse research infrastructures and systems. Participants emphasized the need for automated PID assignment, but also acknowledged the associated costs and the importance of interoperability between different PID systems.
- Data & Metadata Quality: Maintaining clean, accurate, and up-to-date data and metadata was repeatedly identified as crucial. This includes addressing issues of granularity (like geolocation) and ensuring metadata reflects changes in sample location or data versioning.
- Cultural Shift & Researcher Buy-in: A significant challenge lies in convincing researchers to adopt PID practices. Participants discussed the need for increased awareness, training, and demonstrating the benefits of PIDs beyond simply data access. Many researchers currently prioritize data itself and see PIDs as an added burden.
- Sustainability & Funding: Long-term sustainability of PID systems requires dedicated financial resources for IT infrastructure, maintenance, and ongoing metadata curation. Scalability of these efforts is a key concern.
- Access & Registration Barriers: Limited access to registration services (like IGSN) and potential embargo periods for metadata disclosure can hinder implementation.
- Potential Applications: The discussion touched on the potential for applying PID principles in specific fields, such as archaeology.
Overall, the session highlighted the need for a holistic approach to PID adoption, encompassing technical solutions, cultural shifts, and sustainable funding models. Participants emphasized the importance of fostering collaboration and addressing the diverse needs of the research community to unlock the full potential of PIDs.
Success Stories and Best Practices
It became clear that the IGSN is a standard PID for many physical objects. Where no external PID is used, internal IDs often exist at least. The participants agreed that internal IDs are the minimum of good practice, while PIDs are best practice. Examples ranged from drilling cores (mentioned repeatedly, and for which IGSNs are commonly used) to collection items from the Natural History Museum of Berlin (using internal IDs) and geological samples (such as soil and plants, usually using IGSNs). Another example was GenBank, the genetic sequence database of the National Institutes of Health, which uses internal IDs. The ensuing discussion centered on improving the implementation and benefits of PIDs for physical objects along three aspects: core issues, steps for improvement and communicating value.
Core issues:
- Timing of PID assignment: PIDs should be assigned immediately upon specimen/object collection, not later in the process.
- Workflows: Successful PID implementation requires meaningful workflows for assignment, metadata creation, and distribution.
- Metadata is undervalued: Prioritizing metadata alongside PIDs is essential for maximizing value.
- Lack of connection: Researchers often lack access to PID expertise.
Steps for improvement:
- Training & education: Intensive training on PID workflows is needed to foster enthusiasm and adoption.
- Seamless integration: PIDs need to be easily integrated into existing collection databases and research systems.
- Good practice guidelines: Development and promotion of clear good practice guidelines are vital.
- Early assignment/reservation: Assign or reserve PIDs as early as possible (even before publication) to save time and accommodate embargoes.
Communicating value:
- Researcher benefit: Highlight how PIDs add value to research (e.g., improved data understanding).
- Metadata importance: Emphasize the role of metadata in unlocking the full potential of PIDs.
- Accessibility & visibility: Focus on improving how PID-registered objects are found and accessed.
In essence, the discussion emphasized that PIDs are most effective when integrated into a well-designed workflow, supported by training, and communicated effectively to all stakeholders.
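The recommendation to assign or reserve PIDs early, while still respecting embargoes, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `SampleRecord` class, the `reserve_pid` and `publish` functions, the two states and the placeholder prefix `10.99999` are hypothetical and do not describe any existing registration service.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional
import uuid


class PidState(Enum):
    RESERVED = "reserved"      # identifier exists, metadata not yet public
    REGISTERED = "registered"  # metadata is publicly available


@dataclass
class SampleRecord:
    internal_id: str                      # institution's own ID
    collected_on: date
    pid: Optional[str] = None
    state: Optional[PidState] = None
    embargo_until: Optional[date] = None  # metadata stays private until this date


def reserve_pid(record: SampleRecord, prefix: str = "10.99999") -> str:
    """Reserve a PID at collection time (prefix and suffix scheme are placeholders)."""
    if record.pid is None:
        record.pid = f"{prefix}/{uuid.uuid4().hex[:8]}"
        record.state = PidState.RESERVED
    return record.pid


def publish(record: SampleRecord, today: date) -> bool:
    """Register the PID once any embargo has passed; return True if registered."""
    if record.pid is None:
        raise ValueError("reserve a PID before publishing")
    if record.embargo_until is not None and today < record.embargo_until:
        return False  # still under embargo, PID stays reserved
    record.state = PidState.REGISTERED
    return True
```

In this sketch the identifier is fixed when the sample is collected, so labels and field notes can already carry it, while the metadata only becomes public after the embargo date.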
Metadata
This breakout session focused on improving metadata practices for describing physical objects. Participants identified key metadata elements and discussed pathways to better standardization and cooperation.
Key Takeaways:
- Relevant Metadata for Physical Objects: Core metadata for describing physical objects includes material, sample type, measurements (size/volume), date, origin/provenance, and contributor roles. Emphasis was placed on detailed origin information (including subtype - instrument, living organism etc.) and tracking provenance history. The importance of clearly documenting where a sample is located was repeatedly highlighted.
- Adapting Standards: The group advocated for leveraging ontologies and existing vocabularies (for example SKOS, the Simple Knowledge Organization System). A key process identified was a cyclical approach: starting with identified needs, building community consensus with experts, applying and approving standards in an agile manner, and then revisiting needs based on implementation.
- Promoting Cooperation: While time was limited, the discussion underscored the need for collaboration between research institutions, museums, and other organizations. Priorities include establishing a PID system for physical objects and acknowledging the diversity of needs and objects requiring description. Accessibility for both machines and humans was also noted as crucial.
Overall, the session highlighted a desire for more robust, standardized, and collaborative metadata practices to improve the accessibility, visibility, and long-term preservation of physical objects.
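The core metadata elements named in this session can be sketched as a small Python record. This is only an illustration under stated assumptions: the class and field names, the `SAMPLE_TYPES` vocabulary and the `validate` checks are hypothetical and are not taken from any existing metadata schema. In practice the vocabulary could be drawn from a published SKOS concept scheme.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical controlled vocabulary for sample types (illustrative only).
SAMPLE_TYPES = {"core", "soil", "plant", "artefact"}


@dataclass
class Contributor:
    name: str
    role: str  # e.g. "collector" or "curator"


@dataclass
class PhysicalObjectMetadata:
    identifier: str            # PID or internal ID
    material: str
    sample_type: str
    size: str                  # measurement with unit, kept as text here
    collection_date: date
    origin: str                # where the object comes from
    origin_subtype: str        # e.g. "instrument" or "living organism"
    contributors: List[Contributor] = field(default_factory=list)
    # Provenance of storage: chronological list of (date, location) entries.
    location_history: List[Tuple[date, str]] = field(default_factory=list)

    def move_to(self, when: date, location: str) -> None:
        """Record a change of storage location instead of overwriting it."""
        self.location_history.append((when, location))

    @property
    def current_location(self) -> Optional[str]:
        return self.location_history[-1][1] if self.location_history else None


def validate(md: PhysicalObjectMetadata) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    if md.sample_type not in SAMPLE_TYPES:
        problems.append(f"unknown sample type: {md.sample_type}")
    if not md.contributors:
        problems.append("no contributor recorded")
    if not md.location_history:
        problems.append("no location recorded")
    return problems
```

Keeping the location as a history rather than a single field reflects the point that metadata should follow a sample when it is moved.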
PID Roadmap
The focus of the group work was the development of a PID roadmap for Germany. The primary aim is to lay the foundations for a future national PID strategy. While countries such as Australia, Ireland and the Czech Republic have already made progress with national PID strategies, Germany must first draw up a roadmap due to the complexity of its federal structure and the need to involve numerous interest groups. Obtaining and incorporating feedback from the scientific community is crucial to ensure that the roadmap reflects diverse needs and promotes broad acceptance.
The initial phase will focus on raising awareness of the benefits of PIDs and openly addressing the associated challenges. The roadmap is seen as a preliminary step, with full implementation of the strategy requiring a broader consensus and the overcoming of structural hurdles.
The overarching message from the group discussions is that a successful PID roadmap for Germany needs to be holistic and community-driven. At the same time, it is clear that a roadmap alone is not sufficient for all applications and that its addressees must be precisely defined in advance. It's not just about having PIDs, but about creating a sustainable ecosystem that supports their widespread adoption and effective use, particularly for physical objects.
Here is a summary of the various aspects that were discussed with regard to the importance of a roadmap:
- Strategic & Bottom-Up Approach: Recommendations should come from the community but be directed towards strategic decision-makers.
- Addressee: Who is the intended recipient of this message? Consider the broadness and heterogeneity of the communities.
- Sustainability is essential: Long-term funding for infrastructure (repositories, minting services) and personnel (data stewards, collection managers) is crucial. Project-based funding is insufficient.
- Interoperability & Standardization: Focus on enabling international collaboration through standardized PIDs (ORCID, ROR, IGSN) and harmonized metadata schemas.
- Community Building & Education: Point out the need for investment in training for developers, researchers, and data professionals. Promote awareness and build a network of competence.
- Incentivization & Mandates: Explore incentives for PID adoption (e.g., ORCID profiles) and consider mandates (like a “journal standard”) where appropriate.
- Practical Tools & Support: Refer to low-barrier PID creation tools and support services for users. Work out the role of repositories.
- Benefits: Highlight the benefits of PIDs through case studies and examples.
- Address Risks & Vulnerabilities: Acknowledge potential vulnerabilities (cyberattacks) and explore decentralized solutions.
- Future-Proofing: Consider how PIDs can support automation, machine-to-machine communication, and the long-term preservation of research data.
Presentations
Presentation | Abstract | Speaker |
Distribution of PIDs for physical objects in Germany | The presentation focuses on the dissemination of PIDs for physical objects. The PID Network project conducted an extensive survey in science and culture in 2024. These results will be examined in more detail, supplemented by a temporal visualisation. | Andreas Czerniak (Bielefeld University Library) |
PID4Cat: Persistent Identifiers for Catalysis Research | This talk introduces PID4Cat, a new solution for handle-based persistent identifiers (PIDs) that stores PID-related metadata in the handle record. Its generic metadata model is described as a LinkML model. The first application is in catalysis research. We will discuss the importance of PIDs in ensuring FAIR data principles and how PID4Cat facilitates early-stage data sharing and collaboration within the NFDI4Cat community. Additionally, we will cover the technical implementation of PID4Cat and its integration with services benefiting from automatic code generation from the PID4Cat model. | David Linke (Leibniz Institute for Catalysis e.V.); Preston Rodrigues (High-Performance Computing Center Stuttgart (HLRS), University of Stuttgart) |
IGSN – International Generic Sample Numbers: Uniquely identifying your samples | This presentation will provide an overview of International Generic Sample Numbers (IGSNs) and their significance, with a focus on the services available in Germany and at the Helmholtz Centre for Geosciences (GFZ). In addition, the latest results of the HMC project FAIR WISH, which is carried out in cooperation with the GFZ, the Alfred Wegener Institute (AWI) and Hereon, will be presented. This talk emphasises the importance of IGSNs for the scientific community and illustrates their application in improving data availability and usability. | Kirsten Elger (Helmholtz Centre for Geosciences, GFZ) |
The persistent identification of archaeological object data in iDAI.world | iDAI.world is the digital research infrastructure of the German Archaeological Institute. It comprises systems for recording, documenting, analysing, storing, visualising and publishing research data. In addition to information on objects, buildings and geodata, it also includes contextual information on field research and scientific data. To date, the DAI has primarily used its own unique identifiers to address the object data. PIDs are used in the context of publications and in some cases for sample identification, and a technical concept that enables PID-supported citation is now being implemented as part of the DFG project ‘CiVers’. Concepts and issues relating to the modelling, publication and citation of object data are also being discussed together with the community. | Fabian Riebschläger, Marcel Riedel (German Archaeological Institute, DAI) |
Application of PIDs and digital twins of plant genetic resources at IPK Gatersleben https://doi.org/10.5281/zenodo.15005668 | The Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) is a leading international plant science institute with a research focus on biodiversity and crop performance. To implement a sustainable data and material management infrastructure, three pillars have been built over the last 15 years: an institutional policy for research data management, defined processes and a technical infrastructure. The technical backbone is a Research and Laboratory Information Management System (RALIMS), which was established in 2011 and is operated as a general-purpose data management system across all research groups and departments. This RALIMS-based ecosystem of databases, file storage, desktop clients, web applications and APIs serves two major classes of data management processes. The first class comprises service processes for centrally managed instruments, facilities and service units. They follow institutionally agreed processes and are operated by permanent staff. Examples are the high-throughput sequencing, chemicals management and phenotyping service processes. Service processes comprise (a) defined personnel and organizational responsibilities, including defined transition points between the laboratories, the scientists and the LIMS project team, (b) defined standard-compliant and machine-processable data formats, (c) mandatory metadata standards and (d) defined data publication processes, i.e. the minting of PUIDs, like DOIs, and data upload into international data repositories. The second class comprises data flows in research projects. Here, more agile and less rigidly structured processes are in place that reflect the nature of innovation-driven science. Nevertheless, they are dovetailed with the core service processes and support immersive analytics-driven knowledge generation in research projects. For example, research projects for genotypic and phenotypic characterization handle thousands of plant samples and connect them with millions of data points. Scientists and technicians work hand in hand to interweave scientific data analysis and visualization pipelines and tools. This data servant approach, which has been in operation for more than 15 years, has enabled the preservation of more than 6 million samples and terabytes of data files in a FAIR manner. The interplay of policies, processes and IT is a central backbone that supports research data and material management at IPK and contributes data services to networks such as the European life-sciences infrastructure for biological information (ELIXIR), the German Bioinformatics Network (de.NBI) and the National Research Data Infrastructure (NFDI) in the consortia FAIRAgro, DataPLANT and NFDI4Biodiversity. This talk provided an overview of the policies, technology and processes at the IPK to implement FAIR data and material management and showcased the application of digital twins in recent research projects. | Matthias Lange (IPK Gatersleben) |