Workshop on PID for publication services and current research information systems

20 March 2024, Bielefeld University Library & online

Information and exchange on registries, multiple assignments and aggregators

Persistent identifiers (PID) play an important role in the unique identification of publications and research information. The DFG-funded project "PID Network Germany" offers a platform for exchange on PID in German-speaking countries: their use and dissemination, the needs and obstacles associated with them, and ways to improve them.

A workshop was held at Bielefeld University Library on 20 March 2024. It focussed on the complexity of PID practices for and in open access publication services and current research information systems (CRIS). There are various registries in which publication services and CRIS can be registered in order to increase the visibility of research at one's own institution. OpenDOAR, re3data, FAIRsharing and DOAJ, but also the Directory of Research Information Systems (DRIS) and the Provider Portal of the European Open Science Cloud (EOSC), are just a few examples. The current challenge is that these services and systems receive multiple identifiers, which are used by national and international aggregators such as BASE, OpenAIRE, OpenAlex and citation databases (e.g. OpenCitations) for indexing.

The workshop evaluated which PID systems are suitable for identifying publication services and CRIS and how PID can be used more effectively in these services and systems. The aim was to identify and discuss needs and deficits in the application and implementation of PID in publication services and CRIS. In addition, participants discussed solutions for optimising PID metadata workflows across the life cycle of research and cultural objects, as well as in aggregation systems such as BASE and at PID-issuing agencies such as DataCite and the German National Library (DNB).

The workshop started with a series of presentations, which around 190 people also followed online. The topics and slides are linked in the programme table below and can also be viewed on Zenodo.

Following the presentations, various aspects of the persistent and unique identification of publication services and CRIS were discussed with the participants on site. The 45 participants exchanged views in three small groups. The results are summarised below.

Summary of the interactive part

General section

The focal topic of the workshop, "PID for open access publication services and research information systems", was the starting point for the group discussions. The groups discussed not only dedicated PID for these services, however, but also the various PID used within them. This was to be expected given the complexity of the PID systems and the backgrounds of the participants.

Some participants stated that they use the following PID for publication services or CRIS: DOI, URN, Handle, GND ID, ISSN, ISBN, IGSN. In addition, a number of other PID were named that are used for referencing and are therefore part of the PID cycle and administrative handling: ORCID iD, ROR ID, IDs from Scopus and Web of Science, FundRef, GEPRIS, CORDIS, ISNI, Wikidata (e.g. for organisations).

Challenges

A wide range of problems was identified in all groups. These are summarised below:

Difficulty in dealing with different understandings of "persistent" in different contexts
Repeated need for manual entry of PID in systems
A continuing need for system- or institution-internal identifiers (e.g. in identity management (IDM) systems, where the ORCID iD is then only a supplementary identifier)
Frequent lack of clarity about the actual benefit or advantage of using PID
Problems with retrieving publication data via ORCID iDs (possible reason: lack of approval by users)
Lack of clarity regarding name changes and mergers of organisations
Lack of granularity with ROR
Legal debates with data protection officers about publication and disclosure of publication metadata
Different policies for the allocation of DOIs at registration agencies
Technical problems with DOI allocation
Incorrect information in the metadata, e.g. incorrect article titles
Problems with metadata quality due to input from researchers
Need for implementation of PID by research funders
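
Some of the manual-entry and metadata-quality problems listed above can be caught at the point of input. The following is a minimal sketch, not a workflow reported by the participants: it checks the final character of an ORCID iD against the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are illustrative assumptions, and the iD in the usage line is the example iD from ORCID's own documentation.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is a syntactically valid ORCID iD.

    Accepts the bare form (0000-0000-0000-0000) or the https://orcid.org/ URI.
    This checks structure and checksum only; it does not confirm that the iD
    is actually registered.
    """
    value = orcid.strip()
    prefix = "https://orcid.org/"
    if value.startswith(prefix):
        value = value[len(prefix):]
    compact = value.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A check of this kind only catches typing errors. It cannot detect a valid iD that belongs to a different person, nor the duplicate iDs discussed under multiple assignments.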

Needs

In addition, the participants discussed in their groups which PID they would like to use. There is a desire for the implementation of PID for samples (IGSN), for projects (RAiD), possibly in combination with a Grant ID, for events via ConfIDent (https://www.confident-conference.org/), for institutions, organisations and publishers (possibly ROR, Ringgold), and for CRIS. The need for PID for scientific prizes was also mentioned.

Obstacles

Following on from this, the groups discussed difficulties and obstacles in the implementation of the aforementioned PID. The reasons cited for not using them were unclear benefits, immature PID profiles (e.g. RAiD), a lack of usage scenarios (e.g. if data is only used internally and not exchanged with other systems), a lack of user acceptance and the lack of a mandate to prescribe the use of PID at the institution.

Requirements for implementation

In order to establish PID for publication services/CRIS, the following prerequisites are necessary from the participants' point of view:

Awareness: It is important to raise awareness of the importance of PIDs.
Sufficiently widespread distribution and use of PIDs
Ease of use (e.g. automated data retrieval, less manual input)
Promote developments: e.g. technical interoperability via APIs
Data sovereignty and transparency for users
Permanent resources (e.g. financial and human resources)
Costs of PIDs must be accepted and priced in as a matter of course
Rapid implementation through requirements set by government for funders, or by funders for researchers
Creation of a basis at an early stage: integration of PID into training or curriculum
Creation of a hub for IDs such as Wikidata or OpenAlex
Involvement of researchers: The use of IGSN and RAiD must be actively pursued by researchers.
Establish clear community standards for the use of PID
With regard to IGSN: It should be defined at what point in the process an IGSN is assigned.
Demonstrate advantages: For example, the use of PID can facilitate internal monitoring

Focal topics

The discussion threads from the individual groups, each focussing on one topic (registries, multiple assignments and aggregators), are summarised below.

Topic: Registries

The participants use the registries re3data, OpenDOAR, ROR (e.g. for institution names) and DFG GEPRIS (insofar as these are understood as registries), as well as aggregators that access data from registries. Only re3data was named as a registry in which the publication service or CRIS of a participant's own institution is listed.

Workflows were identified as an important but complex topic. Problems with workflows include automated processing of unstructured external data, which is time-consuming or impossible, as well as difficulties in identifying organisational and personnel responsibilities for registries (keyword: curation sovereignty).

There is a need for guidelines or similar documents that clarify for institutions why institution IDs matter and how handling, responsibilities and workflows for them should be organised. In addition, it was pointed out that other identifiers for institutions need to be used, or at least considered, in certain use cases, e.g. the identification numbers from the DFG database GERIT in project funding administration and the identification numbers of the Federal Statistical Office and/or the ETER database in social science surveys.

Participants emphasised that the PID Network project or other community projects should also provide information about these administrative challenges.

Topic: Multiple assignments

This group particularly addressed problems with the multiple assignment of PIDs, for example because authors have several ORCID iDs, usually due to carelessness, that are not linked to each other. Publications can also be assigned a large number of PIDs (e.g. several DOIs, Handles, URNs or ISBNs) because they are published in different repositories. The group discussed when it makes sense to assign additional PIDs for secondary publications and in which cases this leads to confusion for users in practice. In addition, motivations for the use of publisher identifiers from a researcher's perspective were cited, such as impact measurement. At its core, the discussion about multiple assignments is about distinguishing the function of a PID as a unique identifier of the referenced object from its function as a reference to the storage location of the object.
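
To illustrate the distinction between identifying an object and pointing to one of its storage locations, the following minimal sketch groups repository records that share at least one PID, directly or through a chain of shared PIDs, so that they can be treated as copies of the same publication. It is an illustration under stated assumptions and not a method presented at the workshop: the record keys, the PID values and the function names are all hypothetical, and the normalisation is deliberately simplified (only DOI spellings are unified; Handles and URNs are compared as given).

```python
from collections import defaultdict

# Common spellings of a DOI that should compare as equal (simplified list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_pid(pid: str) -> str:
    """Map DOI spellings to 'doi:<lowercase name>'; leave other PIDs unchanged."""
    stripped = pid.strip()
    lowered = stripped.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            return "doi:" + lowered[len(prefix):]
    return stripped


def group_records(records: dict[str, list[str]]) -> list[set[str]]:
    """Group record keys that are connected through shared PIDs (union-find)."""
    parent = {key: key for key in records}

    def find(key: str) -> str:
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    first_seen: dict[str, str] = {}
    for key, pids in records.items():
        for pid in pids:
            norm = normalise_pid(pid)
            if norm in first_seen:
                root_a, root_b = find(first_seen[norm]), find(key)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_seen[norm] = key

    groups: dict[str, set[str]] = defaultdict(set)
    for key in records:
        groups[find(key)].add(key)
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    # Invented example data: one publication deposited in three repositories.
    records = {
        "repoA:1": ["https://doi.org/10.1234/ABC", "urn:nbn:de:example-1"],
        "repoB:7": ["doi:10.1234/abc"],
        "repoC:3": ["urn:nbn:de:example-1", "hdl:12345/xyz"],
        "repoD:9": ["doi:10.1234/other"],
    }
    for group in group_records(records):
        print(sorted(group))
```

In this sketch the three records from repoA, repoB and repoC end up in one group because they are linked by a shared DOI and a shared URN, while repoD stays separate. Which PID in such a group should count as the identifier of the object itself remains the policy question raised in the discussion.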

Topic: Aggregators

The current challenge is that publication services and CRIS receive multiple identifiers, which are used by national and international aggregators such as BASE, OpenAIRE, OpenAlex and citation databases (e.g. OpenCitations) for indexing. The participants exchanged views on various aspects and discussed difficulties and challenges for aggregators in particular.

The following points were discussed:

Lack of a central reporting point and no established workflow for repository operators to report changes in their repository
Aggregators cannot check all data and depend on the accuracy of what they receive
Each aggregator bears the maintenance effort itself; the effort is not pooled
Different interfaces (OAI/SPARQL) - gaps must be closed
Who decides when a new ID is introduced? Should it be the community's decision?
Open question: how well known are the aggregators?
Lack of networking of aggregators
Hardly any exchange between aggregators and harvesters, and therefore no feedback loop to track and standardise changes
Cooperation/exchange with repository operators could be helpful
Difficulties with the international networking of aggregators

Programme

Time	Agenda item	Speaker	Documentation
09:30	Welcome by the Bielefeld University Library Directorate	Dirk Pieper, Directorate, Bielefeld University Library
09:35	Project introduction: PID Network Germany	Lena Messerschmidt, Helmholtz Open Science Office	https://doi.org/10.5281/zenodo.10842296
09:45	Pros & cons for PID of CRIS services in registries - using the example of the "Directory for Research Information Systems"	Pablo de Castro, euroCRIS/University of Strathclyde, Glasgow, UK	http://hdl.handle.net/11366/2522
10:30	re3data: Referencing repositories for the community	Charlotte Neidiger, re3data/ Karlsruher Institut für Technologie - KIT	https://doi.org/10.5281/zenodo.10822138
10:45	DataCite: Use of PIDs for repositories in DataCite services	Paul Vierkant, DataCite	https://doi.org/10.5281/zenodo.10820440
11:00-11:15	Break
11:15	PIDs in (open) publication services: An OJS approach	Zeynep Aydin, TIB Hannover	https://doi.org/10.5281/zenodo.10947608
11:40	DINI AG: EPub & FIS on PIDs	Daniel Beucke, Niedersächsische Staats- und Universitätsbibliothek Göttingen; Sebastian Herwig, Westfälische Wilhelms-Universität Münster; Sabrina Petersohn, Geschäftsstelle der Kommission für Forschungsinformationen in Deutschland (KFiD)	https://doi.org/10.5281/zenodo.10853437
12:05	Handling of OA publication services & CRIS in aggregators using the example of BASE and OpenAIRE	Andreas Czerniak, Universitätsbibliothek Bielefeld; Vitali Peil, Universitätsbibliothek Bielefeld	https://doi.org/10.4119/unibi/2988038
until 13:30	Lunch break
13:30	Group work part 1	All participants
14:10	Break
14:20	Group work part 2	All participants
15:30	Break
15:40	Wrap up

The language of the event was German.

If you have any questions or suggestions, you can contact us at any time at info.pidnetwork@listserv.dfn.de.

Thank you for your participation!

The project partners of PID Network Germany are DataCite, the German National Library, the Helmholtz Open Science Office, the German National Library of Science and Technology (TIB) and Bielefeld University Library. The project is funded by the German Research Foundation (DFG).