Feasibility Study: Assigning Handles to Terms of Terminologies
Terminology terms can be used to annotate data in a way that is both machine-readable and machine-usable. Keywords that describe datasets or publications can be annotated with the identifiers (e.g. URIs) of the corresponding terms of a terminology. Such annotation is most valuable if the identifiers resolve on the Web: resolution gives access to further information about the term and typically reduces ambiguity about it, for both humans and machines. However, identifiers often do not resolve. This occurs, for instance, if a term is deleted entirely in a new version of a terminology instead of being marked as deprecated. We therefore investigated whether terms had previously been deleted or replaced in the ESS collection terminologies (status as of 25/01/2024). We found that six of the 32 (19%) terminologies investigated had deleted classes or individuals, for which the corresponding identifiers were no longer resolvable. Further details of this study can be found in a report by Ganske (2024).
To achieve persistent identifiers (PIDs) for terms, the TIB TS could assign additional persistent identifiers to all terms in each terminology for future versions. These should always resolve to the corresponding webpage (landing page) for the term in the TIB TS. If (meta)data curators and systems use such TIB TS PIDs in data annotation, a persistent link to a description of the term in question would exist, even if the term is deleted from the original terminology at some future point in time. After investigating several options, we found that using handles from Handle.net is a simple and inexpensive solution, although it requires a Handle server at TIB.
This assignment of PIDs to all versions of a term in a terminology implies storing all future versions of that terminology in the TIB TS. Currently, the TIB TS stores, and thus makes searchable, only the latest version of a terminology. Major changes to the backend and the search function would therefore be required. Not only would storing all versions of a terminology increase the necessary storage space, but it would also complicate and lengthen the process of constructing the tree of dependencies for all terminologies. If older versions were to be ignored for this task, a second search tree would need to be constructed for them. In both cases, the search would also need to be adapted, as normal users do not want to see all versions of a term. We therefore decided to start with a feasibility study in which handles are assigned to a few terms from two terminologies (dfgfo and envo). We ingested the two versions dfgfo and dfgfo2024 and two older envo versions (envo2023 and envo2021) in addition to the latest release of envo on TIB TS, and assigned handles to selected terms that had changed between the versions:
- Change of sub-class label between dfgfo and dfgfo2024:
  Sub-class Geology and Palaeontology (https://hdl.handle.net/20.500.14488/DFGFO_314-01)
  was changed to
  sub-class Geology (https://hdl.handle.net/20.500.14488/DFGFO_342-01)
- Change of sub-class label between dfgfo and dfgfo2024:
  Sub-class Security and Dependability (https://hdl.handle.net/20.500.14488/DFGFO_409-03)
  was changed to
  sub-class Security and Dependability, Operating, Communication and Distributed Systems (https://hdl.handle.net/20.500.14488/DFGFO_443-03)
- Change of IRI of class Area of alpine tundra between envo2021 and envo2023:
  IRI http://purl.obolibrary.org/obo/ENVO_3400001 (http://hdl.handle.net/20.500.14488/3ezqlubc)
  was changed to
  IRI http://purl.obolibrary.org/obo/ENVO_03400001 (https://hdl.handle.net/20.500.14488/eknh2i0v?ts=1699999999).
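To illustrate how a handle such as those listed above could be inspected programmatically, the following is a minimal Python sketch. It assumes the public Handle.net proxy REST endpoint (https://hdl.handle.net/api/handles/) and its JSON record layout with a `values` list. The helper names and the sample record, including the example.org landing page, are hypothetical and only serve the illustration; they do not reflect the actual records stored for the handles in this study.

```python
from urllib.parse import quote, urlsplit

# Public Handle.net proxy REST endpoint (assumed to be used as-is).
HANDLE_API = "https://hdl.handle.net/api/handles/"


def handle_from_url(url):
    """Extract 'prefix/suffix' from a hdl.handle.net URL, ignoring any query string."""
    return urlsplit(url.strip()).path.lstrip("/")


def api_url(handle):
    """Build the REST URL that returns the handle record as JSON."""
    return HANDLE_API + quote(handle, safe="/")


def landing_page(record):
    """Return the value of the first URL-typed entry of a handle record, or None."""
    if record.get("responseCode") != 1:  # 1 signals a successful resolution
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value.get("data", {}).get("value")
    return None


# Hypothetical record in the shape returned by the REST endpoint.
sample_record = {
    "responseCode": 1,
    "handle": "20.500.14488/DFGFO_314-01",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string", "value": "https://example.org/terms/DFGFO_314-01"},
        }
    ],
}

handle = handle_from_url("https://hdl.handle.net/20.500.14488/DFGFO_314-01")
print(api_url(handle))
print(landing_page(sample_record))
```

In practice the record would be fetched from the URL returned by `api_url` with any HTTP client; the sketch keeps the network call out so that it can be run offline.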
We also developed a prototype script that compares metadata from two handles and stores the differences in JSON format. This provides a practical way to detect changes across ontology versions, such as modifications to subclass relationships or annotations.
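The comparison idea can be sketched as follows. This is not the prototype script itself, only a minimal illustration under stated assumptions: the metadata of each handle is taken to be a flat dictionary, and the field names (`iri`, `label`, `version`) are hypothetical. The example values reuse the envo IRI change described above.

```python
import json


def diff_metadata(old, new):
    """Compare two metadata dictionaries and group the differences by kind."""
    changes = {"added": {}, "removed": {}, "changed": {}}
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes["added"][key] = new[key]
        elif key not in new:
            changes["removed"][key] = old[key]
        elif old[key] != new[key]:
            changes["changed"][key] = {"old": old[key], "new": new[key]}
    return changes


def to_json(changes):
    """Serialise the differences as indented JSON text."""
    return json.dumps(changes, indent=2, sort_keys=True)


# Hypothetical metadata for the two versions of the same class.
envo2021_term = {
    "iri": "http://purl.obolibrary.org/obo/ENVO_3400001",
    "label": "Area of alpine tundra",
    "version": "envo2021",
}
envo2023_term = {
    "iri": "http://purl.obolibrary.org/obo/ENVO_03400001",
    "label": "Area of alpine tundra",
    "version": "envo2023",
}

differences = diff_metadata(envo2021_term, envo2023_term)
print(to_json(differences))
```

Grouping the result into added, removed, and changed fields keeps the JSON output easy to scan; nested metadata such as lists of parent classes would need a recursive comparison, which this sketch leaves out.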
In summary, handles offer a reliable, low-cost solution for the persistent identification and versioning of semantic artefacts. They allow terms to remain identifiable, resolvable, and (semantically) consistent over time. However, our experience has shown that deploying a Handle server requires significant learning, troubleshooting, and coordination before it can be offered as a production service.