Master Thesis

Students can pursue their Master Thesis by addressing a concern of interest to this project. This page provides a list of open and closed projects. In addition to the projects suggested here, we welcome proposals from students.

Semantic Research Data

This master thesis develops an approach for bi-directional translation between semantic research data (i.e., machine-readable data that describe research data and their meaning) and the conventional data structures used for research data analysis, e.g., data frames.
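To make the idea of bi-directional translation concrete, the following is a minimal sketch and not the approach the thesis is expected to deliver. It uses only the Python standard library: a "data frame" is modelled as a dict of equal-length columns, and semantic data as a set of (subject, predicate, object) triples. The http://example.org/ namespace, the function names, and the hand-written column-to-predicate mapping are all assumptions made for illustration.

```python
# Sketch: round-trip between a column-oriented table and RDF-style triples.
# Everything under http://example.org/ is a hypothetical namespace.

EX = "http://example.org/"
OBS = EX + "obs/"  # hypothetical prefix for row (observation) identifiers


def table_to_triples(table, key, predicates):
    """Translate a dict of equal-length columns into a set of triples.

    `key` names the column whose values identify rows; `predicates` maps
    every other column name to a predicate IRI (an assumed mapping).
    """
    triples = set()
    for i in range(len(table[key])):
        subject = OBS + str(table[key][i])
        for column, predicate in predicates.items():
            triples.add((subject, predicate, table[column][i]))
    return triples


def triples_to_table(triples, key, predicates):
    """Inverse translation: rebuild the column-oriented table from triples."""
    column_of = {predicate: column for column, predicate in predicates.items()}
    rows = {}
    for subject, predicate, obj in triples:
        row_id = subject[len(OBS):]
        rows.setdefault(row_id, {})[column_of[predicate]] = obj
    table = {key: [], **{column: [] for column in predicates}}
    for row_id in sorted(rows):  # triples are unordered, so sort row ids
        table[key].append(row_id)
        for column in predicates:
            table[column].append(rows[row_id].get(column))
    return table


table = {
    "id": ["a1", "a2"],
    "temperature": [21.5, 19.0],
    "site": ["north", "south"],
}
predicates = {
    "temperature": EX + "temperature",
    "site": EX + "site",
}

triples = table_to_triples(table, "id", predicates)
restored = triples_to_table(triples, "id", predicates)
```

In this toy case the round trip reproduces the original table. That works only because the mapping is explicit and the row identifiers are strings that are already sorted. Units, data types, and links to shared vocabularies are exactly what such a naive mapping leaves out.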

Not least in light of the FAIR Data Principles, we think it is high time for data analysis to consume and produce semantic research data so that data effectively convey meaning. The emphasis here is on the production of semantic research data. Indeed, data consumed in data analysis, especially those published by research infrastructures, increasingly conform to principles such as FAIR and relevant community best practices. In contrast, research data produced in data analysis very often follow ad hoc practices that reflect the immediate needs of researchers, unless they undergo an expensive FAIRification process (or, more generally, a process that aligns produced data with best practices).

With a particular focus on interoperability, this thesis contributes to approaches that ensure the data produced in data analysis are semantic at birth.

Research data analysis is often performed using Excel, MATLAB, R, Python, or similar computing environments. For Python, the integration of RDF and Pandas has been proposed but has seen no concrete development. Relevant are approaches that translate data structures, such as the W3C recommendations Generating RDF from Tabular Data on the Web and R2RML: RDB to RDF Mapping Language, as well as vocabularies for the representation of datasets, such as The RDF Data Cube Vocabulary, or of primary data, such as sensor data with the Semantic Sensor Network ontology. Other standards are also relevant, e.g., OGC Observations and Measurements, NetCDF and HDF5, or GeoJSON. For instance, the NetCDF-LD project bridged NetCDF data with Linked Data. Further results of the W3C CSV on the Web Working Group are also relevant here.
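As a rough illustration of how one of these vocabularies could be applied to analysis output, the sketch below serializes a single table row as an RDF Data Cube observation in Turtle. Only the qb: namespace IRI and the terms qb:Observation and qb:dataSet come from The RDF Data Cube Vocabulary. The function name, the http://example.org/ IRIs, and the property names are hypothetical. A complete data cube would also declare a data structure definition with dimension and measure properties, which is omitted here.

```python
import json

QB = "http://purl.org/linked-data/cube#"  # RDF Data Cube namespace
EX = "http://example.org/"                # hypothetical namespace


def observation_to_turtle(obs_id, dataset_id, values):
    """Serialize one row as a qb:Observation in Turtle (illustrative only).

    `values` maps hypothetical property names to str, int, or float values.
    Strings are quoted via json.dumps, which is adequate for simple text;
    booleans, NaN, and infinities are not handled by this sketch.
    """
    pairs = [
        f"a <{QB}Observation>",
        f"<{QB}dataSet> <{EX}dataset/{dataset_id}>",
    ]
    for name, value in values.items():
        literal = json.dumps(value) if isinstance(value, str) else repr(value)
        pairs.append(f"<{EX}{name}> {literal}")
    # Predicate-object pairs are separated by ';' and the statement ends with '.'
    return f"<{EX}obs/{obs_id}>\n    " + " ;\n    ".join(pairs) + " ."


turtle = observation_to_turtle(
    "a1", "d1", {"temperature": 21.5, "site": "north"}
)
print(turtle)
```

Generating text by hand keeps the example free of dependencies. In practice an RDF library would be the safer choice, since it takes care of escaping, datatypes, and prefix handling.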

While the landscape of research data is extremely heterogeneous, the primary focus of this thesis is to capture the meaning of data created in data analysis. A key aspect is thus the effective integration of existing technologies with computing environments and at least one language (e.g., Python or R) broadly used in research.

