Semantic Statistical Hypothesis Tests

This use case demonstrates how a result published by Haddad et al. (2017) in their article titled "Iron-regulatory proteins secure iron availability in cardiomyocytes to prevent heart failure" (https://doi.org/10.1093/eurheartj/ehw333) can be represented in a semantic form at the time of data analysis and how this result can be linked to article metadata and content at the time of article writing.

Specifically, we reproduce the statistical hypothesis test underlying the statement "IRE binding activity was significantly reduced in failing hearts", using the data shown in Figure 1B (p. 364), and propose an alternative (and complementary) representation of it. In contrast to the conventional representation as a p-value, a plot, and a natural language statement, we demonstrate how this result can be represented as a machine-readable description of the statistical hypothesis test: a two-sample t-test with unequal variance, its two continuous variables (and their corresponding values), the dependent variable of the study design, and the resulting p-value.
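
For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sample t-test with unequal variance (Welch's t-test) using only the Python standard library. It is not the notebook's code. The function names `welch_t`, `t_pdf`, and `two_sided_p` are illustrative, the p-value is approximated by numerical integration of the t density, and the two sample lists are placeholder values, not the Figure 1B measurements.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test with unequal variance."""
    # Squared standard error of each sample mean (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Approximate the two-sided p-value with Simpson's rule on [0, |t|].

    steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


# Placeholder values for illustration only (NOT the published data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]

t_stat, dof = welch_t(group_a, group_b)
p_value = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.4f}")
```

In practice, assuming SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and is preferable to hand-rolled integration, which loses accuracy for very small p-values.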

The semantic representation at the time of data analysis is enabled by extending a Jupyter notebook with corresponding program code and by leveraging existing vocabulary from the general-purpose statistics ontology STATO, the Ontology for Biomedical Investigations (OBI), the Information Artifact Ontology (IAO), and the Gene Ontology (GO), among others. The use case also demonstrates how the semantic representation can be processed. Concretely, we retrieve the data underlying the statistical hypothesis test using SPARQL and visualize it using matplotlib, the Python plotting library.
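
To give a rough idea of what a machine-readable description of a test can look like, here is a minimal sketch that writes a few RDF statements in N-Triples syntax using only the standard library. Everything under `http://example.org/` is a hypothetical placeholder: the class and property names are invented for illustration and are not the STATO, OBI, IAO, or GO terms used in the notebook. The p-value passed in is a placeholder as well, not the published result.

```python
# Hypothetical namespace for illustration; real descriptions would use
# ontology IRIs (e.g. from STATO and OBI) instead.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def iri(value):
    """Wrap an IRI string in N-Triples angle brackets."""
    return f"<{value}>"


def double_literal(number):
    """Format a finite float as an xsd:double typed literal."""
    return f'"{float(number)!r}"^^<{XSD_DOUBLE}>'


def describe_test(test_id, p_value, variable_ids):
    """Return N-Triples lines describing one hypothesis test."""
    subject = iri(EX + test_id)
    lines = [
        f"{subject} {iri(RDF_TYPE)} "
        f"{iri(EX + 'TwoSampleTTestWithUnequalVariance')} .",
        f"{subject} {iri(EX + 'hasPValue')} {double_literal(p_value)} .",
    ]
    for variable_id in variable_ids:
        lines.append(
            f"{subject} {iri(EX + 'hasInputVariable')} "
            f"{iri(EX + variable_id)} ."
        )
    return lines


# Placeholder identifiers and p-value, for illustration only.
triples = describe_test("test1", 0.5, ["variableA", "variableB"])
print("\n".join(triples))
```

A library such as rdflib would normally handle serialization and SPARQL querying; the hand-written strings here only show the shape of the statements.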

Furthermore, we use dokieli to demonstrate how, at the time of writing, authors can annotate their article to link natural language content with the corresponding semantic representations and to crosslink to article (meta)data, such as the title, the authors and their ORCID iDs, the article DOI, and other metadata. For this, we have created and published a document that reproduces the relevant article content, annotated the content using RDFa, and extended the Jupyter notebook so that the RDF statements are extracted from the document. As the notebook exemplifies, we can now formulate more interesting queries that span article metadata and the results published in the article.
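
As a rough illustration of what annotating content with RDFa makes possible, the sketch below collects `property` values from an HTML fragment with the standard library's `html.parser`. It is a deliberately simplified stand-in and not a conforming RDFa processor: it ignores `about`, `resource`, `typeof`, prefixes, nested properties, and void elements inside annotated text. The class name `PropertyCollector`, the HTML fragment, and the `schema:` property choices are assumptions made for this example, not the markup of the published document.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._prop = None  # property currently being captured, if any
        self._depth = 0    # nesting depth inside the captured element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._prop is not None:
            self._depth += 1  # child element of the captured element
            return
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly, e.g. on a <meta> element.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._prop, self._depth, self._text = prop, 0, []

    def handle_endtag(self, tag):
        if self._prop is None:
            return
        if self._depth == 0:
            self.pairs.append((self._prop, "".join(self._text).strip()))
            self._prop = None
        else:
            self._depth -= 1

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)


# Hypothetical fragment; the real document's markup may differ.
HTML = """<article>
<h1 property="schema:name">Iron-regulatory proteins secure iron availability in cardiomyocytes to prevent heart failure</h1>
<meta property="schema:sameAs" content="https://doi.org/10.1093/eurheartj/ehw333">
<p><span property="schema:description">IRE binding activity was <em>significantly</em> reduced in failing hearts</span>.</p>
</article>"""

collector = PropertyCollector()
collector.feed(HTML)
for prop, value in collector.pairs:
    print(prop, "->", value)
```

A full RDFa parser would additionally resolve subjects and prefixes into complete RDF triples, which is what allows the extracted statements to be queried together with the test description.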