Semantic Statistical Hypothesis Tests

This use case demonstrates how a result published by Haddad et al. (2017) in their paper titled "Iron-regulatory proteins secure iron availability in cardiomyocytes to prevent heart failure" (doi:10.1093/eurheartj/ehw333) can be represented in a semantic form at the time of data analysis.

Specifically, we reproduce the statistical hypothesis test underlying the statement "IRE binding activity was significantly reduced in failing hearts", using the data shown in Figure 1B (p. 364), and propose an alternative (and complementary) representation of it. In contrast to the conventional representation as a p-value, a plot, and a natural language statement, we demonstrate how this result can be represented as a machine-readable description of the statistical hypothesis test: a two-sample t-test with unequal variance, its two continuous variables (and their corresponding values), the study design dependent variable, and the p-value.
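As a rough illustration of the test named above, the following sketch computes the statistic and the degrees of freedom of a two-sample t-test with unequal variance (Welch's t-test) using only the Python standard library. The function name `welch_t_test` and the sample values are hypothetical placeholders; they are not the measurements from Figure 1B and not the code of the notebook itself. In practice the p-value would typically be obtained from a statistics library, for example `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test.

    Hypothetical helper for illustration; it does not compute the p-value.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    # Welch's t statistic: difference of means over the combined standard error.
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Placeholder values only (assumption): NOT the data from Figure 1B.
non_failing = [1.0, 2.0, 3.0, 4.0, 5.0]
failing = [0.5, 1.0, 1.5, 2.0, 2.5]
t_stat, dof = welch_t_test(non_failing, failing)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this variant does not pool the two sample variances, which is why the degrees of freedom are generally not an integer.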

The semantic representation at the time of data analysis is enabled by extending a Jupyter notebook with a corresponding function and by reusing existing vocabulary from the STATO general-purpose statistics ontology, the Ontology for Biomedical Investigations (OBI), the Information Artifact Ontology (IAO), and the Gene Ontology (GO), among others.
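To give a sense of what such a machine-readable description might contain, the sketch below assembles the elements listed earlier (the test type, the two continuous variables with their values, the study design dependent variable, and the p-value) into a JSON-serializable structure. Everything here is an assumption made for illustration: the function `describe_t_test`, the property names, and the `http://example.org/vocab/` namespace are hypothetical stand-ins, not the notebook's actual function and not real STATO, OBI, IAO, or GO identifiers, which would be substituted in an actual implementation.

```python
import json

# Hypothetical namespace (assumption); real term IRIs from STATO, OBI, etc.
# would replace these placeholders.
EX = "http://example.org/vocab/"


def describe_t_test(label_a, values_a, label_b, values_b,
                    dependent_variable, p_value):
    """Build a JSON-LD-like description of a two-sample t-test (sketch)."""

    def variable(label, values):
        # One continuous variable together with its observed values.
        return {
            "@type": EX + "ContinuousVariable",
            "label": label,
            "values": list(values),
        }

    return {
        "@type": EX + "TwoSampleTTestWithUnequalVariance",
        "hasInput": [variable(label_a, values_a), variable(label_b, values_b)],
        "dependentVariable": {
            "@type": EX + "StudyDesignDependentVariable",
            "label": dependent_variable,
        },
        "hasOutput": {"@type": EX + "PValue", "value": p_value},
    }


# Placeholder labels, values, and p-value (assumption): not the published data.
description = describe_t_test(
    "non-failing hearts", [1.0, 2.0, 3.0],
    "failing hearts", [0.5, 1.0, 1.5],
    "IRE binding activity", 0.05,
)
print(json.dumps(description, indent=2))
```

Keeping the values alongside the test type and the p-value is what allows a later consumer to re-plot or re-check the result without going back to the figure.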

The use case also prototypes storing, reading, and processing the semantic statistical hypothesis test. Persistence is provided by a graph database. Processing is demonstrated by retrieving the data underlying the statistical hypothesis test with SPARQL and visualizing it with matplotlib, the Python plotting library.