A Framework for Knowledge Graphs based on Semantic Integration, Representation, and Curation of Scientific Data to enable Trustable and Interpretable Knowledge Exploration and Discovery

Data is a foundational asset for fostering a country’s economy and ensuring a high quality of life for its citizens. However, although data infrastructures are recognized as pivotal, global adoption of data-driven solutions is still lagging, particularly in biomedicine. Despite years of research in data integration, curation, and knowledge representation, the trustability of data-driven insights is still undermined by the absence of interpretable methods that account for correctness and bias in data-driven pipelines.

In TrustKG, we aim to enable explainable data integration in pipelines that transform scientific data into semantically rich, linked knowledge graphs. TrustKG will be equipped with computational logic and ontologies to express temporal and causal relationships among ingested, curated, and integrated data. Further, TrustKG methods will draw on expert knowledge and computational techniques to validate and explain data management decisions and outcomes.

TrustKG is led by Prof. Dr. Maria-Esther Vidal and supported by the Leibniz Association through the program "Leibniz Best Minds: Programme for Women Professors", project TrustKG-Transforming Data in Trustable Insights, grant P99/2020.

Currently, formalisms for explainability target computational methods for interpreting machine learning (ML) algorithms and for detecting potential bias in their insights. TrustKG will advance the state of the art by enabling interpretable large-scale data integration, empowering AI and ML approaches with semantic descriptions of the data they receive as input. These descriptions include explanations of data alignments and of the decisions made to collect, curate, and integrate the input data of an AI method. Logical entailments will augment the outcomes of AI methods.

As a result, we expect a paradigm shift in semantic data integration towards explainable AI. TrustKG will be applied in the context of lung and breast cancer. Personalized therapies will be inferred from the evolution of a patient’s disease, which is predicted based on the patient profile. The description of patient profiles, integrated with available knowledge about therapies, will yield an explanation of the disease evolution and therapy effectiveness.

Work packages:

WP1 - Understanding Causality in a Knowledge Graph Pipeline

WP2 - Knowledge Graph based Knowledge Extraction

WP3 - Semantic Data Integration and Knowledge Graph Creation

WP4 - Explainable Analytical and Predictive Models

WP5 - Exploration and Visualization of Explainability

WP6 - Evaluation of the TrustKG Framework