About the data

The cohort comprises 2000 patients and 500 healthy controls.

All patients have a manually curated ICD-10 diagnosis code.

Clinical lab data

Between 0 and 30 terms and parameters per subject, with abnormal findings translated into HPO terms and introduced into the knowledge graph.

Unstructured data

Genomics data

The top 10-20 most prominent genes were selected from about 6000 mutations per subject. Genetic variants with a CADD score of 6 or higher were retained, and for each gene only the variant with the highest score was kept.
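
The variant selection described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the `Variant` record, the `select_top_genes` function, and its parameter names are hypothetical, and ranking genes by their best CADD score to obtain the final 10-20 is an assumption, since the notes do not say how "prominent" was determined.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical record layout; the real data format is not described.
    gene: str
    variant_id: str
    cadd: float


def select_top_genes(variants, min_cadd=6.0, max_genes=20):
    """Keep the highest-scoring variant per gene among those passing the cutoff."""
    best = {}
    for v in variants:
        # Discard variants below the CADD threshold (6 in the description).
        if v.cadd < min_cadd:
            continue
        current = best.get(v.gene)
        # Retain only the variant with the highest score for each gene.
        if current is None or v.cadd > current.cadd:
            best[v.gene] = v
    # Assumption: genes are ranked by their best CADD score, highest first.
    ranked = sorted(best.values(), key=lambda v: v.cadd, reverse=True)
    return ranked[:max_genes]


# Placeholder gene and variant names, for illustration only.
example = [
    Variant("GENE_A", "var1", 24.1),
    Variant("GENE_A", "var2", 12.0),
    Variant("GENE_B", "var3", 5.9),
    Variant("GENE_C", "var4", 6.0),
]
selected = select_top_genes(example)
```

With this placeholder input, `selected` holds one variant each for `GENE_A` and `GENE_C`; `GENE_B` is dropped because its only variant falls below the cutoff.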

Proteomics data