About the data

The dataset comprises 2000 patients and 500 healthy controls.

All patients have a manually curated ICD-10 diagnosis code.

Data Modalities:

Clinical lab data

Between 0 and 30 terms and parameters per subject. Abnormal findings are translated into HPO terms and introduced into the knowledge graph.

  • Blood Test Measurements

  • Urine Test Measurements

Unstructured data

  • Medical History Questionnaire (about 900 questions)

  • Doctor’s Letters

Genomics data

  • The top 10 to 20 most prominent genes per subject, selected from about 6000 mutations. Only genetic variants with a CADD score of 6 or higher were kept, and for each gene only the variant with the highest score was retained.
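The selection described above can be illustrated with a minimal sketch. It is not the pipeline actually used to build the dataset: the `Variant` record, its field names, and the function `select_top_genes` are hypothetical, and ranking the surviving genes by CADD score to obtain the final 10 to 20 is an assumption, since the text does not state how "prominent" is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant of one subject."""
    gene: str
    variant_id: str
    cadd: float  # CADD score of the variant


def select_top_genes(variants, min_cadd=6.0, max_genes=20):
    """Keep variants with CADD >= min_cadd, one (the highest) per gene.

    Assumption: genes are then ranked by that best CADD score and the
    first max_genes are returned; the source does not specify the ranking.
    """
    best_per_gene = {}
    for v in variants:
        if v.cadd < min_cadd:
            continue  # below the CADD threshold, discard
        current = best_per_gene.get(v.gene)
        if current is None or v.cadd > current.cadd:
            best_per_gene[v.gene] = v  # keep only the highest-scoring variant
    ranked = sorted(best_per_gene.values(), key=lambda v: v.cadd, reverse=True)
    return ranked[:max_genes]


# Made-up example data, for illustration only.
example = [
    Variant("GENE_A", "v1", 5.0),
    Variant("GENE_A", "v2", 12.0),
    Variant("GENE_B", "v3", 25.0),
    Variant("GENE_B", "v4", 7.0),
    Variant("GENE_C", "v5", 3.0),
]
selected = select_top_genes(example)
```

Here `GENE_C` drops out because its only variant is below the threshold, and `GENE_A` and `GENE_B` are each represented by their highest-scoring variant.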

Proteomics data

  • Between 1000 and 2000 proteins with quantitative values for each subject.
Last modified November 12, 2024: changed about the data (9fdf52a)