About the data

The data includes more than 2000 patients and about 500 healthy controls.

Patient data comes from different departments, and most of the patients' cases are not directly related to a rare disease.

All patients have a manually curated ICD-10 diagnosis code.

The data comprises clinical lab data, unstructured medical history and doctors' notes, genomic data, and proteomic data.

  • The clinical lab data includes blood tests and urine tests. Each subject can have between 0 and 30 terms and parameters. Abnormal findings in these tests are translated into HPO terms and added to the graph.
  • The unstructured data includes a Medical History Questionnaire (900 questions) and doctors' letters. These are translated into HPO terms.
  • The genomic data includes the 10 to 20 most prominent genes per subject, selected from about 6000 mutations. Only genetic variants with a CADD score of 6 or higher were considered, and for each gene only the variant with the highest score was kept.
  • The proteomic data includes quantitative values for between 1000 and 2000 proteins per subject.
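
The variant selection described for the genomic data can be sketched roughly as follows. This is only an illustration, not the pipeline actually used: the `Variant` type, the `select_top_genes` function, and the gene names are hypothetical, and ranking the genes by CADD score to obtain the final 10 to 20 is an assumption, since the text does not say how "prominent" is determined.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one variant of one subject."""
    gene: str
    cadd: float


def select_top_genes(variants, min_cadd=6.0, max_genes=20):
    """Keep the highest-scoring variant per gene, then the top-ranked genes.

    min_cadd=6.0 follows the threshold stated in the text; ranking by CADD
    score and the max_genes cut-off are assumptions made for this sketch.
    """
    best = {}
    for v in variants:
        if v.cadd < min_cadd:
            continue  # drop variants below the CADD threshold
        if v.gene not in best or v.cadd > best[v.gene].cadd:
            best[v.gene] = v  # keep only the highest-scoring variant per gene
    ranked = sorted(best.values(), key=lambda v: v.cadd, reverse=True)
    return ranked[:max_genes]


# Placeholder input; gene names and scores are invented for illustration.
variants = [
    Variant("GENE_A", 25.1),
    Variant("GENE_A", 12.0),
    Variant("GENE_B", 5.9),
    Variant("GENE_C", 6.0),
]
selected = select_top_genes(variants)
```

With this placeholder input, `GENE_B` is dropped because its only variant scores below 6, and `GENE_A` is represented by its higher-scoring variant.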

Last modified April 24, 2024: Little changes, fixing bugs (5ecf6fa)