The data includes more than 2000 patients and about 500 healthy controls.
Patient data comes from different departments, most of which are not directly related to rare diseases.
All patients have a manually curated ICD-10 diagnosis code.
The data consists of clinical lab data, unstructured medical history and doctors' notes, genomic data, and proteomic data.
- The clinical lab data includes blood tests and urine tests. Subjects can have between 0 and 30 terms and parameters. Abnormal findings in these tests are translated into HPO terms and added to the graph.
- The unstructured data includes a medical history questionnaire (900 questions) and doctors' letters. These are translated into HPO terms.
- The genomic data includes the 10 to 20 most prominent genes, selected from about 6000 mutations per subject. Genetic variants with a CADD score of 6 or higher were selected, and only the highest-scoring variant per gene was kept.
- The proteomic data includes between 1000 and 2000 proteins with quantitative values for each subject.
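
The variant selection described for the genomic data can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Variant` type, the `select_top_genes` function, and the toy gene names are hypothetical. Ranking genes by their best CADD score is also an assumption, since the text does not say how the "most prominent" genes were determined.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one genetic variant of one subject."""
    gene: str
    variant_id: str
    cadd: float


def select_top_genes(
    variants: Iterable[Variant],
    min_cadd: float = 6.0,   # threshold stated in the text
    max_genes: int = 20,     # upper end of the stated 10 to 20 range
) -> List[Variant]:
    """Keep variants with CADD >= min_cadd, one best variant per gene."""
    best = {}
    for v in variants:
        if v.cadd < min_cadd:
            continue  # below the CADD threshold
        current = best.get(v.gene)
        if current is None or v.cadd > current.cadd:
            best[v.gene] = v  # keep only the highest-scoring variant per gene
    # Assumption: "prominent" is approximated by ranking on CADD score.
    ranked = sorted(best.values(), key=lambda v: v.cadd, reverse=True)
    return ranked[:max_genes]


# Toy input for one subject (made-up values, not taken from the data set).
example = [
    Variant("GENE_A", "a1", 5.0),
    Variant("GENE_A", "a2", 12.0),
    Variant("GENE_B", "b1", 25.0),
    Variant("GENE_B", "b2", 8.0),
    Variant("GENE_C", "c1", 3.0),
]
selected = select_top_genes(example)
```

With the toy input, `GENE_C` is dropped because its only variant scores below 6, and `GENE_A` and `GENE_B` are each represented by their highest-scoring variant.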