About the data

    The data includes more than 2,000 patients and about 500 healthy controls.

    Patient data comes from different departments, and most cases are not directly related to a rare disease.

    All patients have a manually curated ICD-10 diagnosis code.

    The data consists of clinical lab data, unstructured data (medical history and doctors' notes), genomic data, and proteomic data.

    • The clinical lab data includes blood and urine tests. Each subject has between 0 and 30 lab terms and parameters. Abnormal findings in these tests are translated into HPO terms and added to the graph.
    • The unstructured data includes a Medical History Questionnaire (900 questions) and doctors' letters. Both are translated into HPO terms.
    • The genomic data includes the 10 to 20 most prominent genes per subject, selected from about 6,000 mutations. Only genetic variants with a CADD score of 6 or higher were kept, and for each gene only the variant with the highest CADD score was chosen.
    • The proteomic data includes quantitative values for between 1,000 and 2,000 proteins per subject.
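The step where abnormal clinical lab findings are translated into HPO terms and added to the graph can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the reference ranges, the `TERM:` labels (placeholders standing in for real HPO term IDs), the function names `abnormal_terms` and `add_to_graph`, and the choice of an adjacency-set dictionary as "the graph" are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical lookup: lab parameter -> (lower bound, upper bound,
# term if below range, term if above range). Bounds and "TERM:" labels are
# illustrative placeholders, not clinical reference ranges or real HPO IDs.
REFERENCE_RANGES = {
    "hemoglobin": (12.0, 17.5, "TERM:low_hemoglobin", "TERM:high_hemoglobin"),
    "creatinine": (0.6, 1.3, "TERM:low_creatinine", "TERM:high_creatinine"),
}


def abnormal_terms(lab_values):
    """Return the phenotype terms for lab values outside their reference range."""
    terms = []
    for param, value in lab_values.items():
        if param not in REFERENCE_RANGES:
            continue  # no reference range known for this parameter: skip it
        low, high, term_low, term_high = REFERENCE_RANGES[param]
        if value < low:
            terms.append(term_low)
        elif value > high:
            terms.append(term_high)
    return terms


def add_to_graph(graph, subject_id, lab_values):
    """Link a subject node to each abnormal-finding term (graph as adjacency sets)."""
    for term in abnormal_terms(lab_values):
        graph[subject_id].add(term)


graph = defaultdict(set)
add_to_graph(graph, "patient_001", {"hemoglobin": 10.2, "creatinine": 1.0})
add_to_graph(graph, "patient_002", {"hemoglobin": 14.0, "creatinine": 2.1, "sodium": 140})
print(dict(graph))
```

Only out-of-range values produce terms, so a subject whose lab values are all normal contributes no edges; values in range and parameters without a known range are ignored.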
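The genomic selection rule (keep variants with a CADD score of 6 or higher, then keep only the highest-scoring variant per gene) can be expressed as a short filter. This is a sketch under stated assumptions: the `Variant` record, the `select_genes` function and its `max_genes` parameter are hypothetical, and ranking genes by their best variant's CADD score is an assumed interpretation of "most prominent", since the ranking criterion is not specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str        # gene symbol the variant falls in
    variant_id: str  # hypothetical identifier for the variant
    cadd: float      # CADD score of the variant


def select_genes(variants, min_cadd=6.0, max_genes=20):
    """Keep variants with CADD >= min_cadd, keep the highest-scoring variant
    per gene, then return up to max_genes of those variants ranked by score."""
    best_per_gene = {}
    for v in variants:
        if v.cadd < min_cadd:
            continue  # drop variants below the CADD threshold
        current = best_per_gene.get(v.gene)
        if current is None or v.cadd > current.cadd:
            best_per_gene[v.gene] = v  # keep only the top variant per gene
    # Assumed ranking: genes ordered by their best variant's CADD score
    ranked = sorted(best_per_gene.values(), key=lambda v: v.cadd, reverse=True)
    return ranked[:max_genes]


variants = [
    Variant("GENE_A", "v1", 25.3),
    Variant("GENE_A", "v2", 12.0),
    Variant("GENE_B", "v3", 4.1),
    Variant("GENE_C", "v4", 18.7),
]
for v in select_genes(variants):
    print(v.gene, v.variant_id, v.cadd)
```

Here `GENE_B` is dropped because its only variant scores below 6, and `GENE_A` is represented only by its higher-scoring variant `v1`.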