Health Sciences III.
Introduction: The fragmentation of health and biomedical research data poses a significant challenge to precision medicine, which depends on the efficient use of large, complex health data resources and on data sharing across institutions and borders.
Biobanks (sample archives and data integration centers) are providing high-quality datasets. Analyzing these data across federated datasets can yield higher statistical power. Harmonization is a prerequisite for data sharing: it maps unique clinical and molecular characteristics onto a unified data model and standard codes (following the guidelines of international working groups such as BBMRI-ERIC and ERN). This practice maximizes comparability and compatibility between different datasets, making healthcare information available for privacy-preserving data sharing and learning.
Aims: The Semmelweis Federated Data Warehouse project aims to connect clinical, laboratory, and genomic profiles by identifying mutual synergies. Two dataset types exist: rare disease-specific collections containing detailed clinical phenotype descriptions and germline genetic (NGS-based) profiles, and population and case-control studies for common diseases, focusing on common SNVs. Joint analysis may be based on the comorbidities and age at onset described in patients with rare diseases and their relatives.
However, how effectively comorbidity data can be extracted from clinical descriptions of rare diseases remains a crucial question. Family history data are also rich, but there are no established standards for their structured extraction.
Methods: Over 500 patients with exome sequencing data were enrolled, and their medical documentation was screened for codable features using the ICD-10, ORPHA, and HPO ontologies.
Results: We present the analysis of diagnosis- and symptom-level descriptors in medical texts, their mapping to standard ontologies, and the density and information content of the resulting annotations.
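The abstract does not state how annotation density and information content were computed. As a minimal sketch under one common convention (an assumption, not the authors' confirmed method), density can be taken as the mean number of distinct codes per patient, and the information content of a code as the negative base-2 logarithm of the fraction of patients annotated with it. The patient identifiers and the function names `annotation_density` and `information_content` are hypothetical, and the toy annotations are illustrative only, not study data.

```python
import math
from collections import Counter


def annotation_density(annotations):
    """Mean number of distinct ontology codes per patient."""
    if not annotations:
        return 0.0
    total = sum(len(set(codes)) for codes in annotations.values())
    return total / len(annotations)


def information_content(annotations):
    """IC(code) = -log2(fraction of patients annotated with the code).

    Rare codes get a high IC; codes present in most patients get a low IC.
    This simple version does not propagate annotations to ancestor terms.
    """
    n_patients = len(annotations)
    counts = Counter(
        code for codes in annotations.values() for code in set(codes)
    )
    return {code: -math.log2(n / n_patients) for code, n in counts.items()}


# Hypothetical toy cohort: patient ID -> HPO codes extracted from free text.
# HP:0001250 = Seizure, HP:0001251 = Ataxia,
# HP:0001263 = Global developmental delay.
toy = {
    "P1": ["HP:0001250", "HP:0001251"],
    "P2": ["HP:0001250"],
    "P3": ["HP:0001250", "HP:0001263"],
    "P4": ["HP:0001251"],
}

density = annotation_density(toy)
ic = information_content(toy)
```

A hierarchy-aware variant would first add every ancestor term of each code before counting, so that general terms receive lower information content than their more specific descendants.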
Conclusion: Rare disease phenotype-genotype data can aid in understanding the heritability of common diseases, because comprehensive genomic data are generated for these patients. Phenotypic profiles can be effectively extracted from the personal and family history recorded in medical documentation. Unified coding allows hierarchical analysis and may serve as a best practice for data harmonization.
Funding: Supported by the TKP2021-NVA-15 grant of the Thematic Area Excellence Program of the National Research, Development and Innovation Office.