PhD Scientific Days 2026

Budapest, 16-18 June 2026

Poster Session 3.L - Health Sciences

LLM-Guided Drug Normalization for Rapid OMOP CDM Integration of Hungarian Drug Data

Name of the presenter

Bali, Orsolya

Institute/workplace of the presenter

Semmelweis University

Authors

Orsolya Bali1, Loretta Kiss, Eszter Kővári, Ágota Mészáros, Tibor Héja, Csaba Nemes, Zsolt Bagyura2, Mónika Hujter , Zsófia Práger3
1: Semmelweis University, Institute of Clinical Data Management
2: 1Institute for Clinical Data Management, Semmelweis University
3: Hiflylabs Zrt

Text of the abstract

Introduction
Standardized clinical data is crucial for real-world evidence studies, but drug coding in Europe—especially in Hungary—faces challenges due to free-text medication entries and inconsistent formats. Manual mapping is slow, and current semi-automated methods struggle with non-English text.
Aims
We aim to present an efficient pipeline that extracts and normalizes Hungarian hospital drug records into OMOP CDM tables using LLM-based normalization and API-driven code mapping.
Methods
Our approach normalizes and maps Hungarian drug entries to OMOP CDM concepts, building on prior extraction work. We compared script-based and LLM-based normalization for brand-to-ingredient resolution, translation, and structured attribute extraction. In mapping, five parallel methods were evaluated against a clinician-reviewed standard (133 terms), with performance measured by precision, recall, F1-score, and coverage.
Results
LLM-based normalization (Claude Sonnet 4.6, few-shot prompting) achieved a 99.5% rate (429/431 entries), outperforming script-based methods—especially for multi-ingredient compounds. Local vocabulary queries had the highest mapping F1-score (0.709), but no method was accurate enough for full automation.
Conclusion
LLM-guided normalization effectively converted Hungarian clinical text to standardized English drug attributes. All five mapping methods required expert review for deployment. Access to RxNorm Extension improved accuracy over core RxNorm, making it essential for European drug mapping. The approach is feasible, but a scalable, governed architecture is needed for ongoing OMOP ETL deployment.
Funding
SUPPORTED BY THE 2025-2.1.2-EKÖP-KDP UNIVERSITY RESEARCH SCHOLARSHIP PROGRAMME OF THE MINISTRY FOR CULTURE AND INNOVATION FROM THE SOURCE OF THE NATIONAL RESEARCH, DEVELOPMENT AND INNOVATION FUND.