PhD Scientific Days

Pharmaceutical Sciences and Health Technologies 1.

REFLECTIVE-TIAB: Agentic Reflective Prompt Evolution for Cost-Effective Large-Scale Title and Abstract Screening

Name of the presenter

Imre, Attila

Institute/workplace of the presenter

Center for Health Technology Assessment

Authors

Attila Imre^1,2,3,4, Ákos Józwiak^1,2, Judit Hagymásy^1,2, Judit Tittmann^1, Ágnes Nagy^1, Sándor Kovács^1,2, Przemyslaw Kardas^5, Job FM van Boven^6, Irene Mommers^6, Balázs Nagy^2,3,4, Tamás Ágh^1,2
1: Center for Health Technology Assessment and Pharmacoeconomic Research, University of Pecs, Hungary
2: Syreon Research Institute, Budapest, Hungary
3: Center for Health Technology Assessment, Semmelweis University, Budapest, Hungary
4: Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
5: Department of Family Medicine, Medical University of Lodz, Lodz, Poland
6: Department of Clinical Pharmacy & Pharmacology, Groningen Research Institute for Asthma and COPD (GRIAC), University Medical Center Groningen, University of Groningen, Groningen, the Netherlands

Text of the abstract

Introduction: Title and abstract screening is a labour-intensive stage of systematic reviews. Large language models (LLMs) can automate this process, but performance depends heavily on prompt design and model selection, both of which are typically manual and time-consuming.
Aims: Our objective was to evaluate whether automated, reflection-driven prompt optimisation improves LLM performance during title and abstract screening.
Method: REFLECTIVE-TIAB uses the GEPA reflective prompt optimiser to improve prompts under an asymmetric loss penalising false negatives. Nine LLMs screened 8,520 de-duplicated records from a COPD exacerbation predictor search. A 100-abstract gold standard was constructed from inter-model disagreements and was expert-labelled. The prompt was optimised on Llama 3.3 70B via DSPy/GEPA and evaluated across all nine models.
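An asymmetric loss of the kind described in the Method can be sketched as follows. This is a minimal illustration, not the scoring function used in the study: the function name and the weights `fn_weight` and `fp_weight` are hypothetical, and the abstract does not report the actual penalty values.

```python
def asymmetric_screening_score(gold, predicted, fn_weight=5.0, fp_weight=1.0):
    """Score screening decisions in [0, 1], where higher is better.

    gold, predicted: sequences of booleans (True = include the record).
    fn_weight, fp_weight: hypothetical penalties; a missed relevant record
    (false negative) is assumed to cost more than a false alarm.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    false_negatives = sum(1 for g, p in zip(gold, predicted) if g and not p)
    false_positives = sum(1 for g, p in zip(gold, predicted) if p and not g)
    cost = fn_weight * false_negatives + fp_weight * false_positives

    # Worst case: every relevant record missed and every irrelevant one included.
    positives = sum(1 for g in gold if g)
    negatives = len(gold) - positives
    worst_cost = fn_weight * positives + fp_weight * negatives
    if worst_cost == 0:
        return 1.0  # nothing to get wrong
    return 1.0 - cost / worst_cost


# Toy labels for illustration only (not study data).
gold = [True, True, False, False]
one_miss = [False, True, False, False]        # one false negative
one_false_alarm = [True, True, True, False]   # one false positive
score_miss = asymmetric_screening_score(gold, one_miss)
score_alarm = asymmetric_screening_score(gold, one_false_alarm)
```

With these assumed weights, a single missed record lowers the score more than a single unnecessary inclusion, which is the behaviour an optimiser would need in order to favour recall.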
Results: Optimisation improved recall across all LLMs (+3.7% to +37.1%). Gemini 3 Flash Preview achieved the highest performance (91% accuracy, F1 81.6%) while costing 25-fold less per abstract than GPT-5.2, which ranked among the lowest-performing models. A prompt optimised on a single open-source model generalised to all nine without retraining. Total optimisation cost was $6.36.
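For reference, the reported metrics (accuracy, recall, F1) follow from the confusion counts of model decisions against the gold standard. The sketch below uses standard definitions; the function name and the toy labels are illustrative assumptions and do not reproduce the study data.

```python
def screening_metrics(gold, predicted):
    """Compute accuracy, recall, precision and F1 for include/exclude decisions.

    gold, predicted: sequences of booleans (True = include the record).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy labels for illustration only: 3 relevant and 5 irrelevant records.
toy_gold = [True, True, True, False, False, False, False, False]
toy_pred = [True, True, False, True, False, False, False, False]
toy_metrics = screening_metrics(toy_gold, toy_pred)
```

In a screening setting recall is usually the metric to watch first, since a missed relevant record cannot be recovered at the full-text stage.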
Conclusion: REFLECTIVE-TIAB provides automated, model-transferable prompt optimisation for literature screening at negligible cost. Model price did not predict screening performance. The framework could substantially reduce screening workload while preserving comprehensiveness.
Funding: This research is part of the COPD-ALERT project (“COPD-ALERT - Prediction of COPD exacerbations through artificial intelligence based monitoring of medication adherence and other medical data”), which is funded by the 2024-1.2.3-HU-RIZONT International Excellence Program (National Research, Development and Innovation Office – NKFIH). Supported by the 2025-2.1.1-EKÖP-2025-00014 University Research Scholarship Programme of the Ministry for Culture and Innovation, financed from the National Research, Development and Innovation Fund.

PhD Scientific Days 2026

Budapest, 16-18 June 2026
