Poster Session 3.W - Pharmaceutical Sciences and Health Technologies
Tran, Vuong
Department of Laboratory Medicine, Semmelweis University
Huu Minh Vuong Tran¹
¹ Department of Laboratory Medicine, Semmelweis University
1. Introduction
Logistic regression is commonly used in medical research to estimate the relationship between independent variables and a binary outcome. However, the original datasets are rarely published, which limits reproducibility and secondary analyses. Typically, only the regression outputs (odds ratios, confidence intervals, p-values) and contingency tables are reported. This study aims to develop a prototype algorithm in R that reconstructs datasets closely resembling the originals using only this reported information.
2. Aims
To reconstruct a small dataset (n = 250) with two binary predictors (independent variables) and one binary outcome, using only the results of a multivariable logistic regression and the associated contingency tables.
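Because both predictors and the outcome are binary, each row can take only 2 × 2 × 2 = 8 distinct values, so a dataset of this shape is fully described by eight cell counts, and every predictor-by-outcome contingency table is a margin of that three-way table. The short Python sketch below illustrates this on randomly generated data. The seed, the hypothetical helper `predictor_table`, and the reading of "contingency tables" as predictor-by-outcome 2 × 2 tables are assumptions for illustration; the study itself was implemented in R.

```python
import random
from collections import Counter

# Hypothetical example data: n = 250 rows of (x1, x2, y), all binary.
rng = random.Random(2025)
rows = [(rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1))
        for _ in range(250)]

# Every row is one of 2 x 2 x 2 = 8 patterns, so eight counts summarise the data.
cells = Counter(rows)


def predictor_table(cells, k):
    """2x2 table of predictor k (0 -> x1, 1 -> x2) against the outcome y."""
    return {(x, y): sum(c for pattern, c in cells.items()
                        if pattern[k] == x and pattern[2] == y)
            for x in (0, 1) for y in (0, 1)}


table_x1 = predictor_table(cells, 0)  # keys are (x1 value, y value)
table_x2 = predictor_table(cells, 1)  # keys are (x2 value, y value)
print(sorted(cells.items()))
print(table_x1)
print(table_x2)
```

Both 2 × 2 tables are obtained by summing the eight cell counts, which is why many different eight-cell tables (and hence many datasets) share the same contingency tables; the regression output is what narrows the candidates down.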
3. Methods
First, a dataset of binary variables is randomly generated and treated as the original. The results of a multivariable logistic regression on this dataset, together with its contingency tables, are recorded, with odds ratios and confidence intervals rounded to two decimal places and p-values rounded to three. Second, a new dataset is randomly generated so that its contingency tables exactly match those of the original, and the same multivariable logistic regression is performed on it. The algorithm then iteratively edits each predictor column of this dataset, comparing the resulting odds ratios and confidence intervals with the recorded ones after each edit. The process continues until either the difference is 0 or no further improvement is possible. Finally, the original and generated datasets are compared row-wise to assess the algorithm’s accuracy, defined as the percentage of exactly matching rows.
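The steps above can be sketched in Python as follows. This is a minimal, illustrative re-implementation under stated assumptions, not the author's R code. Assumptions: the contingency tables are predictor-by-outcome 2 × 2 tables; confidence intervals are Wald intervals (in R, `confint()` on a `glm` fit returns profile-likelihood intervals while `confint.default()` returns Wald intervals, and the study does not say which was used); a column is "edited" by swapping two of its values between rows with the same outcome, which keeps those tables fixed; the difference is the sum of absolute differences between rounded values; and both datasets are sorted before the row-wise comparison. All function names (`invert`, `fit_logistic`, `distance`, `candidate_swaps`, `reconstruct`, `accuracy`) are hypothetical, and p-values are not used in this sketch.

```python
import math
import random

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(m)
    a = [list(row) + [float(i == j) for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(size):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[size:] for row in a]

def fit_logistic(rows):
    """Fit y ~ x1 + x2 by Newton-Raphson; return rounded (OR, lower, upper) for x1 and x2."""
    counts = {}  # (x1, x2) -> (trials, successes); sufficient because predictors are binary
    for x1, x2, y in rows:
        n, s = counts.get((x1, x2), (0, 0))
        counts[(x1, x2)] = (n + 1, s + y)
    beta = [0.0, 0.0, 0.0]  # intercept, x1, x2 on the log-odds scale
    for _ in range(25):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for (x1, x2), (n, s) in counts.items():
            xv = (1.0, x1, x2)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xv))))
            for i in range(3):
                grad[i] += xv[i] * (s - n * p)
                for j in range(3):
                    hess[i][j] += n * p * (1.0 - p) * xv[i] * xv[j]
        cov = invert(hess)  # inverse Fisher information; the covariance estimate at convergence
        step = [sum(cov[i][j] * grad[j] for j in range(3)) for i in range(3)]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < 1e-10:
            break
    z = 1.959964  # 0.975 normal quantile; Wald 95% interval (assumption)
    out = []
    for k in (1, 2):
        se = math.sqrt(cov[k][k])
        ends = (beta[k], beta[k] - z * se, beta[k] + z * se)
        out.append(tuple(round(math.exp(v), 2) for v in ends))
    return out

def distance(a, b):
    """Sum of absolute differences between two lists of (OR, lower, upper)."""
    return sum(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

def candidate_swaps(rows, col):
    """Same-outcome row pairs whose swap in `col` changes the fit (other predictor differs)."""
    first = {}  # (y, value in col, value in other predictor) -> first row index
    for idx, r in enumerate(rows):
        first.setdefault((r[2], r[col], r[1 - col]), idx)
    return [(first[(y, 1, a)], first[(y, 0, 1 - a)])
            for y in (0, 1) for a in (0, 1)
            if (y, 1, a) in first and (y, 0, 1 - a) in first]

def reconstruct(tables, target, rng):
    """Hill-climb from a random dataset matching `tables` towards the `target` output."""
    rows = []
    for y, (n_y, k1, k2) in tables.items():
        x2 = [1] * k2 + [0] * (n_y - k2)
        rng.shuffle(x2)
        rows += [[int(i < k1), b, y] for i, b in enumerate(x2)]
    best = distance(fit_logistic(rows), target)
    improved = True
    while best > 0 and improved:
        improved = False
        for col in (0, 1):  # edit each predictor column in turn
            for i, j in candidate_swaps(rows, col):
                rows[i][col], rows[j][col] = rows[j][col], rows[i][col]
                d = distance(fit_logistic(rows), target)
                if d < best:
                    best, improved = d, True
                    break  # keep this edit
                rows[i][col], rows[j][col] = rows[j][col], rows[i][col]  # undo
    return rows, best

def accuracy(original, generated):
    """Percentage of identical rows after sorting both datasets (assumption)."""
    pairs = zip(sorted(map(tuple, original)), sorted(map(tuple, generated)))
    return 100.0 * sum(a == b for a, b in pairs) / len(original)

rng = random.Random(7)
original = [[rng.randint(0, 1) for _ in range(3)] for _ in range(250)]  # rows of [x1, x2, y]
tables = {}  # y -> (rows with outcome y, of which x1 = 1, of which x2 = 1)
for y in (0, 1):
    grp = [r for r in original if r[2] == y]
    tables[y] = (len(grp), sum(r[0] for r in grp), sum(r[1] for r in grp))
target = fit_logistic(original)  # the "reported" rounded odds ratios and CIs
generated, gap = reconstruct(tables, target, rng)
print("target:", target, "gap:", round(gap, 2), "accuracy (%):", accuracy(original, generated))
```

Because both predictors are binary, the fit depends only on the trials and successes in each of the four (x1, x2) patterns, which is why `fit_logistic` aggregates rows before iterating. Accepting the first improving swap rather than the best one is a design choice of this sketch; either variant stops when the distance reaches 0 or no single swap lowers it.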
4. Results
Applied to 50 different original datasets, the algorithm reconstructed 11 of them exactly, with a mean accuracy of 97.10% across all 50.
5. Conclusion
The study showed that datasets can be reconstructed with some degree of accuracy from the outputs of a multivariable logistic regression. However, since exact reconstruction was not achieved consistently, the algorithm in its current state is suited only for educational purposes.
6. Funding
Supported by the 2025-2.1.1-EKOP-2025-00014 University Researchers’ Scholarship Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund.