PhD Scientific Days

Poster Session I. - D: Pathological and Oncological Sciences

Comparison of Machine Learning Models for Prognostic Prediction in Breast Cancer

Name of the presenter

Murmu Ankita

Institute/workplace of the presenter

Semmelweis University

Authors

Ankita Murmu¹, Anita Rácz², Balázs Győrffy¹

1: Semmelweis University
2: HUN-REN Research Center for Natural Sciences

Text of the abstract

Introduction: Understanding the complex interplay of features influencing treatment outcomes is crucial for prognostic modelling and clinical decision making.

Aim: We aimed to evaluate and compare three machine learning models, using several performance metrics, for predicting the 5-year probability of death in breast cancer patients.

Method: We used the Surveillance, Epidemiology, and End Results (SEER) database to retrieve data on breast cancer patients diagnosed after the year 2000. The inclusion criteria were adult patients (≥18 years) with available overall survival data. After excluding incomplete records, cases diagnosed after 2016 and censored data, 772,472 cases were analyzed. These cases were further divided into two groups based on whether or not the patient received chemotherapy. We used gradient boosting, extreme gradient boosting and neural network machine learning algorithms in the R environment to build separate chemotherapy and no chemotherapy models.
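The cohort selection described in the Method can be sketched as follows. This is only an illustration, written in Python rather than the R environment used in the study. The record layout (`age`, `diagnosis_year`, `censored`, `chemotherapy`) is hypothetical and does not reflect actual SEER field names, and the exact year boundaries (strictly after 2000, up to and including 2016) are an assumed reading of the text.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class Case:
    # Hypothetical record layout; these are not actual SEER field names.
    age: Optional[int]
    diagnosis_year: Optional[int]
    censored: Optional[bool]      # True if the survival time is censored
    chemotherapy: Optional[bool]  # True if the patient received chemotherapy


def is_eligible(case: Case) -> bool:
    """Apply the inclusion and exclusion criteria to one record."""
    # Exclude incomplete records (any missing value).
    if any(getattr(case, f.name) is None for f in fields(case)):
        return False
    # Adults only, diagnosed after 2000 and not after 2016 (assumed bounds),
    # and not censored.
    return (
        case.age >= 18
        and 2000 < case.diagnosis_year <= 2016
        and not case.censored
    )


def split_by_chemotherapy(cases: List[Case]) -> Tuple[List[Case], List[Case]]:
    """Return (chemotherapy group, no chemotherapy group) of eligible cases."""
    eligible = [c for c in cases if is_eligible(c)]
    chemo = [c for c in eligible if c.chemotherapy]
    no_chemo = [c for c in eligible if not c.chemotherapy]
    return chemo, no_chemo
```

Each of the two resulting groups would then be used to train its own set of models.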

Results: In the chemotherapy models, gradient boosting showed a slightly higher AUC (0.85) than extreme gradient boosting (AUC: 0.83) and neural network (AUC: 0.81). Similarly, in the no chemotherapy models, gradient boosting (AUC: 0.90) outperformed extreme gradient boosting (AUC: 0.89) and neural network (AUC: 0.88). In terms of accuracy, gradient boosting and extreme gradient boosting performed almost identically in the chemotherapy models (0.77 and 0.76, respectively), while the neural network reached 0.73. In the no chemotherapy models, the accuracy of gradient boosting, extreme gradient boosting and neural network increased to 0.83, 0.82 and 0.81, respectively. In the chemotherapy models, the sensitivity of the neural network (0.66) was lower than that of gradient boosting (0.78) and extreme gradient boosting (0.70). When the three algorithms were ranked across the different metrics, gradient boosting showed the best overall performance. Among all input features, lymph node positivity was the most important in both the chemotherapy and no chemotherapy models.
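The abstract does not state how the per-metric rankings were combined. As an illustration only, the sketch below applies a simple rank-sum to the chemotherapy-model values reported above (AUC, accuracy, sensitivity). The rank-sum aggregation and the function name `rank_sum` are assumptions, not the authors' method, and the code is in Python rather than the R environment used in the study.

```python
# Chemotherapy-model metrics as reported in the Results.
CHEMO_METRICS = {
    "gradient boosting": {"auc": 0.85, "accuracy": 0.77, "sensitivity": 0.78},
    "extreme gradient boosting": {"auc": 0.83, "accuracy": 0.76, "sensitivity": 0.70},
    "neural network": {"auc": 0.81, "accuracy": 0.73, "sensitivity": 0.66},
}


def rank_sum(metrics):
    """Rank models per metric (1 = best) and sum the ranks.

    Assumed aggregation: higher metric values are better and a lower
    rank sum means better overall performance. Ties are not handled
    specially; tied models keep their input order.
    """
    models = list(metrics)
    metric_names = list(next(iter(metrics.values())))
    totals = {model: 0 for model in models}
    for name in metric_names:
        ordered = sorted(models, key=lambda m: metrics[m][name], reverse=True)
        for position, model in enumerate(ordered, start=1):
            totals[model] += position
    # Best (lowest) rank sum first.
    return sorted(totals.items(), key=lambda item: item[1])


for model, total in rank_sum(CHEMO_METRICS):
    print(f"{model}: rank sum {total}")
```

With these values, gradient boosting ranks first on every metric (rank sum 3), followed by extreme gradient boosting (6) and neural network (9), which matches the overall ordering reported above.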

Conclusion: These findings highlight the potential of machine learning algorithms to be incorporated into clinical settings and to assist in improving patient outcomes.

PhD Scientific Days 2025

Budapest, 7-9 July 2025
