Poster Session I. - D: Pathological and Oncological Sciences
Murmu Ankita
Semmelweis University
Ankita Murmu1, Anita Rácz2, Balázs Győrffy1
1: Semmelweis University
2: HUN-REN Research Center for Natural Sciences
Introduction: Understanding the complex interplay of features influencing treatment outcomes is crucial for prognostic modelling and clinical decision making.
Aim: We aimed to evaluate and compare three machine learning models, using several performance metrics, for predicting the probability of death at 5 years in breast cancer patients.
Method: We used the Surveillance, Epidemiology, and End Results database to retrieve data on breast cancer patients after the year 2000. The inclusion criteria were adult patients (≥18 years) with available overall survival data. After excluding incomplete records, cases diagnosed after 2016, and censored data, 772,472 cases were analyzed. These cases were further divided into two groups according to whether or not the patients received chemotherapy. We used gradient boosting, extreme gradient boosting, and neural network machine learning algorithms in the R environment to build chemotherapy and no chemotherapy models.
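For readers less familiar with the metrics used to compare the models (AUC, accuracy, and sensitivity), the following is a minimal sketch of how they can be computed from observed outcomes and predicted probabilities. It is written in Python for illustration only and is not the authors' R pipeline. The function names, the 0.5 decision threshold, and the toy data are all assumptions introduced here.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_sensitivity(y_true, scores, threshold=0.5):
    """Accuracy and sensitivity (true positive rate) at a given threshold.
    The 0.5 default is an assumption, not a value from the abstract."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(y_true, pred) if y == p)
    return correct / len(y_true), tp / (tp + fn)


# Hypothetical toy data: 1 = death by 5 years, 0 = alive at 5 years.
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_auc = auc(toy_outcomes, toy_scores)
toy_acc, toy_sens = accuracy_and_sensitivity(toy_outcomes, toy_scores)
print(toy_auc, toy_acc, toy_sens)
```

The pairwise AUC shown here is quadratic in the number of cases and is only meant to make the definition concrete; at the scale of the analyzed cohort a rank-based implementation from a statistics library would be the practical choice.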
Results: In the chemotherapy models, gradient boosting achieved a slightly higher AUC (0.85) than extreme gradient boosting (AUC: 0.83) and the neural network (AUC: 0.81). Similarly, in the no chemotherapy models, gradient boosting (AUC: 0.90) outperformed extreme gradient boosting (AUC: 0.89) and the neural network (AUC: 0.88). In terms of accuracy, gradient boosting and extreme gradient boosting performed similarly in the chemotherapy models (0.77 and 0.76, respectively), while the neural network reached 0.73. In the no chemotherapy models, the accuracy of gradient boosting, extreme gradient boosting, and the neural network increased to 0.83, 0.82, and 0.81, respectively. In the chemotherapy models, the neural network had the lowest sensitivity (0.66), compared with gradient boosting (0.78) and extreme gradient boosting (0.70). When the three algorithms were ranked across the different metrics, gradient boosting showed the best overall performance. Among all input features, lymph node positivity was the most important feature in both the chemotherapy and no chemotherapy models.
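The abstract does not state how the algorithms were ranked across metrics. As one possible illustration, the sketch below ranks the models on each reported metric and averages the ranks. The mean-rank procedure, the short model labels (GB, XGB, NN), and the function name are assumptions made here; only the numeric values are taken from the results above.

```python
# Values reported in the abstract; higher is better for every metric.
reported = {
    "chemo AUC":         {"GB": 0.85, "XGB": 0.83, "NN": 0.81},
    "chemo accuracy":    {"GB": 0.77, "XGB": 0.76, "NN": 0.73},
    "chemo sensitivity": {"GB": 0.78, "XGB": 0.70, "NN": 0.66},
    "no-chemo AUC":      {"GB": 0.90, "XGB": 0.89, "NN": 0.88},
    "no-chemo accuracy": {"GB": 0.83, "XGB": 0.82, "NN": 0.81},
}


def mean_ranks(table):
    """Average rank of each model over all metrics (1 = best).
    Ties are not handled specially; none occur in the reported values."""
    totals = {}
    for values in table.values():
        ordered = sorted(values, key=values.get, reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    return {model: total / len(table) for model, total in totals.items()}


ranks = mean_ranks(reported)
best = min(ranks, key=ranks.get)
print(ranks, best)
```

Under this assumed scheme gradient boosting ranks first on every reported metric, which is consistent with the overall conclusion stated above.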
Conclusion: These findings highlight the potential of machine learning algorithms to be incorporated into clinical settings and to assist in improving patient outcomes.