Poster Session II. - Z: Euniwell
Suarez Daniel
Murcia University
Daniel Suárez Hernández1, Sandra Catalan Pallares2, Paul Ximo Pluitjer Izquierdo2, Ruben Cabrera Beirouthy3, Manuel Francisco Dolz Zaragoza2, Javier Urios Durá4, Sara Jofresa Iserte5
1: Murcia University
2: Jaime I University
3: Orihuela Hospital
4: Benejuzar Health Centre
5: San Miguel Health Centre
Before AI was introduced in this field, an ophthalmologist had to examine every case of diabetic retinopathy (DR) thoroughly. Because ophthalmologists are few relative to the population, the wait for an appointment could be excessively long, allowing the disease to progress to a more severe stage that earlier diagnosis could have prevented. For a neural network to perform a task, it must first be trained on data, such as images or text, that allow it to "understand" the problem. For DR, these data are retinal images.
The experiments show that ResNet-50 and DenseNet-169 give the best results for DR classification. ResNet-50 achieves the best overall metrics, followed by DenseNet-169; both outperform the RETFound model on most metrics, except for accuracy and training time. However, although metrics such as AUPRC and F1-score are competitive, they are not high enough for fully autonomous diagnosis in a medical setting.
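As an illustration of how the F1-score mentioned above can be computed for each DR level and then averaged, here is a minimal sketch. The function names, the five-class assumption (levels 0 to 4), and the sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, num_classes=5):
    """Return the F1-score of each class (0.0 when the score is undefined)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(y_true, y_pred, num_classes)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = [0, 0, 0, 2, 2, 1, 3, 4]
y_pred = [0, 0, 2, 2, 2, 0, 3, 4]
scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Because the macro average weights every class equally, a poorly classified minority class lowers it noticeably even when overall accuracy looks acceptable.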
Regarding DR classification by level (class), the imbalance in the dataset strongly affects the models' accuracy. The best-represented classes (0 and 2) perform well, while classes 1 and 3 perform worse. Class 4, although less prevalent, achieves acceptable metrics because its features are more evident in the images. Models such as DenseNet and ResNet, which retain information from early layers, perform better than VGG-19, highlighting the importance of preserving fine details for correct DR level classification.
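One common way to counter the kind of class imbalance described above is to weight each class inversely to its frequency in the training loss. The sketch below uses the usual heuristic n_samples / (n_classes * class_count). It is not stated that the study used this technique, and the function name and label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes=5):
    """Return a dict mapping each class to an inverse-frequency weight."""
    counts = Counter(labels)
    n_samples = len(labels)
    weights = {}
    for c in range(num_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no samples")
        weights[c] = n_samples / (num_classes * counts[c])
    return weights


# Hypothetical label distribution, for illustration only.
labels = [0] * 50 + [1] * 5 + [2] * 30 + [3] * 5 + [4] * 10
weights = balanced_class_weights(labels)
```

With these counts, the rare classes 1 and 3 receive a weight ten times that of class 0, so their misclassifications contribute more to the loss.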
In the inference tests, the results are lower than those obtained during training, possibly because the models were tuned specifically to the training data. Even so, ResNet-50 and DenseNet-169 outperformed RETFound in the most frequently represented classes (0 and 1), although they show limitations in the less frequent classes (2, 3, and 4). This reinforces the need to make the dataset more representative and to fine-tune the models so that performance is balanced across all classes. Finally, in a medical context, errors that underestimate disease severity can be more serious than those that overestimate it, so these models should be deployed conservatively to ensure patient safety.
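The asymmetry between underestimating and overestimating severity can be made explicit with a cost-sensitive decision rule. This is a minimal sketch: instead of taking the most probable grade, it picks the grade with the lowest expected cost, where under-grading is penalised more heavily. The penalty values, function names, and probabilities are assumptions for illustration and do not come from the study.

```python
def asymmetric_cost(true_grade, predicted_grade, under_penalty=3.0, over_penalty=1.0):
    """Cost grows with grade distance; under-grading costs more per step."""
    diff = true_grade - predicted_grade
    if diff > 0:  # predicted grade is lower than the true grade
        return under_penalty * diff
    return over_penalty * (-diff)


def conservative_grade(probs, under_penalty=3.0, over_penalty=1.0):
    """Return the grade that minimises expected cost under the class probabilities."""
    n = len(probs)
    expected = []
    for pred in range(n):
        cost = sum(
            probs[t] * asymmetric_cost(t, pred, under_penalty, over_penalty)
            for t in range(n)
        )
        expected.append(cost)
    return min(range(n), key=expected.__getitem__)


# Hypothetical model output: grade 0 is most probable, but grade 2 is close.
probs = [0.45, 0.05, 0.40, 0.05, 0.05]
most_probable = max(range(len(probs)), key=probs.__getitem__)
chosen = conservative_grade(probs)
```

In this example the most probable grade is 0, but the rule selects grade 2, because missing a possible grade 2 case is treated as costlier than over-grading a healthy eye. When the model is confident, the rule agrees with the most probable grade.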
Funding: Research conducted with a grant from the Fisabio Foundation Unisalut Program (Valencian Community).