Health Sciences 1.
Tóth, Zsófia
Department of Dermatology, Venereology and Dermatooncology, Semmelweis University, Faculty of Medicine
Phyllida Kerstin Hamilton-Meikle¹, Tóth Zsófia¹
¹ Department of Dermatology, Venereology and Dermatooncology, Semmelweis University, Faculty of Medicine
Introduction: Basal cell carcinoma (BCC) is the most common malignant skin tumor worldwide. Because of its high incidence, timely diagnosis remains a major challenge in everyday clinical practice, which highlights the need for novel diagnostic approaches. Recent advances in large language models (LLMs) have created new opportunities for using artificial intelligence (AI)-based programs in the dermatological diagnostic process.
Aim: This study aimed to compare the diagnostic accuracy of three multimodal LLMs - ChatGPT-5 (OpenAI), Gemini 2.5 Flash (Google), and Claude Sonnet 4 (Anthropic) - in differentiating BCC from non-BCC lesions and in classifying BCC subtypes, based on both clinical and dermoscopic images.
Methods: A total of 772 images were analyzed retrospectively, of which 402 were histopathologically confirmed BCC lesions (290 clinical and 112 dermoscopic images) and 370 images belonged to a BCC-mimicker cohort (250 clinical and 120 dermoscopic images). Each case received identical diagnostic prompts, followed by a clarification query requesting a single definitive answer. Sensitivity, specificity, and overall accuracy were calculated separately for clinical and dermoscopic datasets.
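The three metrics named above follow the standard confusion-matrix definitions, with BCC as the positive class. The Python sketch below is only an illustration of those definitions: the function name, the data class, and the counts in the usage example are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary outcome counts for one model on one image set (BCC = positive)."""
    tp: int  # BCC images labeled BCC
    fn: int  # BCC images labeled non-BCC
    tn: int  # non-BCC images labeled non-BCC
    fp: int  # non-BCC images labeled BCC


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and overall accuracy as fractions."""
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    if positives == 0 or negatives == 0:
        raise ValueError("both BCC and non-BCC images are required")
    return {
        "sensitivity": c.tp / positives,
        "specificity": c.tn / negatives,
        "accuracy": (c.tp + c.tn) / (positives + negatives),
    }


# Hypothetical toy counts, chosen only to show the arithmetic.
example = diagnostic_metrics(ConfusionCounts(tp=8, fn=2, tn=6, fp=4))
```

Applying the function once to the clinical counts and once to the dermoscopic counts for each model would give the per-dataset values described in the Methods.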
Results: For clinical images, ChatGPT-5 achieved the highest diagnostic accuracy of the three LLMs (75%). For dermoscopic images, Claude Sonnet 4 performed best (69.8%). All three models showed limited performance in classifying BCC subtypes.
Conclusions: These findings suggest that multimodal LLMs may serve as an additional tool that complements human examination in BCC recognition and classification. Further refinement and domain-specific training could improve their diagnostic reliability and support their future integration into dermatological decision-support systems.
Funding: Supported by PhD research grant.