⚡ Quick Summary
A recent study evaluated the diagnostic accuracy of artificial intelligence (AI) models against 263 pediatric clinicians for diagnosing childhood exanthems. The findings revealed that AI models, particularly ChatGPT and Gemini, achieved diagnostic accuracy that matched or exceeded that of pediatric specialists.
🔍 Key Details
- 👩⚕️ Participants: 263 pediatric clinicians (107 residents, 156 specialists)
- 🧠 AI Models: ChatGPT, Gemini, Copilot
- 📸 Methodology: Blinded multiple-choice questionnaire with clinical photographs and data
- 📊 Definitive diagnoses: Confirmed by clinical features, laboratory findings, and consensus of specialists
🔑 Key Takeaways
- 🏆 Performance: ChatGPT diagnosed 86.9% of cases correctly, Gemini 82.0%, and Copilot 72.1%.
- 📈 Specialist vs. Resident: Specialists outperformed residents, with a median score of 46 (IQR, 42-50) vs. 41 (IQR, 36-46) (P < 0.001).
- 🌟 AI vs. Specialists: Both ChatGPT and Gemini exceeded the upper bound of the 95% CI of the specialist median score (47.17).
- 🔍 Disease-level accuracy: Ranged from 0% (insect bites) to 100% (for 9 conditions).
- 👩🎓 Experience matters: Fourth-year residents scored higher than first- and second-year residents (P = 0.001).
- ⚠️ Contextual reasoning: AI struggled with conditions requiring contextual understanding, highlighting the need for physician oversight.

📚 Background
Diagnosing childhood exanthematous diseases can be quite challenging due to their overlapping clinical presentations. Traditional diagnostic methods rely heavily on the expertise of pediatric clinicians, but with advancements in technology, artificial intelligence is emerging as a potential tool to assist in these complex cases. This study examined whether AI can match or exceed the diagnostic capabilities of experienced pediatric specialists.
🗒️ Study
The study involved a volunteer sample of 263 pediatric clinicians, including both residents and specialists, who were tasked with diagnosing 61 cases of childhood exanthems. Each clinician completed a blinded questionnaire that included clinical photographs and relevant clinical data. The same cases were also presented to three AI models: ChatGPT, Gemini, and Copilot, allowing for a direct comparison of diagnostic accuracy.
📈 Results
The results were promising for AI applications in pediatric diagnostics. ChatGPT achieved 86.9% accuracy (53 of 61 cases), while Gemini followed closely at 82.0% (50 cases). Copilot, although less accurate at 72.1% (44 cases), still scored above the upper bound of the resident 95% CI. Notably, both ChatGPT and Gemini surpassed the upper bound of the 95% CI of the specialist median, indicating their potential as reliable diagnostic tools.
🌍 Impact and Implications
The implications of this study are significant. The ability of AI models to match or exceed specialist-level performance in diagnosing pediatric exanthems suggests that these technologies could play a crucial role in clinical settings. However, the study also emphasizes the importance of physician oversight, particularly in cases where contextual reasoning is essential for accurate diagnosis. This balance between AI and human expertise could lead to improved diagnostic accuracy and patient outcomes in pediatric care.
🔮 Conclusion
This study highlights the transformative potential of artificial intelligence in the field of pediatric diagnostics. With AI models like ChatGPT and Gemini demonstrating high levels of accuracy, there is a promising future for integrating these technologies into clinical practice. Continued research and development in this area could pave the way for enhanced diagnostic tools that support healthcare professionals in delivering optimal care for children.
💬 Your comments
What are your thoughts on the integration of AI in pediatric diagnostics? Do you believe it can enhance clinical decision-making? 💬 Share your insights in the comments below or connect with us on social media.
Diagnostic accuracy of artificial intelligence versus 263 pediatric clinicians for childhood exanthems.
Abstract
Background: Pediatric exanthematous diseases pose diagnostic challenges because clinical presentations overlap.
Objective: To determine whether current artificial intelligence (AI) models achieve diagnostic accuracy within or above the performance distribution of pediatric residents and specialists for common rash-associated diseases.
Methods: Participants and AI models were evaluated against definitive diagnoses confirmed by clinical features, laboratory findings, and the consensus of two pediatric infectious disease specialists. A volunteer sample of 263 pediatric clinicians participated: 107 residents (years 1 through 4) and 156 specialists. Each clinician completed a blinded multiple-choice questionnaire with a clinical photograph and accompanying clinical data per case. The same cases were presented to three AI models: ChatGPT, Gemini, and Copilot.
Results: Among 263 clinicians (107 residents, 156 specialists), specialists scored higher than residents (median, 46 [IQR, 42-50] vs 41 [IQR, 36-46]; P < .001; r = 0.32). ChatGPT correctly diagnosed 53 of 61 cases (86.9%), Gemini 50 (82.0%), and Copilot 44 (72.1%). Both ChatGPT and Gemini exceeded the upper bound of the specialist population median 95% CI (47.17). All three AI models scored above the resident 95% CI upper bound (42.76). Disease-level accuracy ranged from 0% (insect bites, all models) to 100% (9 conditions, all models). Fourth-year residents scored higher than first- and second-year residents (P = .001; ε² = 0.13).
Conclusions: AI models given clinical data alongside images matched or exceeded specialist-level performance for pediatric exanthems. Accuracy varied by disease; failures clustered in conditions that require contextual reasoning. Physician oversight remains necessary where AI accuracy is lowest.
What is Known
• Childhood exanthematous diseases pose significant diagnostic challenges due to their overlapping clinical presentations.
• Artificial intelligence models are becoming increasingly proficient in accurately diagnosing these conditions.
What is New
• In this diagnostic accuracy study of 61 cases evaluated by 263 clinicians and 3 artificial intelligence models, ChatGPT (86.9%) and Gemini (82.0%) exceeded the 95% CI upper bound of the specialist population median. Disease-level accuracy ranged from 0% to 100% across models.
• Artificial intelligence models given clinical data alongside images can match or exceed specialist-level accuracy for common pediatric exanthems, but failures in context-dependent diagnoses require physician oversight.
Authors: Gençeli M, Metin Akcan Ö, Soran GB, Çokbiçer A, Saraç U, Üstüntaş T, Yücel M, Doğan M, Yılık Kömür E, Gençeli S, Yılmaz Dağlı H, Sarı M, Kılıç AO, Şahin S, Akkuş A
Journal: Eur J Pediatr
Citation: Gençeli M, et al. Diagnostic accuracy of artificial intelligence versus 263 pediatric clinicians for childhood exanthems. Eur J Pediatr. 2026; 185:(unknown pages). doi: 10.1007/s00431-026-07044-9