Quick Summary
A recent study published in The BMJ reports that many prominent AI chatbots show signs of mild cognitive impairment resembling early dementia in humans, raising questions about the reliability of these models in medical diagnostics.
Key Findings
- Almost all evaluated large language models (LLMs) displayed cognitive decline when assessed using the Montreal Cognitive Assessment (MoCA) test.
- Older chatbot versions performed worse than newer ones, a pattern the authors compare to age-related cognitive decline in humans.
- These cognitive limitations lead the authors to challenge the notion that AI will soon replace human doctors.
Research Methodology
- The study assessed several leading chatbots, including ChatGPT versions 4 and 4o, Claude 3.5, and Gemini versions 1 and 1.5.
- Each chatbot underwent the MoCA test, which evaluates attention, memory, language, visuospatial skills, and executive functions.
- The MoCA is scored out of a maximum of 30 points; ChatGPT 4o achieved the highest score at 26, while Gemini 1.0 scored the lowest at 16 (see the illustrative scoring sketch after this list).
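To make the methodology concrete, below is a minimal sketch of how a MoCA-style evaluation of a chatbot might be scripted. It assumes the OpenAI Python client; the prompts and the grading function are illustrative placeholders rather than the study's actual items or protocol, while the per-domain maximums follow the standard MoCA breakdown.

```python
# Minimal sketch: sending MoCA-style items to a chatbot and tallying a
# total score out of 30. Prompts and grading are illustrative placeholders,
# not the study's actual protocol.
from openai import OpenAI  # assumes the OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Standard MoCA domains and their maximum points (total = 30).
MOCA_DOMAINS = {
    "visuospatial_executive": 5,
    "naming": 3,
    "attention": 6,
    "language": 3,
    "abstraction": 2,
    "delayed_recall": 5,
    "orientation": 6,
}

# Hypothetical example prompts, one per domain (real MoCA items differ).
PROMPTS = {
    "visuospatial_executive": "Describe how you would draw a clock face showing ten past eleven.",
    "naming": "Name three animals shown in line drawings and describe each.",
    "attention": "Repeat this digit span backwards: 7 4 2.",
    "language": "Repeat: 'I only know that John is the one to help today.'",
    "abstraction": "What do a train and a bicycle have in common?",
    "delayed_recall": "Recall the five words you were given earlier.",
    "orientation": "What is today's date, and what city are we in?",
}

def ask_model(prompt: str, model: str = "gpt-4o") -> str:
    """Send a single MoCA-style item to the chat model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def grade(domain: str, reply: str) -> int:
    """Placeholder grader: a human rater would assign 0..max points per domain."""
    print(f"[{domain}] model replied: {reply[:80]}...")
    return 0  # replace with manual or rubric-based scoring

if __name__ == "__main__":
    total = sum(grade(domain, ask_model(prompt)) for domain, prompt in PROMPTS.items())
    print(f"Total MoCA-style score: {total} / {sum(MOCA_DOMAINS.values())}")
```

In practice, the grading step is the crux: MoCA responses are normally scored by a trained rater against fixed criteria, so any automated pipeline like this would still need human or rubric-based judgment for each domain.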
Performance Insights
- All chatbots struggled with visuospatial and executive tasks, such as drawing a clock face and completing the trail-making task.
- Gemini models notably failed the delayed recall task, indicating a marked memory limitation.
- Although the chatbots performed well on naming, attention, language, and abstraction tasks, their weaknesses elsewhere raise concerns about clinical applicability.
Implications for Medical Use
- The findings suggest that AI chatbots may not be suitable for medical diagnostics due to their cognitive impairments.
- The authors suggest, somewhat tongue-in-cheek, that neurologists may soon find themselves treating a new kind of virtual patient: AI models presenting with cognitive impairment.
Conclusion
This study highlights fundamental differences between human cognition and AI capabilities, underscoring the limitations of current AI models in clinical settings.