Key Findings from the Research
- The study is the first independent evaluation of ChatGPT Health since its launch in January 2026.
- Researchers found that the tool often under-triaged cases requiring emergency care, failing to recommend urgent attention in more than half of such instances.
- The system's alerts in suicide-risk situations were inconsistent, at times failing to trigger warnings even when users described specific self-harm plans.
Concerns Raised by Experts
Isaac S. Kohane, MD, PhD, emphasized the importance of independent evaluations for AI tools used in medical decision-making, stating:
“When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high.”
Study Methodology
The research team created 60 clinical scenarios across 21 medical specialties, assessing the AI’s recommendations against physician consensus. The scenarios included:
- Minor conditions suitable for home care.
- True medical emergencies requiring immediate attention.
Performance Insights
While ChatGPT Health performed well in clear-cut emergencies, it struggled with nuanced cases where clinical judgment is crucial. For example:
- In an asthma scenario, the AI recognized early signs of respiratory failure but still advised waiting instead of seeking immediate treatment.
Recommendations for Users
The authors advise individuals experiencing concerning symptoms, such as:
- Chest pain
- Shortness of breath
- Severe allergic reactions
- Changes in mental status
to seek medical care directly rather than relying solely on AI guidance. For anyone experiencing thoughts of self-harm, contacting the 988 Suicide and Crisis Lifeline or visiting an emergency department is crucial.
Future Directions
The researchers plan to continue evaluating updates to ChatGPT Health and other AI health tools, focusing on areas like pediatric care and medication safety. They stress the need for ongoing assessments to ensure that advancements in technology lead to safer patient care.
For more information, visit the Mount Sinai Health System website.
