🧑🏼‍💻 Research - April 30, 2026

Evaluation of AI Chatbot Responses to a Standardized Patient Query on Myelin Oligodendrocyte Glycoprotein Antibody-Associated Disease: Cross-Sectional Content Analysis.

⚡ Quick Summary

A recent study evaluated the responses of AI chatbots to a standardized patient query regarding Myelin Oligodendrocyte Glycoprotein Antibody-Associated Disease (MOGAD). The findings revealed significant variations in scientific quality, understandability, and transparency across different platforms, emphasizing the need for careful evaluation of chatbot outputs in complex medical conditions.

🔍 Key Details

  • 🧠 Condition Studied: Myelin Oligodendrocyte Glycoprotein Antibody-Associated Disease (MOGAD)
  • 🤖 Platforms Analyzed: 10 widely accessible AI chatbot platforms
  • 🔍 Evaluation Tools: DISCERN, PEMAT-P, WRR, FKGL, Coleman-Liau Index
  • 📊 Study Design: Cross-sectional content analysis

🔑 Key Takeaways

  • 📈 Significant differences were found across chatbot platforms in DISCERN, PEMAT-P, and WRR scores (all P<.001).
  • 🔍 Search-focused platforms had higher understandability scores than conversation-focused ones, which in turn showed greater citation transparency.
  • 💰 Paid-access platforms outperformed free-access platforms in quality, understandability, and transparency metrics.
  • 📚 Readability remained limited: all outputs exceeded recommended public health readability thresholds (FKGL≥8).
  • 🤝 High interrater reliability was observed among the 7 neurologists evaluating the responses (intraclass correlation coefficients from 0.838 to 0.902).
  • 🌍 The study highlights the importance of context-sensitive evaluations for chatbot outputs in rare diseases.

📚 Background

The rise of large language model-based chatbots has transformed how the public accesses medical information. However, the quality and clarity of information provided by these tools, especially for rare and complex neurological conditions like MOGAD, remain uncertain. This study aims to shed light on these issues by evaluating chatbot responses to a patient-centered query.

🗒️ Study

Ten widely accessible AI chatbot platforms were each queried once, on the same day and in new sessions, with the question, “What is MOGAD, and how is MOGAD treated?” The responses were anonymized and independently assessed by 7 blinded neurologists for scientific quality (DISCERN), understandability (PEMAT-P), and citation transparency (WRR). Readability was measured with the Flesch-Kincaid Grade Level (FKGL) and the Coleman-Liau Index.
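Both readability indices used in the study are simple functions of word, sentence, letter, and syllable counts. The block below is a minimal Python sketch of the standard published formulas, not the authors' tooling (the source does not say which software they used). The function names and the vowel-group syllable heuristic are assumptions made for illustration; dedicated readability tools count syllables more carefully, so their scores may differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per vowel group."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentences, words, letters, syllables) for a plain-text response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(ch.isalpha() for w in words for ch in w)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    s, w, _, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def coleman_liau(text: str) -> float:
    """Coleman-Liau Index: 0.0588*L - 0.296*S - 15.8 (L, S are per 100 words)."""
    s, w, letters, _ = text_stats(text)
    letters_per_100 = letters / w * 100
    sentences_per_100 = s / w * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


if __name__ == "__main__":
    # Hypothetical sample sentence, not taken from any chatbot response in the study.
    sample = "MOGAD is a rare inflammatory disease. It can affect the optic nerve."
    print(round(fkgl(sample), 2), round(coleman_liau(sample), 2))
```

An FKGL of 8 corresponds roughly to a US eighth-grade reading level, so the median scores near 17 reported for the chatbot outputs sit far above that threshold.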

📈 Results

Search-focused platforms achieved a higher median PEMAT-P score than conversation-focused platforms (52.6 vs 46.7; P=.04), indicating better understandability, whereas conversation-focused platforms scored higher on WRR (median 26.8 vs 19.6; P=.001). Paid-access platforms scored significantly higher than free-access platforms in DISCERN (median 42 vs 33; P<.001), PEMAT-P (median 52.6 vs 46; P=.002), and WRR (median 26.8 vs 10.7; P<.001). Despite these differences, paid and free platforms did not differ significantly in response length or readability metrics.
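Two-group contrasts like these (search-focused vs conversation-focused, paid vs free) are what a Mann-Whitney U test addresses, and the study's abstract names that test. As a rough sketch of how it works, the Python below ranks the pooled scores, derives the U statistics, and computes a two-sided p-value from a tie-corrected normal approximation with continuity correction. The function names and the example scores are hypothetical and are not data from the study; statistical packages may instead use exact p-values for small samples.

```python
import math


def rank_with_ties(values):
    """Return 1-based average ranks and the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided p) using a normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:  # every value identical: no evidence of a difference
        return u1, u2, 1.0
    z = max((abs(u1 - mu) - 0.5) / sigma, 0.0)  # continuity correction
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, u2, p


if __name__ == "__main__":
    # Hypothetical rating scores for two groups of platforms (illustration only).
    group_a = [42, 45, 36, 44, 41]
    group_b = [33, 24, 41, 30, 28]
    print(mann_whitney_u(group_a, group_b))
```

Because the test compares ranks rather than raw values, it suits ordinal rating scales such as DISCERN or PEMAT-P, where scores cannot be assumed to follow a normal distribution.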

🌍 Impact and Implications

The findings of this study underscore the necessity for a context-sensitive evaluation of AI chatbot outputs, particularly for rare and clinically complex conditions like MOGAD. As these technologies become more integrated into healthcare, ensuring the delivery of accurate and understandable information is crucial for patient education and decision-making.

🔮 Conclusion

This study highlights the variability in AI chatbot responses regarding MOGAD, with significant differences in scientific quality, understandability, and citation transparency across platforms, and with readability that exceeded recommended public health thresholds in every case. As AI continues to evolve in the medical field, ongoing evaluation and improvement of these tools will be essential if patients are to receive reliable, readable health information.

Abstract

BACKGROUND: Large language model-based chatbots are increasingly used by the public to access medical information. Although these tools can improve access and convenience, their quality, clarity, and transparency remain uncertain for rare and diagnostically complex neurological conditions, such as myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD).
OBJECTIVE: This study aimed to evaluate the scientific quality, understandability, citation transparency, and readability of responses generated by widely used artificial intelligence chatbot platforms to a standardized, patient-centered query on MOGAD.
METHODS: We conducted a cross-sectional content analysis using the query, “What is MOGAD, and how is MOGAD treated?” Ten widely accessible chatbot platforms were queried once on the same day in new sessions. Responses were anonymized and independently evaluated by 7 blinded neurologists using DISCERN (treatment-related scientific quality), Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), and the Web Resource Rating (WRR; citation transparency). Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Coleman-Liau Index, and word count was recorded. Platforms were compared by functional orientation and the access model. Mann-Whitney U and Kruskal-Wallis tests with Dunn post hoc tests were used. Interrater reliability was assessed using intraclass correlation coefficients.
RESULTS: Significant differences were observed across platforms for DISCERN, PEMAT-P, and WRR scores (all P<.001). Search-focused platforms achieved higher understandability than conversation-focused platforms (median PEMAT-P 52.6, IQR 47.4-54 vs 46.7, IQR 42-47.3; P=.04), whereas conversation-focused platforms had higher WRR scores (median 26.8, IQR 19.6-26.8 vs 19.6, IQR 19.6-25.9; P=.001). DISCERN scores did not differ significantly by functional orientation (P=.11). Paid-access platforms outperformed free-access platforms in DISCERN (median 42, IQR 36-45 vs 33, IQR 23.8-41.3; P<.001), PEMAT-P (median 52.6, IQR 46-54 vs 46, IQR 26.3-47.4; P=.002), and WRR (median 26.8, IQR 23.2-26.8 vs 10.7, IQR 3.57-19.6; P<.001). However, no statistically significant differences were observed between paid and free platforms in response length (median word count 336, IQR 271-369 vs 206, IQR 116-294; P=.11) or readability metrics. FKGL scores were comparable between paid and free outputs (median 17.54, IQR 16.6-18.4 vs 17.56, IQR 16.5-17.6; P=.61), and Coleman-Liau Index values similarly showed no significant difference by access model (median 21.30, IQR 20.6-22.3 vs 21.71, IQR 20.9-22.1; P=.91). Readability remained limited: all outputs exceeded recommended public health readability thresholds (FKGL≥8). High interrater agreement was observed (intraclass correlation coefficient=0.902 for DISCERN, 0.887 for WRR, and 0.838 for PEMAT-P).
CONCLUSIONS: Artificial intelligence chatbot responses to a patient-centered MOGAD query varied substantially in scientific quality, understandability, transparency, and readability. Search-focused systems were more understandable, whereas conversation-focused systems showed greater citation transparency. Paid-access platforms achieved higher quality and transparency scores, without differences in readability or response length. All outputs exceeded recommended public health readability thresholds. These findings highlight the need for context-sensitive evaluation of chatbot outputs in rare and clinically complex conditions such as MOGAD.

Authors: Sönmez MT, Yetkin MF, Mehdiyev DA, Çelik ND, Ercan MB, Öztürk P, Akboğa YE, Koç ER, Mungan S

Journal: JMIR Med Inform

Citation: Sönmez MT, et al. Evaluation of AI Chatbot Responses to a Standardized Patient Query on Myelin Oligodendrocyte Glycoprotein Antibody-Associated Disease: Cross-Sectional Content Analysis. JMIR Med Inform. 2026; 14:e81720. doi: 10.2196/81720