๐Ÿง‘๐Ÿผโ€๐Ÿ’ป Research - January 7, 2026

Evaluating ChatGPT’s Adherence to Hoarseness Guidelines: A Three-Rater Study Including an Otolaryngologist, an Audiologist, and the Model Itself.


⚡ Quick Summary

A recent study evaluated ChatGPT’s adherence to the 2018 Clinical Practice Guideline on Hoarseness, revealing that 86.7% of its responses were fully consistent with expert recommendations. This suggests a promising role for AI in patient education and clinical decision-making regarding voice disorders.

๐Ÿ” Key Details

  • 📊 Participants: An otolaryngologist, an audiologist, and ChatGPT
  • 📝 Guideline Statements: 13 statements converted into 15 clinical questions
  • ⚙️ Evaluation Scale: Three-point scale (consistent, partially consistent, inconsistent)
  • 🏆 Overall Agreement: 97.8% across raters

🔑 Key Takeaways

  • 🤖 ChatGPT demonstrated high concordance with clinical guidelines for hoarseness.
  • 📈 86.7% of responses were rated as fully consistent by all evaluators.
  • 🔍 Two responses were partially consistent, indicating room for improvement.
  • 🌟 No responses were deemed inconsistent, showcasing reliability.
  • 👩‍⚕️ Expert oversight is essential for effective use of AI in clinical settings.
  • 💡 Potential applications include enhancing patient education and decision-making.
  • 📅 Study published in the Journal of Voice, 2026.
  • 🔗 DOI: 10.1016/j.jvoice.2025.12.015

📚 Background

Hoarseness, or dysphonia, is a common condition that can significantly impact communication and quality of life. The 2018 Clinical Practice Guideline on Hoarseness provides a framework for evaluating and managing this condition. With the rise of artificial intelligence, particularly large language models like ChatGPT, there is growing interest in their potential to assist in clinical settings, especially in areas such as patient education and decision support.

๐Ÿ—’๏ธ Study

The study aimed to assess how well ChatGPT aligns with established clinical guidelines on hoarseness. Thirteen guideline statements were transformed into 15 open-ended clinical questions, which were then independently answered by ChatGPT. The responses were evaluated by an otolaryngologist, an audiologist, and ChatGPT itself, with a senior otolaryngologist providing final adjudication.

📈 Results

Out of the 15 items evaluated, 13 responses (86.7%) were rated as fully consistent by all three raters. The remaining two responses were rated as partially consistent by one evaluator each. Notably, no responses were classified as inconsistent, leading to an impressive overall agreement rate of 97.8% across all raters.
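As a back-of-the-envelope check, the headline percentages can be reproduced from the counts above. Note that the paper does not state how the 97.8% agreement figure was computed; the half-credit weighting of "partially consistent" ratings shown below is only one plausible reading, not the study's stated method.

```python
# Counts reported in the study: 15 items, 3 raters each -> 45 ratings total.
TOTAL_ITEMS = 15
FULLY_CONSISTENT_ITEMS = 13   # rated "consistent" by all three raters
PARTIAL_RATINGS = 2           # one "partially consistent" rating each, on two items

# 13 of 15 items fully consistent -> 86.7%
full_consistency_pct = round(100 * FULLY_CONSISTENT_ITEMS / TOTAL_ITEMS, 1)

# One plausible reading of the 97.8% figure (an assumption, not stated in
# the paper): score each of the 45 individual ratings, giving half credit
# to a "partially consistent" rating.
total_ratings = TOTAL_ITEMS * 3                        # 45
consistent_ratings = total_ratings - PARTIAL_RATINGS   # 43 "consistent" ratings
agreement_pct = round(
    100 * (consistent_ratings + 0.5 * PARTIAL_RATINGS) / total_ratings, 1
)

print(full_consistency_pct, agreement_pct)  # 86.7 97.8
```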

๐ŸŒ Impact and Implications

The findings from this study highlight the potential of large language models like ChatGPT as valuable adjuncts in the clinical management of voice disorders. By demonstrating high concordance with expert guidelines, these AI tools could enhance patient education and support clinical decision-making, provided they are used under appropriate expert supervision. This could lead to improved outcomes for patients suffering from hoarseness and other voice-related issues.

🔮 Conclusion

This study underscores the significant potential of AI in healthcare, particularly in the realm of voice disorders. With high levels of agreement with clinical guidelines, ChatGPT could serve as a useful tool for clinicians, enhancing the quality of care provided to patients. Continued research and development in this area are essential to fully realize the benefits of AI in clinical practice.

💬 Your comments

What are your thoughts on the integration of AI in clinical settings, especially for conditions like hoarseness? We would love to hear your insights! 💬 Leave your comments below or connect with us on social media.


Abstract

OBJECTIVE: To assess the alignment of Chat Generative Pre-trained Transformer (ChatGPT), based on Generative Pre-trained Transformer 4 (GPT-4), with the 2018 Clinical Practice Guideline on Hoarseness (Dysphonia), using a structured three-rater evaluation involving an otolaryngologist, an audiologist, and ChatGPT.
METHODS: Thirteen guideline statements were converted into 15 open-ended clinical questions and independently answered by ChatGPT. Responses were assessed for consistency with the guideline using a three-point scale (consistent, partially consistent, inconsistent). Evaluations were performed by an otolaryngologist, an audiologist, and ChatGPT itself, with final adjudication by a senior otolaryngologist.
RESULTS: Of 15 items, 13 responses (86.7%) were rated as fully consistent by all three raters. Two responses (13.3%) were rated as partially consistent by one evaluator each. No responses were deemed inconsistent. Overall agreement across raters was 97.8%.
CONCLUSION: ChatGPT’s responses showed high concordance with expert recommendations in the evaluation and management of hoarseness. These findings support the potential of large language models as adjunctive tools for patient education and clinical decision-making in voice disorders, when used under expert oversight.

Authors: Durgut M, Durgut O

Journal: J Voice

Citation: Durgut M and Durgut O. Evaluating ChatGPT’s Adherence to Hoarseness Guidelines: A Three-Rater Study Including an Otolaryngologist, an Audiologist, and the Model Itself. J Voice. 2026; (unknown volume):(unknown pages). doi: 10.1016/j.jvoice.2025.12.015

