⚡ Quick Summary
This study evaluated the performance of three large language models (LLMs) in generating patient education materials (PEMs) for thyroid eye disease (TED). The results indicate that LLMs can produce high-quality, understandable, and empathetic content, although improvements in readability are still needed.
🔍 Key Details
- 📊 Models evaluated: ChatGPT-4o, Claude 3.5, Gemini 1.5
- 📝 Content types: Educational brochures and FAQs
- ⚙️ Evaluation metrics: Quality, understandability, actionability, accuracy, empathy
- 📈 Readability tools used: Readable.com (FKGL and SMOG)
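The two readability indices listed above are standard formulas over sentence, word, and syllable counts. Below is a minimal Python sketch of how they are defined; the syllable counter is a rough vowel-group heuristic, so scores will differ slightly from Readable.com's.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, dropping a trailing silent 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Compute FKGL and SMOG grade levels for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # SMOG counts "polysyllabic" words: three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    n_sent, n_words = len(sentences), len(words)
    # Flesch-Kincaid Grade Level
    fkgl = 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59
    # Simple Measure of Gobbledygook
    smog = 1.0430 * (polysyllables * (30 / n_sent)) ** 0.5 + 3.1291
    return {"FKGL": round(fkgl, 2), "SMOG": round(smog, 2)}
```

Both indices report an approximate U.S. school grade level; patient education materials are commonly recommended to target around a sixth-grade reading level.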
🔑 Key Takeaways
- 📚 LLMs can generate high-quality PEMs for TED.
- 💡 Both brochure prompts achieved excellent scores in quality and understandability.
- ❗ Actionability of the generated content was below the desired threshold.
- 🔍 Simplified content was easier to understand than standard responses.
- 🏆 LLM-generated FAQ responses outperformed answers retrieved via Google search.
- 📉 Readability remains a challenge, with room for improvement.
- 🌟 Empathy was rated highly in the generated content.
- 🗓️ Study published in the journal Endocrine.
📚 Background
Thyroid eye disease (TED) is a condition that can significantly impact patients’ quality of life. Effective patient education is crucial for improving understanding and management of the disease. Traditional methods of creating educational materials can be time-consuming and may not always meet the needs of diverse patient populations. The advent of large language models offers a promising alternative for generating tailored educational content.
🗒️ Study
This study aimed to assess the capabilities of three LLMs—ChatGPT-4o, Claude 3.5, and Gemini 1.5—in producing PEMs for TED. Researchers designed specific prompts to generate educational brochures and responses to frequently asked questions. The generated content was then systematically evaluated based on various dimensions, including quality, understandability, and empathy.
📈 Results
Both brochure prompts (A and B) performed excellently, achieving DISCERN scores of ≥4, PEMAT Understandability scores of ≥70%, and accuracy and empathy scores of ≥4. However, both fell short of the actionability standard, with PEMAT Actionability scores below 70%. Notably, the simplified content generated from prompt B was easier to understand, although it still did not reach the ideal readability level.
🌍 Impact and Implications
The findings of this study highlight the significant potential of LLMs in enhancing patient education for TED. By generating high-quality, understandable, and empathetic content, LLMs can improve patient awareness and understanding of their condition. However, addressing the readability issue is essential for maximizing the effectiveness of these educational materials. The integration of LLMs into patient education could lead to better health outcomes and more informed patients.
🔮 Conclusion
This study underscores the transformative potential of large language models in generating patient education materials for thyroid eye disease. While the models demonstrated strong performance in quality and empathy, there remains a need for improvement in readability. Continued research and development in this area could pave the way for more effective patient education strategies, ultimately enhancing patient care and outcomes.
💬 Your comments
What are your thoughts on the use of large language models for patient education? We would love to hear your insights! 💬 Share your comments below or connect with us on social media.
Large language models: unlocking new potential in patient education for thyroid eye disease.
Abstract
PURPOSE: This study aims to evaluate the performance of three large language models (LLMs) in generating patient education materials (PEMs) for thyroid eye disease (TED), intending to improve patients’ understanding and awareness of TED.
METHODS: We evaluated the performance of ChatGPT-4o, Claude 3.5, and Gemini 1.5 in generating PEMs for TED using different prompts. First, we produced TED patient education brochures from prompts A and B, where prompt B additionally asked the model to write the content at a sixth-grade reading level. Next, we generated two sets of responses to frequently asked questions (FAQs) about TED: standard responses and simplified responses, the latter optimized through specific prompts. All generated content was systematically evaluated across dimensions including quality, understandability, actionability, accuracy, and empathy. Readability was analyzed using the online tool Readable.com, including the Flesch-Kincaid Grade Level (FKGL) and Simple Measure of Gobbledygook (SMOG).
RESULTS: Brochures generated from both prompt A and prompt B performed excellently in quality (DISCERN ≥ 4), understandability (PEMAT Understandability ≥ 70%), accuracy (score ≥ 4), and empathy (score ≥ 4), with no significant differences between the two. However, both failed to meet the actionability standard (PEMAT Actionability < 70%). Regarding readability, prompt B's output was easier to understand than prompt A's, although even the optimized version of prompt B did not reach the ideal readability level. Additionally, comparing LLM-generated answers to FAQs about TED against those retrieved via Google showed that, whether standard or simplified, the LLM responses outperformed Google, yielding results similar to those of the brochures.
CONCLUSION: Overall, LLMs, as a powerful tool, demonstrate significant potential in generating PEMs for TED. They are capable of producing high-quality, understandable, accurate, and empathetic content, but there is still room for improvement in terms of readability.
Authors: Gao Y, Xu Q, Zhang O, Wang H, Wang Y, Wang J, Chen X
Journal: Endocrine
Citation: Gao Y, et al. Large language models: unlocking new potential in patient education for thyroid eye disease. Endocrine. 2025; (unknown volume):(unknown pages). doi: 10.1007/s12020-025-04339-z