Quick Summary
This study evaluated the quality, reliability, and readability of responses from ChatGPT-4 regarding Allergen Immunotherapy (AIT). The findings revealed that while ChatGPT-4 provided generally well-structured information, it lacked sufficient reliability and was difficult for patients to understand.
Key Details
- Questions analyzed: 24 questions related to AIT
- Evaluation tools: DISCERN instrument, JAMA Benchmark criteria, Flesch-Kincaid Readability Tests
- Study focus: Quality, reliability, and readability of ChatGPT-4 responses
- Overall rating: Fair quality, insufficient reliability, difficult readability
Key Takeaways
- ChatGPT-4 responses were rated as “fair quality” on the DISCERN instrument.
- Strengths included good-quality responses on the preventive effects of AIT in children and on premedication to reduce adverse reactions.
- JAMA Benchmark scores indicated “insufficient information” because responses lacked authorship, attribution, disclosure, and currency.
- Readability was classified as “very difficult,” requiring college graduate-level reading ability.
- Healthcare professionals should supervise the use of ChatGPT-4 in clinical settings.
- The 24 questions were selected in accordance with EAACI clinical guidelines on AIT.
- PMID: 41319041.

Background
Allergen Immunotherapy (AIT) is a crucial treatment for allergic diseases, offering potential long-term relief. However, patients often experience uncertainty and seek information from various sources, including AI tools like ChatGPT-4. Understanding the quality of information provided by such tools is essential for ensuring patient safety and informed decision-making.
Study
The study assessed ChatGPT-4’s responses on AIT by submitting 24 questions selected in accordance with EAACI clinical guidelines. Independent reviewers then rated each response with three validated instruments: DISCERN for quality, the JAMA Benchmark criteria for reliability, and the Flesch-Kincaid Readability Tests for readability.
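To make the reliability instrument more concrete, here is a minimal Python sketch of how a JAMA Benchmark score could be tallied. It assumes the usual convention of one point for each of the four criteria named in the abstract (authorship, attribution, disclosure, currency). The band labels and cut-offs in `jama_band`, the function names, and the example ratings are illustrative assumptions; they are not taken from the study's data or scoring sheets.

```python
from statistics import median

# The four JAMA Benchmark criteria named in the abstract.
JAMA_CRITERIA = ("authorship", "attribution", "disclosure", "currency")


def jama_score(criteria_met: dict) -> int:
    """Return one point per criterion marked True (range 0 to 4)."""
    return sum(1 for name in JAMA_CRITERIA if criteria_met.get(name, False))


def jama_band(score: int) -> str:
    """Map a score to a label. These cut-offs are an assumed convention."""
    if score <= 1:
        return "insufficient information"
    if score <= 3:
        return "partially sufficient information"
    return "completely sufficient information"


# Hypothetical reviewer ratings for three responses (not study data).
example_ratings = [
    {},                                      # no criterion met
    {"currency": True},                      # one criterion met
    {"attribution": True},                   # one criterion met
]
example_scores = [jama_score(r) for r in example_ratings]
print(example_scores, jama_band(int(median(example_scores))))
```

A chatbot answer that names no author, cites no sources, discloses nothing, and carries no date would score 0 under this scheme, which is consistent with the low medians the study reports.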
Results
ChatGPT-4’s responses were generally rated as “fair quality” on the DISCERN instrument. JAMA Benchmark scores, however, consistently indicated “insufficient information” (median 0-1), primarily because the responses lacked authorship, attribution, disclosure, and currency. The readability analysis classified most responses as “very difficult” to understand, requiring college graduate-level reading ability.
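For readers unfamiliar with the readability scores, the following Python sketch shows how the two standard Flesch formulas are computed from sentence, word, and syllable counts. The formulas and the Reading Ease bands are the commonly published ones. The syllable counter is a rough heuristic, and the function names and sample text are illustrative assumptions; the study's own tooling is not described in the abstract.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple:
    """Return (sentences, words, syllables) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Higher is easier; scores below 30 are labeled 'very difficult'."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Approximate US school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def reading_ease_band(score: float) -> str:
    """Map a Reading Ease score to Flesch's descriptive bands."""
    bands = [(90, "very easy"), (80, "easy"), (70, "fairly easy"),
             (60, "standard"), (50, "fairly difficult"), (30, "difficult")]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


# Hypothetical sample sentence (not a ChatGPT-4 response from the study).
sample = "Allergen immunotherapy modifies immunological responsiveness."
counts = text_counts(sample)
print(reading_ease_band(flesch_reading_ease(*counts)))
```

Long sentences and polysyllabic medical terms both push the Reading Ease score down, which is why clinically precise answers can easily land in the “very difficult” band.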
Impact and Implications
The findings of this study underscore the importance of critically evaluating AI-generated health information. While ChatGPT-4 can provide structured responses, its insufficient reliability and difficult readability raise concerns for patient-directed use. Healthcare professionals must remain vigilant and supervise the application of such tools, especially in sensitive areas like dosing and safety.
Conclusion
This study highlights the potential and limitations of using AI tools like ChatGPT-4 in the context of Allergen Immunotherapy. While the model demonstrates the ability to generate structured responses, the lack of reliability and readability for patients suggests that further development and specialized models are necessary. Continued research in this area is essential to enhance the utility of AI in healthcare.
Evaluation of the Quality and Reliability of ChatGPT-4’s Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment.
Abstract
BACKGROUND: Chat Generative Pre-Trained Transformer 4 (ChatGPT-4) represents an advancing large language model (LLM) with potential applications in medical education and patient care. While Allergen Immunotherapy (AIT) can change the course of allergic diseases, it can also bring uncertainty to patients, who turn to readily available resources such as ChatGPT-4 to address these doubts. This study aimed to use validated tools to evaluate the information provided by ChatGPT-4 regarding AIT in terms of quality, reliability, and readability.
METHODS: In accordance with EAACI clinical guidelines about AIT, 24 questions were selected and introduced in ChatGPT-4. Independent reviewers evaluated ChatGPT-4 responses using three validated tools: the DISCERN instrument (quality), JAMA Benchmark criteria (reliability), and Flesch-Kincaid Readability Tests (readability). Descriptive statistics summarized findings across categories.
RESULTS: ChatGPT-4 responses were generally rated as “fair quality” on DISCERN, with strengths in classification/formulations and special populations. Notably, the tool provided good-quality responses on the preventive effects of AIT in children and premedication to reduce adverse reactions. However, JAMA Benchmark scores consistently indicated “insufficient information” (median = 0-1), primarily due to absent authorship, attribution, disclosure, and currency. Readability analyses revealed a college graduate-level requirement, with most responses classified as “very difficult” to understand. Overall, ChatGPT-4 demonstrated fair quality, insufficient reliability, and difficult readability for patients.
CONCLUSIONS: ChatGPT-4 provides generally well-structured responses on AIT but lacks reliability and readability for clinical or patient-directed use. Until specialized, reference-based models are developed, healthcare professionals should supervise its use, particularly in sensitive areas such as dosing and safety.
Authors: Cherrez-Ojeda I, Zuberbier T, Rodas-Valero G, Sanchez J, Rudenko M, Dramburg S, Demoly P, Caimmi D, Gómez RM, Ramon GD, Fouda GE, Quimby KR, Chong-Neto H, Llosa OC, Larco JI, Monge Ortega OP, Faytong-Haro M, Pfaar O, Bousquet J, Robles-Velasco K
Journal: Clin Transl Allergy
Citation: Cherrez-Ojeda I, et al. Evaluation of the Quality and Reliability of ChatGPT-4’s Responses on Allergen Immunotherapy Using Validated Instruments for Health Information Quality Assessment. Clin Transl Allergy. 2025; 15:e70130. doi: 10.1002/clt2.70130