๐Ÿง‘๐Ÿผโ€๐Ÿ’ป Research - March 18, 2026

Safety and user perception of general-purpose large language models in pediatric healthcare: Evaluations of ChatGPT by doctors and parents.

🌟 Stay Updated!
Join AI Health Hub to receive the latest insights in health and AI.

⚡ Quick Summary

A recent study evaluated the safety and user perception of ChatGPT in pediatric healthcare, revealing that while 73.2% of responses from ChatGPT3.5 were rated as correct, over a quarter contained inaccuracies, highlighting the need for caution in clinical use. Nevertheless, 88.0% of parents expressed a willingness to continue using the tool for health information.

๐Ÿ” Key Details

  • 📊 Evaluation: 41 pediatric healthcare questions assessed by 9 experts
  • 🧩 Topics: Vitamin D, food allergies, sleep problems
  • ⚙️ Versions: Responses evaluated from ChatGPT3.5 and ChatGPT4
  • 🏆 Expert Ratings: 73.2% (ChatGPT3.5) and 68.3% (ChatGPT4) rated as correct
  • 👨‍👩‍👧‍👦 Parent Feedback: Over 80% rated responses as clear
  • 🤝 Trust Level: 73.1% of parents trusted the information provided

🔑 Key Takeaways

  • 📊 ChatGPT3.5 responses were rated as “completely correct” or “correct but not comprehensive” in 73.2% of cases.
  • 📉 ChatGPT4 had a slightly lower accuracy rating at 68.3%, though the difference was not statistically significant.
  • 👩‍⚕️ Expert consensus determined each rating; in cases of disagreement, the lowest rating was used.
  • 💬 Over 80% of parents found the responses clear and understandable.
  • 🤔 Trust in ChatGPT’s medical information was expressed by 73.1% of parents.
  • 🔄 No significant differences were found between the two versions in terms of clarity and trust.
  • ⚠️ Caution advised: Over a quarter of responses contained inaccuracies.
  • 🌱 Potential for improvement: Future iterations should focus on enhancing accuracy and clarity.

📚 Background

The rise of the internet has significantly increased access to medical information, yet concerns about the quality and accuracy of online content remain prevalent. In pediatric healthcare, where parents often seek guidance for their children’s health issues, tools like ChatGPT could potentially bridge the gap between accessible information and expert advice.

๐Ÿ—’๏ธ Study

This study involved nine experts who independently evaluated 41 pediatric healthcare questions across three topics: vitamin D, food allergies, and sleep problems. Each question was answered by both ChatGPT3.5 and ChatGPT4, with ratings determined by expert consensus. Additionally, 27 parents provided feedback on the clarity and trustworthiness of the responses.
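The consensus rule used by the experts — unanimous agreement yields the shared rating, while any disagreement falls back to the lowest (most critical) rating — can be sketched as follows. The rating scale and the `consensus` helper are illustrative assumptions inferred from the abstract, not the authors' code.

```python
# Sketch (not the study's actual code) of the expert consensus rule:
# unanimous ratings are kept as-is; on disagreement, the lowest
# (most critical) rating wins.

RATING_ORDER = [  # assumption: ordinal scale inferred from the abstract's rating labels
    "completely incorrect",
    "partially incorrect",
    "correct but not comprehensive",
    "completely correct",
]

def consensus(ratings):
    """Return the agreed rating, or the lowest rating when experts disagree."""
    if len(set(ratings)) == 1:
        return ratings[0]  # unanimous
    return min(ratings, key=RATING_ORDER.index)  # most critical rating

print(consensus(["completely correct"] * 3))
# -> completely correct
print(consensus(["completely correct",
                 "partially incorrect",
                 "correct but not comprehensive"]))
# -> partially incorrect
```

This conservative tie-breaking means a single skeptical expert is enough to pull a response's rating down, which biases the reported accuracy figures toward caution.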

📈 Results

The findings indicated that 73.2% of responses from ChatGPT3.5 were rated as “completely correct” or “correct but not comprehensive,” while 68.3% of ChatGPT4 responses received similar ratings. Notably, over 80% of parents rated the responses as clear, and 73.1% expressed trust in the information provided. However, the study also highlighted that over a quarter of responses (26.8% for ChatGPT3.5 and 31.7% for ChatGPT4) contained inaccuracies, emphasizing the need for careful interpretation of the results.
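The study compared the two versions' accuracy ratings with a chi-square test. That comparison can be sketched from the reported percentages: the counts below (30/41 correct for ChatGPT3.5, 28/41 for ChatGPT4) are reconstructed assumptions, not the study's raw data, and the test shown is a standard 2×2 chi-square with Yates' continuity correction rather than the authors' exact procedure.

```python
import math

# Sketch: 2x2 chi-square test with Yates' continuity correction, on counts
# reconstructed from the reported percentages (an assumption, not raw data):
# 73.2% of 41 questions ~= 30 correct for ChatGPT3.5; 68.3% ~= 28 for ChatGPT4.

def yates_chi2_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 dof) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = 0.0
    for obs, row, col in ((a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)):
        exp = row * col / n            # expected count under independence
        chi2 += (abs(obs - exp) - 0.5) ** 2 / exp  # Yates-corrected term
    # survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# rows: ChatGPT3.5, ChatGPT4; columns: correct, incorrect (out of 41 questions)
chi2, p = yates_chi2_2x2(30, 11, 28, 13)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With these reconstructed counts the test yields p ≈ .81, consistent with the abstract's nonsignificant p = .819 (the small discrepancy likely reflects the study's exact counts or software defaults).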

๐ŸŒ Impact and Implications

The implications of this study are significant for both parents and healthcare professionals. While ChatGPT shows promise in expanding access to health information, the presence of inaccuracies raises concerns about its reliability in clinical decision-making. This research underscores the importance of integrating AI tools in a manner that complements professional healthcare advice, ensuring that parents can make informed decisions about their children’s health.

🔮 Conclusion

This study highlights the potential of ChatGPT to enhance access to pediatric healthcare information, but it also calls for caution due to the identified inaccuracies. As AI technology continues to evolve, future improvements should focus on enhancing the accuracy and clarity of responses, fostering a better integration between parents and healthcare professionals. The journey towards reliable AI in healthcare is ongoing, and further research is essential to unlock its full potential.

💬 Your comments

What are your thoughts on the use of AI like ChatGPT in pediatric healthcare? Do you believe it can be a valuable resource for parents? Let’s start a conversation! 💬 Leave your thoughts in the comments below.


Abstract

BACKGROUND/OBJECTIVE: Although the internet has broadened access to medical resources, concerns persist regarding the quality and accuracy of available content. ChatGPT, a general-purpose large language model, may help bridge this gap. This study evaluates its safety and user perception in addressing pediatric healthcare queries.
METHODS: Nine experts independently evaluated 41 questions, with three experts assigned to each of the following topics: vitamin D (15 questions), food allergies (16 questions), and sleep problems (10 questions). Each question was answered separately by ChatGPT3.5 and ChatGPT4. Ratings were determined by expert consensus or, in cases of disagreement, the lowest rating. Additionally, 27 parents evaluated ChatGPT’s responses.
RESULTS: Experts rated 73.2% of responses from ChatGPT3.5 as “completely correct” or “correct but not comprehensive,” while 26.8% were rated as “partially incorrect” or “completely incorrect.” For ChatGPT4, these figures were 68.3% and 31.7%, respectively. The difference in accuracy ratings between the two versions was not statistically significant (chi-square test, p = .819). Over 80% of parents rated the responses as “completely clear with no further doubts” or “very clear with few doubts,” with no significant difference found between versions (generalized mixed-effects model, p = .617). A total of 73.1% of parents expressed trust in ChatGPT’s medical information, and 88.0% indicated a likelihood of continued use. Rating trends between parents and clinicians were consistent for both ChatGPT3.5 and ChatGPT4 responses (McNemar’s test, ChatGPT3.5: p = .481; ChatGPT4: p = .143).
CONCLUSION: As over a quarter of responses contained expert-identified inaccuracies, the current performance of ChatGPT is insufficient for safe and reliable use in clinical decision-making. Nevertheless, it has potential to expand health information access for parents. However, these findings should be interpreted with caution given the small sample size and potential selection bias regarding parents’ educational backgrounds. Future improvements should enhance accuracy, clarity, and integration between parents and healthcare professionals.

Authors: Tan J, Wang L, Wang G, Yang Y, Jia F, Chi X, Xie X, Li T, Yang B, Zhang H, Gong M, Wu Y, Shi X, Chen L

Journal: Digit Health

Citation: Tan J, et al. Safety and user perception of general-purpose large language models in pediatric healthcare: Evaluations of ChatGPT by doctors and parents. Digit Health. 2026;12:20552076261427505. doi: 10.1177/20552076261427505

