Quick Summary
A recent study evaluated the safety and user perception of ChatGPT in pediatric healthcare, revealing that while 73.2% of responses from ChatGPT3.5 were rated as correct, over a quarter contained inaccuracies, highlighting the need for caution in clinical use. Nevertheless, 88.0% of parents expressed a willingness to continue using the tool for health information.
Key Details
- Evaluation: 41 pediatric healthcare questions assessed by 9 experts
- Topics: vitamin D (15 questions), food allergies (16), sleep problems (10)
- Versions: each question answered separately by ChatGPT3.5 and ChatGPT4
- Expert ratings: 73.2% (ChatGPT3.5) and 68.3% (ChatGPT4) rated as correct
- Parent feedback: over 80% rated responses as clear
- Trust level: 73.1% of parents trusted the information provided
Key Takeaways
- ChatGPT3.5 responses were rated as “completely correct” or “correct but not comprehensive” in 73.2% of cases.
- ChatGPT4 had a slightly lower accuracy rating at 68.3%, a difference that was not statistically significant.
- Ratings were determined by expert consensus; in cases of disagreement, the lowest rating was used.
- Over 80% of parents found the responses clear and understandable.
- Trust in ChatGPT’s medical information was expressed by 73.1% of parents.
- No significant differences were found between the two versions in clarity or trust.
- Caution advised: over a quarter of responses contained inaccuracies.
- Potential for improvement: future iterations should focus on enhancing accuracy and clarity.

Background
The rise of the internet has significantly increased access to medical information, yet concerns about the quality and accuracy of online content remain prevalent. In pediatric healthcare, where parents often seek guidance for their children’s health issues, tools like ChatGPT could potentially bridge the gap between accessible information and expert advice.
Study
This study involved nine experts who independently evaluated 41 pediatric healthcare questions across three topics: vitamin D, food allergies, and sleep problems. Each question was answered by both ChatGPT3.5 and ChatGPT4, with ratings determined by expert consensus. Additionally, 27 parents provided feedback on the clarity and trustworthiness of the responses.
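The aggregation rule described above (expert consensus, falling back to the lowest rating on disagreement) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the four-level ordinal scale is taken from the rating labels in the abstract, and how the study handled ordering in practice is an assumption.

```python
# Sketch of the study's rating-aggregation rule: a response's final rating
# is the experts' consensus or, when experts disagree, the lowest rating.
RATING_ORDER = [
    "completely incorrect",        # lowest
    "partially incorrect",
    "correct but not comprehensive",
    "completely correct",          # highest
]

def aggregate_rating(expert_ratings: list[str]) -> str:
    """Return the consensus rating, or the lowest rating on disagreement."""
    if len(set(expert_ratings)) == 1:  # all experts agree
        return expert_ratings[0]
    return min(expert_ratings, key=RATING_ORDER.index)
```

Taking the lowest rating on disagreement is a conservative choice: a response only counts as correct if no expert flagged a problem with it.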
Results
The findings indicated that 73.2% of responses from ChatGPT3.5 were rated as “completely correct” or “correct but not comprehensive,” while 68.3% of ChatGPT4 responses received similar ratings. Notably, over 80% of parents rated the responses as clear, and 73.1% expressed trust in the information provided. However, the study also highlighted that a significant portion of responses contained inaccuracies, emphasizing the need for careful interpretation of the results.
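The accuracy comparison between the two versions rests on a chi-square test. As a rough illustration, here is a pure-stdlib sketch run on counts reconstructed from the reported percentages (about 30 of 41 correct for ChatGPT3.5 and 28 of 41 for ChatGPT4); these are approximations, not the authors' raw data, so the resulting p-value will only roughly match the paper's p = .819.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]], 1 df.

    Returns (chi2, p); Yates continuity correction applied by default.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, exp in [
        (a, row1 * col1 / n), (b, row1 * col2 / n),
        (c, row2 * col1 / n), (d, row2 * col2 / n),
    ]:
        dev = abs(obs - exp) - (0.5 if yates else 0.0)
        chi2 += dev * dev / exp
    # Survival function of chi-square with 1 df: p = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: ChatGPT3.5, ChatGPT4; columns: correct, incorrect (approximate counts)
chi2, p = chi2_2x2(30, 11, 28, 13)
```

With these approximate counts the test is far from significance, consistent with the study's conclusion that the two versions did not differ meaningfully in accuracy.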
Impact and Implications
The implications of this study are significant for both parents and healthcare professionals. While ChatGPT shows promise in expanding access to health information, the presence of inaccuracies raises concerns about its reliability in clinical decision-making. This research underscores the importance of integrating AI tools in a manner that complements professional healthcare advice, ensuring that parents can make informed decisions about their children’s health.
Conclusion
This study highlights the potential of ChatGPT to expand access to pediatric healthcare information, but it also calls for caution given the identified inaccuracies. As AI technology continues to evolve, future improvements should focus on enhancing the accuracy and clarity of responses and fostering better integration between parents and healthcare professionals. The journey toward reliable AI in healthcare is ongoing, and further research is essential to unlock its full potential.
Your comments
What are your thoughts on the use of AI like ChatGPT in pediatric healthcare? Do you believe it can be a valuable resource for parents? Let’s start a conversation! Leave your thoughts in the comments below or connect with us on social media:
Safety and user perception of general-purpose large language models in pediatric healthcare: Evaluations of ChatGPT by doctors and parents.
Abstract
BACKGROUND/OBJECTIVE: Although the internet has broadened access to medical resources, concerns persist regarding the quality and accuracy of available content. ChatGPT, a general-purpose large language model, may help bridge this gap. This study evaluates its safety and user perception in addressing pediatric healthcare queries.
METHODS: Nine experts independently evaluated 41 questions, with three experts assigned to each of the following topics: vitamin D (15 questions), food allergies (16 questions), and sleep problems (10 questions). Each question was answered separately by ChatGPT3.5 and ChatGPT4. Ratings were determined by expert consensus or, in cases of disagreement, the lowest rating. Additionally, 27 parents evaluated ChatGPT’s responses.
RESULTS: Experts rated 73.2% of responses from ChatGPT3.5 as “completely correct” or “correct but not comprehensive,” while 26.8% were rated as “partially incorrect” or “completely incorrect.” For ChatGPT4, these figures were 68.3% and 31.7%, respectively. The difference in accuracy ratings between the two versions was not statistically significant (chi-square test, p = .819). Over 80% of parents rated the responses as “completely clear with no further doubts” or “very clear with few doubts,” with no significant difference found between versions (generalized mixed-effects model, p = .617). A total of 73.1% of parents expressed trust in ChatGPT’s medical information, and 88.0% indicated a likelihood of continued use. Rating trends between parents and clinicians were consistent for both ChatGPT3.5 and ChatGPT4 responses (McNemar’s test, ChatGPT3.5: p = .481; ChatGPT4: p = .143).
CONCLUSION: As over a quarter of responses contained expert-identified inaccuracies, the current performance of ChatGPT is insufficient for safe and reliable use in clinical decision-making. Nevertheless, it has potential to expand health information access for parents. However, these findings should be interpreted with caution given the small sample size and potential selection bias regarding parents’ educational backgrounds. Future improvements should enhance accuracy, clarity, and integration between parents and healthcare professionals.
Authors: Tan J, Wang L, Wang G, Yang Y, Jia F, Chi X, Xie X, Li T, Yang B, Zhang H, Gong M, Wu Y, Shi X, Chen L
Journal: Digit Health
Citation: Tan J, et al. Safety and user perception of general-purpose large language models in pediatric healthcare: Evaluations of ChatGPT by doctors and parents. Digit Health. 2026;12:20552076261427505. doi: 10.1177/20552076261427505