🧑🏼‍💻 Research - December 26, 2024

Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions.


⚡ Quick Summary

A recent study evaluated the accuracy and quality of responses generated by the AI chatbot ChatGPT in healthcare-related inquiries. The findings revealed that while ChatGPT performed well in certain areas, it is not yet reliable for clinical decision-making due to inconsistencies and inaccuracies in its responses.

๐Ÿ” Key Details

  • 🧪 Study Focus: Assessing ChatGPT’s responses to healthcare inquiries
  • 🔍 Methodology: 18 open-ended questions across three clinical areas
  • 👥 Investigators: Five independent reviewers
  • 📊 Evaluation Scale: 4-point scale for response quality
  • 📚 References: Accuracy checked against established professional resources

🔑 Key Takeaways

  • 🤖 ChatGPT’s Performance: More accurate in “what” questions (8 out of 12) compared to “why” and “how” questions.
  • ⚠️ Inconsistencies: Different responses to the same question were noted.
  • ❌ Errors Identified: Calculation mistakes, unit misuse, and protocol errors could lead to harmful clinical decisions.
  • 📜 Invalid References: Citations provided by ChatGPT were sometimes non-existent in the literature.
  • 🚫 Coaching Role: ChatGPT is not ready to coach healthcare learners or professionals.
  • 🔄 Consensus Method: The Delphi method was used to ensure consistency among reviewers.
  • 📉 Clinical Decision-Making: The unreliability of ChatGPT poses serious risks in clinical settings.

📚 Background

The rise of interactive artificial intelligence tools like ChatGPT has sparked interest in their potential applications within healthcare. However, the reliability of these tools as sources of information for healthcare providers and trainees remains largely unexamined. Understanding the accuracy and quality of AI-generated responses is crucial for ensuring safe and effective patient care.

๐Ÿ—’๏ธ Study

This study rigorously assessed the consistency, quality, and accuracy of ChatGPT’s (version 3.5) responses to healthcare-related inquiries. A total of 18 open-ended questions covering three clinical areas were posed to the chatbot, in duplicate on two separate computers. Five independent investigators scored each response on a standardized 4-point quality scale, using the Delphi method to reach at least 80% consistency among their ratings.
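The paper does not publish its scoring workflow, but a minimal sketch of how an 80%-agreement stopping rule might be checked, assuming agreement is measured as the share of reviewers who give the modal score, could look like this (the scores and function name here are illustrative, not taken from the study):

```python
from collections import Counter

# Hypothetical scores for one ChatGPT response on the study's 4-point
# quality scale, one score per independent reviewer (illustrative data;
# the paper does not publish its raw ratings).
scores = [3, 3, 3, 2, 3]

def consensus_reached(scores, threshold=0.80):
    """Return True if the modal score is shared by at least `threshold`
    of reviewers -- one plausible reading of the study's 80% consistency
    goal for its Delphi rounds."""
    modal_count = Counter(scores).most_common(1)[0][1]
    return modal_count / len(scores) >= threshold

if consensus_reached(scores):
    print("Consensus reached: record the modal score.")
else:
    print("No consensus: discuss and re-score in the next Delphi round.")
```

In a real Delphi process, reviewers would see the group’s anonymized ratings between rounds and converge iteratively; the check above is only the stopping rule.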

📈 Results

The analysis revealed that ChatGPT provided more accurate responses to “what” questions, answering 8 of 12 accurately (66.7%). Its performance was less reliable for “why” and “how” questions. Notably, inconsistencies were observed, with ChatGPT occasionally offering completely different answers to the same question. Errors in calculations, units of measurement, and references were also identified, raising concerns about the potential for harmful clinical decisions.
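As a quick worked check of the headline figure, the 66.7% rate follows directly from the reported counts; the tally below uses only the numbers given in the paper, since per-type counts for the “why” and “how” questions are not broken out in the abstract:

```python
# Accurate-response counts as reported in the paper: 8 of 12 "what"
# responses were judged accurate. ("why"/"how" counts are not broken
# out in the abstract, so they are omitted here.)
results = {"what": (8, 12)}

for qtype, (accurate, total) in results.items():
    print(f'"{qtype}" questions: {accurate}/{total} = {accurate / total:.1%}')
# -> "what" questions: 8/12 = 66.7%
```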

๐ŸŒ Impact and Implications

The findings of this study highlight significant implications for the use of AI in healthcare. While AI tools like ChatGPT can assist in generating information, their current limitations necessitate caution. The potential for misleading information could adversely affect patient care, emphasizing the need for healthcare professionals to critically evaluate AI-generated content before making clinical decisions.

🔮 Conclusion

This study underscores the limitations of AI chatbots in healthcare contexts. While ChatGPT shows promise in generating responses, its inconsistencies and inaccuracies render it unsuitable for clinical decision-making at this time. Continued research and development are essential to enhance the reliability of AI tools in healthcare, ensuring they can eventually support healthcare professionals effectively.

💬 Your comments

What are your thoughts on the use of AI in healthcare? Do you believe tools like ChatGPT can be improved for clinical applications? 💬 Share your insights in the comments below.

Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions.

Abstract

BACKGROUND: Interactive artificial intelligence tools such as ChatGPT have gained popularity, yet little is known about their reliability as a reference tool for healthcare-related information for healthcare providers and trainees. The objective of this study was to assess the consistency, quality, and accuracy of the responses generated by ChatGPT on healthcare-related inquiries.
METHODS: A total of 18 open-ended questions including six questions in three defined clinical areas (2 each to address “what”, “why”, and “how”, respectively) were submitted to ChatGPT v3.5 based on real-world usage experience. The experiment was conducted in duplicate using 2 computers. Five investigators independently ranked each response using a 4-point scale to rate the quality of the bot’s responses. The Delphi method was used to compare each investigator’s score with the goal of reaching at least 80% consistency. The accuracy of the responses was checked using established professional references and resources. When the responses were in question, the bot was asked to provide reference material used for the investigators to determine the accuracy and quality. The investigators determined the consistency, accuracy, and quality by establishing a consensus.
RESULTS: The speech pattern and length of the responses were consistent within the same user but different between users. Occasionally, ChatGPT provided 2 completely different responses to the same question. Overall, ChatGPT provided more accurate responses (8 out of 12) to the “what” questions with less reliable performance to the “why” and “how” questions. We identified errors in calculation, unit of measurement, and misuse of protocols by ChatGPT. Some of these errors could result in clinical decisions leading to harm. We also identified citations and references shown by ChatGPT that did not exist in the literature.
CONCLUSIONS: ChatGPT is not ready to take on the coaching role for either healthcare learners or healthcare professionals. The lack of consistency in the responses to the same question is problematic for both learners and decision-makers. The intrinsic assumptions made by the chatbot could lead to erroneous clinical decisions. The unreliability in providing valid references is a serious flaw in using ChatGPT to drive clinical decision making.

Authors: Shiferaw MW, Zheng T, Winter A, Mike LA, Chan LN

Journal: BMC Med Inform Decis Mak

Citation: Shiferaw MW, et al. Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions. BMC Med Inform Decis Mak. 2024;24:404. doi: 10.1186/s12911-024-02824-5
