🧑🏼‍💻 Research - January 12, 2025

Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery.

⚡ Quick Summary

This study, the first of its kind, compared the decision-making of Large Language Models (LLMs) with human multidisciplinary tumor board recommendations in otorhinolaryngology, head and neck surgery. Both the web-based and the locally run model achieved high concordance with the board, with Llama 3 reaching slightly higher concordance than ChatGPT-4o (92% vs. 84%) in distinguishing curative from palliative treatment strategies.

🔍 Key Details

  • 📊 Cases analyzed: 25 simulated tumor board cases
  • 👥 Participants: multidisciplinary team (MDT) of specialists
  • ⚙️ Technologies used: ChatGPT-4o (web-based) and Llama 3 (locally run)
  • 🏆 Concordance with the MDT (curative vs. palliative intent): ChatGPT-4o 84%, Llama 3 92%

🔑 Key Takeaways

  • 🤖 LLMs can provide viable therapeutic recommendations in ORL head and neck surgery.
  • 🔒 As a locally run model, Llama 3 bypasses many data protection concerns.
  • 📈 ChatGPT-4o identified all first-line therapy options in 64% of cases.
  • 📉 Llama 3 identified all first-line therapy options in 60% of cases.
  • 💡 MDT members indicated that LLM recommendations could potentially enhance their decisions in 17% of assessments.
  • 📝 Medical adequacy was rated 4.7 for ChatGPT-4o and 4.3 for Llama 3 on a six-point Likert scale.
  • 🌟 Both models should augment rather than replace human decision-making at this stage.

📚 Background

Tumor boards play a crucial role in modern cancer treatment, bringing together specialists from various fields to make informed decisions about patient care. With the rise of artificial intelligence and machine learning, there is growing interest in the potential of LLMs to assist in these complex decision-making processes. However, concerns regarding data protection and patient confidentiality have limited the adoption of web-based models, making this study particularly relevant.

🗒️ Study

This pioneering study involved a multidisciplinary team of specialists who analyzed 25 simulated tumor board cases. The same cases were input into both ChatGPT-4o and Llama 3 using structured prompts. The objective was to assess the concordance between the LLMs’ recommendations and those of the human MDT, focusing on their ability to distinguish between curative and palliative treatment strategies.
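
For readers curious what such a workflow might look like in code, below is a minimal sketch of submitting one simulated case as a structured prompt to a locally run Llama 3 (here via the Ollama Python client) and to a web-based GPT-4o (via the OpenAI API). The case text, prompt wording, and tooling are illustrative assumptions only; the study does not publish its exact prompts or software stack.

```python
# Illustrative sketch only -- NOT the authors' code or prompts.
# Assumes: a local Ollama server with the "llama3" model pulled, the `ollama`
# and `openai` Python packages installed, and OPENAI_API_KEY set for GPT-4o.
import ollama
from openai import OpenAI

system_msg = "You are a multidisciplinary head and neck tumor board."
case_prompt = (
    "Simulated case: 65-year-old male, cT3 cN1 cM0 squamous cell carcinoma of the "
    "oropharynx, HPV-negative, ECOG 1. State the treatment intent (curative or "
    "palliative) and list first-line therapy options in order of priority."
)
messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": case_prompt},
]

# Locally run model: the case description never leaves the local machine.
local_reply = ollama.chat(model="llama3", messages=messages)
print("Llama 3:", local_reply["message"]["content"])

# Web-based model: the case description is sent to an external provider.
client = OpenAI()
web_reply = client.chat.completions.create(model="gpt-4o", messages=messages)
print("GPT-4o:", web_reply.choices[0].message.content)
```

The practical difference the paper emphasizes is visible here: with the local model, confidential case data stays on-premises, whereas the web-based route transmits it to a third-party service.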

📈 Results

The findings revealed that ChatGPT-4o achieved 84% concordance with the MDT (21/25 cases) and Llama 3 92% (23/25) in distinguishing curative from palliative treatment strategies. ChatGPT-4o identified all first-line therapy options considered by the MDT in 64% of cases and Llama 3 in 60%, albeit with varying priority, and the two models matched the MDT's complete first-line treatment strategy in 52% and 48% of cases, respectively. Mean medical adequacy ratings of 4.7 (ChatGPT-4o) and 4.3 (Llama 3) on a six-point scale indicate that both LLMs provided clinically relevant recommendations.
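
The headline percentages follow directly from the case counts reported in the abstract; the short check below reproduces them. The 200 adequacy assessments are consistent with four MDT raters each scoring both models across the 25 cases, an inference from the methods rather than an explicitly stated breakdown.

```python
# Back-of-the-envelope check of the reported rates; counts are taken from the abstract.
total_cases = 25
concordant = {"ChatGPT-4o": 21, "Llama 3": 23}  # curative vs. palliative intent
for model, n in concordant.items():
    print(f"{model}: {n}/{total_cases} = {n / total_cases:.0%} concordance")
# -> ChatGPT-4o: 21/25 = 84% concordance
# -> Llama 3: 23/25 = 92% concordance

# Assumed breakdown: 4 raters x 2 models x 25 cases = 200 assessments;
# 33 were flagged as potentially enhancing the MDT's decision (rounded to 17% in the abstract).
assessments = 4 * 2 * 25
print(f"Potentially enhancing: 33/{assessments} = {33 / assessments:.1%}")
# -> Potentially enhancing: 33/200 = 16.5%
```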

🌍 Impact and Implications

The implications of this study are significant for the future of cancer treatment decision-making. By integrating LLMs into tumor board discussions, healthcare professionals can potentially enhance their decision-making processes while addressing data protection concerns. This could lead to improved patient outcomes and more efficient use of resources in the healthcare system.

🔮 Conclusion

This study highlights the promising role of LLMs in supporting decision-making in otorhinolaryngology and head and neck surgery. While Llama 3 shows particular promise as a clinical tool, it is essential to remember that these models should complement, not replace, human expertise. Continued research and development in this area could pave the way for more advanced applications of AI in healthcare.

💬 Your comments

What are your thoughts on the integration of AI in medical decision-making? We would love to hear your insights! 💬 Share your comments below.

Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery.

Abstract

INTRODUCTION: Tumor boards are a cornerstone of modern cancer treatment. Given their advanced capabilities, the role of Large Language Models (LLMs) in generating tumor board decisions for otorhinolaryngology (ORL) head and neck surgery is gaining increasing attention. However, concerns over data protection and the use of confidential patient information in web-based LLMs have restricted their widespread adoption and hindered the exploration of their full potential. In this first study of its kind we compared standard human multidisciplinary tumor board recommendations (MDT) against a web-based LLM (ChatGPT-4o) and a locally run LLM (Llama 3) addressing data protection concerns.
MATERIAL AND METHODS: Twenty-five simulated tumor board cases were presented to an MDT composed of specialists from otorhinolaryngology, craniomaxillofacial surgery, medical oncology, radiology, radiation oncology, and pathology. This multidisciplinary team provided a comprehensive analysis of the cases. The same cases were input into ChatGPT-4o and Llama 3 using structured prompts, and the concordance between the LLMs’ and MDT’s recommendations was assessed. Four MDT members evaluated the LLMs’ recommendations in terms of medical adequacy (using a six-point Likert scale) and whether the information provided could have influenced the MDT’s original recommendations.
RESULTS: ChatGPT-4o showed 84% concordance (21 out of 25 cases) and Llama 3 demonstrated 92% concordance (23 out of 25 cases) with the MDT in distinguishing between curative and palliative treatment strategies. In 64% of cases (16/25) ChatGPT-4o and in 60% of cases (15/25) Llama 3 identified all first-line therapy options considered by the MDT, though with varying priority. ChatGPT-4o presented all the MDT’s first-line therapies in 52% of cases (13/25), while Llama 3 offered a homologous treatment strategy in 48% of cases (12/25). Additionally, both models proposed at least one of the MDT’s first-line therapies as their top recommendation in 28% of cases (7/25). The ratings for medical adequacy yielded a mean score of 4.7 (IQR: 4-6) for ChatGPT-4o and 4.3 (IQR: 3-5) for Llama 3. In 17% of the assessments (33/200), MDT members indicated that the LLM recommendations could potentially enhance the MDT’s decisions.
DISCUSSION: This study demonstrates the capability of both LLMs to provide viable therapeutic recommendations in ORL head and neck surgery. Llama 3, operating locally, bypasses many data protection issues and shows promise as a clinical tool to support MDT decisions. However, at present, LLMs should augment rather than replace human decision-making.

Authors: Buhr CR, Ernst BP, Blaikie A, Smith H, Kelsey T, Matthias C, Fleischmann M, Jungmann F, Alt J, Brandts C, Kämmerer PW, Foersch S, Kuhn S, Eckrich J

Journal: Eur Arch Otorhinolaryngol

Citation: Buhr CR, et al. Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery. Eur Arch Otorhinolaryngol. 2025. doi: 10.1007/s00405-024-09153-3

