🧑🏼‍💻 Research - December 1, 2024

Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology.

Authors: Gumilar KE, Indraprasta BR, Faridzi AS, Wibowo BM, Herlambang A, Rahestyningtyas E, Irawan B, Tambunan Z, Bustomi AF, Brahmantara BN, Yu ZY, Hsu YC, Pramuditya H, Putra VGE, Nugroho H, Mulawardhana P, Tjokroprawiro BA, Hedianto T, Ibrahim IH, Huang J, Li D, Lu CH, Yang JY, Liao LN, Tan M

⚡ Quick Summary

This study evaluated the performance of Large Language Models (LLMs) in supporting clinical decision-making for gynecologic oncology, focusing on three models: ChatGPT-4, Gemini Advanced, and Copilot. The findings revealed that Gemini Advanced outperformed the others with an accuracy of 81.87%, highlighting the potential of LLMs in enhancing medical practice.

🔍 Key Details

📊 Models Assessed: ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), Copilot
🧩 Clinical Vignettes: 15 vignettes of varying difficulty
⚙️ Evaluation Method: Responses coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale
🏆 Accuracy Results: GemAdv: 81.87%, CG-4: 61.60%, Copilot: 70.67%

🔑 Key Takeaways

📊 GemAdv demonstrated superior accuracy in clinical decision support.
💡 Consistency in responses was a strong point for GemAdv, which answered correctly more than 60% of the time on every day of the testing period.
👩‍🔬 CG-4 showed a slight advantage in adhering to NCCN treatment guidelines, while GemAdv excelled in the depth and focus of its answers.
🏆 Depth and focus of answers are crucial aspects of clinical decision-making.
🌍 Ongoing development of LLMs is essential for their reliability in complex scenarios.
🔍 Rigorous evaluation is necessary to maximize clinical utility.

📚 Background

The rapid advancement of Large Language Models (LLMs) has opened new avenues for their application in healthcare, particularly in gynecologic oncology. As these models become more integrated into clinical workflows, it is vital to assess their reliability and accuracy to ensure they can effectively support medical professionals in making informed decisions.

🗒️ Study

This study aimed to evaluate the performance of three prominent LLMs (ChatGPT-4, Gemini Advanced, and Copilot) in providing decision-making support for complex gynecologic cancer cases. The models were given fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases. Their responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence.

📈 Results

Gemini Advanced (GemAdv) achieved an accuracy of 81.87% across all difficulty levels, outperforming both Copilot (70.67%) and ChatGPT-4 (61.60%). GemAdv also provided correct answers more than 60% of the time on every day of the testing period, which points to its potential as a consistent tool in clinical settings.

🌍 Impact and Implications

The findings from this study underscore the potential of LLMs, particularly Gemini Advanced, in enhancing clinical practice within gynecologic oncology. By providing accurate and relevant information, these models can assist healthcare professionals in making better-informed decisions, ultimately improving patient outcomes. However, the need for further refinement and rigorous evaluation remains critical to ensure their effectiveness in more complex clinical scenarios.

🔮 Conclusion

This study highlights the promising role of Large Language Models in supporting clinical decision-making in gynecologic oncology. With Gemini Advanced demonstrating superior performance, there is a clear opportunity for integrating such technologies into healthcare practices. Continued research and development are essential to fully realize the potential of LLMs in clinical settings.

💬 Your comments

What are your thoughts on the integration of LLMs in clinical decision-making? We would love to hear your insights! 💬 Share your comments below.

Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology.

Abstract

OBJECTIVE: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases.
BACKGROUND: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework.
STUDY DESIGN: We assessed three prominent LLMs, ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot, evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence.
RESULTS: GemAdv demonstrated superior accuracy (81.87%) compared to both CG-4 (61.60%) and Copilot (70.67%) across all difficulty levels. GemAdv consistently provided correct answers more frequently (>60% every day during the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of the answers provided, which are crucial aspects of clinical decision-making.
CONCLUSION: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.

Journal: Comput Struct Biotechnol J

Citation: Gumilar KE, et al. Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology. Comput Struct Biotechnol J. 2024; 23:4019-4026. doi: 10.1016/j.csbj.2024.10.050
