Quick Summary
This study evaluated the performance of Large Language Models (LLMs) in supporting clinical decision-making for gynecologic oncology, focusing on three models: ChatGPT-4, Gemini Advanced, and Copilot. The findings revealed that Gemini Advanced outperformed the others with an accuracy of 81.87%, highlighting the potential of LLMs in enhancing medical practice.
Key Details
- Models Assessed: ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), Copilot
- Clinical Vignettes: 15 vignettes of varying difficulty
- Evaluation Method: Responses coded, randomized, and evaluated blindly by six expert gynecologic oncologists
- Accuracy Results: GemAdv: 81.87%, CG-4: 61.60%, Copilot: 70.67%
Key Takeaways
- Gemini Advanced (GemAdv) demonstrated the highest accuracy in clinical decision support across all difficulty levels.
- Consistency was a strong point for GemAdv, which answered correctly more than 60% of the time on every day of the testing period.
- ChatGPT-4 (CG-4) showed a slight advantage in adhering to NCCN treatment guidelines, while GemAdv excelled in the depth and focus of its answers.
- Depth and focus of answers are crucial for effective clinical decision-making.
- Ongoing development of LLMs is essential for their reliability in complex scenarios.
- Rigorous evaluation is necessary to maximize clinical utility.
Background
The rapid advancement of Large Language Models (LLMs) has opened new avenues for their application in healthcare, particularly in gynecologic oncology. As these models become more integrated into clinical workflows, it is vital to assess their reliability and accuracy to ensure they can effectively support medical professionals in making informed decisions.
Study
This study aimed to evaluate the performance of three prominent LLMs (ChatGPT-4, Gemini Advanced, and Copilot) in providing decision-making support for complex gynecologic cancer cases. Using a set of fifteen clinical vignettes and five open-ended questions based on real patient scenarios, the responses were assessed by six expert gynecologic oncologists using a 5-point Likert scale.
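As a rough illustration of how ratings of this kind might be aggregated, the sketch below averages 5-point Likert scores across raters for each criterion. The criterion names follow the paper's abstract (relevance, clarity, depth, focus, coherence); the function name `mean_scores` and the sample ratings are hypothetical and are not data or code from the study.

```python
from statistics import mean

# Criteria named in the paper's abstract; everything else here is illustrative.
CRITERIA = ("relevance", "clarity", "depth", "focus", "coherence")


def mean_scores(ratings):
    """Average 1-5 Likert ratings per criterion across raters.

    `ratings` is a list with one dict per rater, mapping criterion -> score.
    """
    for rater in ratings:
        for criterion in CRITERIA:
            score = rater[criterion]
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion} score {score} is outside 1-5")
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}


# Hypothetical ratings from three raters for one model response (not study data).
sample = [
    {"relevance": 5, "clarity": 4, "depth": 3, "focus": 4, "coherence": 5},
    {"relevance": 4, "clarity": 4, "depth": 4, "focus": 5, "coherence": 4},
    {"relevance": 5, "clarity": 3, "depth": 4, "focus": 4, "coherence": 4},
]
print(mean_scores(sample))
```

A simple mean is only one possible summary; the study may have used a different aggregation or additional agreement statistics.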
Results
The results indicated that Gemini Advanced (GemAdv) achieved the highest accuracy at 81.87%, ahead of Copilot at 70.67% and ChatGPT-4 at 61.60%. GemAdv also answered correctly more than 60% of the time on every day of the testing period, showcasing its potential as a reliable tool in clinical settings.
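To make the two reported measures concrete, the following minimal sketch computes per-day accuracy and checks whether it stays above a threshold on every day. It assumes each answer has been coded as correct or incorrect; the function names and the toy outcomes are hypothetical, and the article does not describe the study's actual scoring procedure in this detail.

```python
def daily_accuracy(results_by_day):
    """Return the percentage of correct answers for each day.

    `results_by_day` maps a day label to a list of booleans,
    one per vignette (True = answer judged correct).
    """
    return {
        day: 100.0 * sum(answers) / len(answers)
        for day, answers in results_by_day.items()
    }


def is_consistent(results_by_day, threshold=60.0):
    """True if accuracy is strictly above `threshold` percent on every day."""
    return all(acc > threshold for acc in daily_accuracy(results_by_day).values())


# Hypothetical outcomes for 5 vignettes over 3 days (not study data).
toy_results = {
    "day 1": [True, True, True, False, True],   # 4 of 5 correct
    "day 2": [True, False, True, True, False],  # 3 of 5 correct
    "day 3": [True, True, True, True, False],   # 4 of 5 correct
}
print(daily_accuracy(toy_results))
print(is_consistent(toy_results))
```

In this toy data, day 2 sits at exactly 60%, so a strict "more than 60%" check fails; whether the threshold is strict or inclusive is a design choice worth stating explicitly in any evaluation.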
Impact and Implications
The findings from this study underscore the potential of LLMs, particularly Gemini Advanced, in enhancing clinical practice within gynecologic oncology. By providing accurate and relevant information, these models can assist healthcare professionals in making better-informed decisions, ultimately improving patient outcomes. However, the need for further refinement and rigorous evaluation remains critical to ensure their effectiveness in more complex clinical scenarios.
Conclusion
This study highlights the promising role of Large Language Models in supporting clinical decision-making in gynecologic oncology. With Gemini Advanced demonstrating superior performance, there is a clear opportunity for integrating such technologies into healthcare practices. Continued research and development are essential to fully realize the potential of LLMs in clinical settings.
Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology.
Abstract
OBJECTIVE: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases.
BACKGROUND: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework.
STUDY DESIGN: We assessed three prominent LLMs, ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot, evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence.
RESULTS: GemAdv demonstrated superior accuracy (81.87%) compared to both CG-4 (61.60%) and Copilot (70.67%) across all difficulty levels. GemAdv consistently provided correct answers more frequently (>60% every day during the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of the answers provided, which are crucial aspects of clinical decision-making.
CONCLUSION: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.
Authors: Gumilar KE, Indraprasta BR, Faridzi AS, Wibowo BM, Herlambang A, Rahestyningtyas E, Irawan B, Tambunan Z, Bustomi AF, Brahmantara BN, Yu ZY, Hsu YC, Pramuditya H, Putra VGE, Nugroho H, Mulawardhana P, Tjokroprawiro BA, Hedianto T, Ibrahim IH, Huang J, Li D, Lu CH, Yang JY, Liao LN, Tan M
Journal: Comput Struct Biotechnol J
Citation: Gumilar KE, et al. Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology. Comput Struct Biotechnol J. 2024; 23:4019-4026. doi: 10.1016/j.csbj.2024.10.050