Quick Summary
This study explores the use of generative large language models (LLMs) for COVID-19 severity prediction through a conversational AI app, demonstrating their effectiveness in low-data scenarios. The results indicate that LLMs can provide real-time, personalized risk assessments without the need for coding, marking a significant advancement in healthcare technology.
Key Details
- Dataset: 393 pediatric patients
- Technology: Pretrained generative LLMs (LLaMA2-7b, Flan-T5-xl)
- Performance Metric: Area Under the Curve (AUC)
- Application: Mobile app for real-time COVID-19 severity risk assessment
Key Takeaways
- Generative LLMs can effectively assess disease severity in low-data environments.
- In zero-shot scenarios, the T0-3b-T model achieved an AUC of 0.75.
- Traditional classifiers like logistic regression performed less effectively in low-data settings.
- At 2-shot settings, Flan-T5-xl-T reached an AUC of 0.69, outperforming traditional models.
- The mobile app provides personalized insights through attention-based feature importance (a brief illustrative sketch follows this list).
- LLMs excel in handling unstructured inputs, making them adaptable to clinical settings.
- This study highlights the potential of LLM-powered AI in healthcare for real-time decision-making support.
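For readers curious what attention-based feature importance can look like in practice, here is a minimal, hypothetical sketch. It assumes you already have a token-level attention matrix from the model (for example via `output_attentions=True` in Hugging Face Transformers) and a mapping from each clinical feature to the token positions it occupies in the prompt; the paper's exact aggregation method is not reproduced here, and averaging the attention mass per feature is just one simple choice.

```python
# Minimal sketch of attention-based feature importance. The attention matrix and
# feature-to-token mapping below are toy placeholders, not values from the study.
import numpy as np

# Toy attention matrix: rows = query tokens, cols = key tokens (one head, one layer)
attn = np.random.rand(12, 12)
attn = attn / attn.sum(axis=-1, keepdims=True)  # normalize rows like softmax output

# Hypothetical mapping from clinical features to the token positions they occupy
feature_spans = {
    "age": [2, 3],
    "oxygen saturation": [5, 6, 7],
    "temperature": [9, 10],
}

# Score each feature by the average attention its tokens receive across all queries
importance = {name: float(attn[:, cols].mean()) for name, cols in feature_spans.items()}
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```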
Background
The integration of large language models (LLMs) into healthcare represents a transformative shift in how we approach disease risk assessment. Traditional machine learning methods often depend on structured data and coding, which can limit their application in dynamic clinical environments. This study aims to leverage the capabilities of generative LLMs to provide a more flexible and user-friendly solution for assessing COVID-19 severity.
Study
The researchers fine-tuned generative LLMs using few-shot natural-language examples drawn from a dataset of 393 pediatric patients. The goal was to develop a mobile app that enables real-time, no-code COVID-19 severity risk assessment through interactive clinician-patient conversations. The LLMs were compared against traditional machine learning classifiers, such as logistic regression and random forest, across several experimental settings to evaluate their effectiveness.
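The paper's prompt template is not reproduced here, but the general few-shot setup can be sketched: structured patient fields are serialized into plain-English sentences, a handful of labeled examples (the "shots") are prepended, and the model is asked to complete the severity label for a new patient. The field names, wording, and template below are hypothetical illustrations, not the study's actual prompt.

```python
# Minimal sketch of turning tabular patient features into few-shot natural-language
# prompts for a generative LLM classifier. Field names and label wording are made up.

def patient_to_text(patient: dict) -> str:
    """Serialize one patient's structured fields into a plain-English description."""
    return (
        f"The patient is {patient['age']} years old, "
        f"oxygen saturation {patient['spo2']}%, "
        f"temperature {patient['temp_c']} C, "
        f"and has {'a' if patient['comorbidity'] else 'no'} known comorbidity."
    )

def build_few_shot_prompt(examples: list[dict], query: dict) -> str:
    """Prepend k labeled examples (the 'shots'), then ask about the new patient."""
    parts = [f"{patient_to_text(ex)} COVID-19 severity: {ex['label']}." for ex in examples]
    parts.append(f"{patient_to_text(query)} COVID-19 severity:")
    return "\n".join(parts)

# Example: a 2-shot prompt, mirroring the study's 2-shot setting
shots = [
    {"age": 7, "spo2": 98, "temp_c": 37.1, "comorbidity": False, "label": "mild"},
    {"age": 12, "spo2": 91, "temp_c": 39.2, "comorbidity": True, "label": "severe"},
]
new_patient = {"age": 9, "spo2": 94, "temp_c": 38.5, "comorbidity": False}
print(build_few_shot_prompt(shots, new_patient))
```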
Results
Generative LLMs performed strongly in low-data scenarios. In the zero-shot setting, the T0-3b-T model achieved an AUC of 0.75. At 2-shot settings, Flan-T5-xl-T and T0-3b-T reached AUCs of 0.69 and 0.65, while logistic regression and random forest managed only 0.57. By 32 shots the gap narrowed, with Flan-T5-xl-T at 0.70 versus 0.69 for logistic regression and 0.68 for random forest. These results underscore the adaptability of LLMs in clinical settings where labeled data are scarce.
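As a reminder of how these headline numbers are computed, the sketch below scores two sets of predicted severity probabilities against toy labels using scikit-learn's ROC AUC. The values are placeholders, not study data; for a generative LLM, the "severe" score can be taken from the probability the model assigns to the corresponding answer tokens.

```python
# Minimal sketch of the evaluation metric used in the study: area under the ROC curve.
from sklearn.metrics import roc_auc_score

# 1 = severe, 0 = non-severe (toy labels, not study data)
y_true = [0, 0, 1, 1, 0, 1, 0, 1]

# Hypothetical predicted probabilities of "severe" from an LLM and a logistic regression
llm_scores = [0.20, 0.35, 0.80, 0.55, 0.40, 0.75, 0.15, 0.60]
logreg_scores = [0.45, 0.50, 0.60, 0.40, 0.55, 0.65, 0.35, 0.50]

print("LLM AUC:   ", round(roc_auc_score(y_true, llm_scores), 2))
print("LogReg AUC:", round(roc_auc_score(y_true, logreg_scores), 2))
```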
Impact and Implications
The implications of this study are profound. By utilizing generative LLMs, healthcare providers can offer real-time, personalized risk assessments without the need for extensive coding or structured data. This capability not only enhances the efficiency of clinical decision-making but also opens the door for broader applications of AI in healthcare, potentially improving patient outcomes across various medical conditions.
Conclusion
This study highlights the remarkable potential of generative LLMs in transforming disease risk assessment in healthcare. Their ability to deliver personalized, real-time insights without coding makes them a valuable tool for clinicians. As we continue to explore the integration of AI in healthcare, the future looks promising for enhancing patient care and decision-making support.
Your comments
What are your thoughts on the use of generative LLMs in healthcare? We would love to hear your insights! Leave your comments below or connect with us on social media.
Generative Large Language Model-Powered Conversational AI App for Personalized Risk Assessment: Case Study in COVID-19.
Abstract
BACKGROUND: Large language models (LLMs) have demonstrated powerful capabilities in natural language tasks and are increasingly being integrated into health care for tasks like disease risk assessment. Traditional machine learning methods rely on structured data and coding, limiting their flexibility in dynamic clinical environments. This study presents a novel approach to disease risk assessment using generative LLMs through conversational artificial intelligence (AI), eliminating the need for programming.
OBJECTIVE: This study evaluates the use of pretrained generative LLMs, including LLaMA2-7b and Flan-T5-xl, for COVID-19 severity prediction with the goal of enabling a real-time, no-code, risk assessment solution through chatbot-based, question-answering interactions. To contextualize their performance, we compare LLMs with traditional machine learning classifiers, such as logistic regression, extreme gradient boosting (XGBoost), and random forest, which rely on tabular data.
METHODS: We fine-tuned LLMs using few-shot natural language examples from a dataset of 393 pediatric patients, developing a mobile app that integrates these models to provide real-time, no-code, COVID-19 severity risk assessment through clinician-patient interaction. The LLMs were compared with traditional classifiers across different experimental settings, using the area under the curve (AUC) as the primary evaluation metric. Feature importance derived from LLM attention layers was also analyzed to enhance interpretability.
RESULTS: Generative LLMs demonstrated strong performance in low-data settings. In zero-shot scenarios, the T0-3b-T model achieved an AUC of 0.75, while other LLMs, such as T0pp(8bit)-T and Flan-T5-xl-T, reached 0.67 and 0.69, respectively. At 2-shot settings, logistic regression and random forest achieved an AUC of 0.57, while Flan-T5-xl-T and T0-3b-T obtained 0.69 and 0.65, respectively. By 32-shot settings, Flan-T5-xl-T reached 0.70, similar to logistic regression (0.69) and random forest (0.68), while XGBoost improved to 0.65. These results illustrate the differences in how generative LLMs and traditional models handle the increasing data availability. LLMs perform well in low-data scenarios, whereas traditional models rely more on structured tabular data and labeled training examples. Furthermore, the mobile app provides real-time, COVID-19 severity assessments and personalized insights through attention-based feature importance, adding value to the clinical interpretation of the results.
CONCLUSIONS: Generative LLMs provide a robust alternative to traditional classifiers, particularly in scenarios with limited labeled data. Their ability to handle unstructured inputs and deliver personalized, real-time assessments without coding makes them highly adaptable to clinical settings. This study underscores the potential of LLM-powered conversational artificial intelligence (AI) in health care and encourages further exploration of its use for real-time, disease risk assessment and decision-making support.
Authors: Roshani MA, Zhou X, Qiang Y, Suresh S, Hicks S, Sethuraman U, Zhu D
Journal: JMIR AI
Citation: Roshani MA, et al. Generative Large Language Model-Powered Conversational AI App for Personalized Risk Assessment: Case Study in COVID-19. JMIR AI. 2025;4:e67363. doi: 10.2196/67363