Quick Summary
This study evaluated GPT-4o, a large language model, for automatically extracting pulmonary embolism (PE) diagnoses from radiology report impressions. GPT-4o outperformed the fine-tuned Clinical Longformer baseline on 2 metrics, reaching a sensitivity of 1.0 and an F1-score of 0.975 on the validation set, which suggests it could reduce manual report review and support clinical decision-making.
Key Details
- Dataset: 1,000 radiology report impressions for training, 200 for validation, and 200 operational records for postdeployment assessment.
- Technology: GPT-4o (decoder-only model) and Clinical Longformer (encoder-only model).
- Performance: GPT-4o achieved a sensitivity of 1.0 and an F1-score of 0.975 in validation.
Key Takeaways
- GPT-4o outperformed the Clinical Longformer baseline in sensitivity and F1-score when extracting PE diagnoses.
- A sensitivity of 1.0 means that no PE-positive report in the evaluated sets was missed.
- An F1-score of 0.975 reflects both high precision and high recall in diagnosis extraction.
- Postdeployment evaluation on 200 operational records showed a specificity of 0.94.
- Automation can reduce the time spent on manual report reviews.
- Improved clinical workflows can lead to faster diagnosis and treatment for critical conditions.
- The study was published in JMIR Medical Informatics, highlighting the importance of AI in healthcare.
Background
Pulmonary embolism (PE) is a life-threatening condition that requires swift diagnosis to minimize mortality rates. Traditionally, extracting PE diagnoses from radiology reports has been a labor-intensive process, often leading to delays in treatment. The emergence of advanced natural language processing (NLP) technologies, particularly transformer models like GPT-4o, presents an opportunity to automate this task, thereby enhancing diagnostic accuracy and efficiency in clinical settings.
Study
The study aimed to develop an automatic extraction system using GPT-4o to identify PE diagnoses from radiology report impressions. Two models were evaluated: a fine-tuned Clinical Longformer as a baseline and a GPT-4o-based extractor. The Clinical Longformer was trained on 1,000 impressions, and both models were validated on the same separate set of 200 samples. The deployed GPT-4o model was then assessed on an additional 200 operational records.
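To make the idea of using an instruction-following model as a text classifier more concrete, here is a minimal Python sketch. It is not the authors' prompt or pipeline, which this summary does not describe: the label names, the prompt wording, and the helpers `build_prompt` and `parse_label` are all hypothetical, and the actual call to a model is deliberately left out.

```python
# Hypothetical labels; the study's real output format is not given here.
POSITIVE = "PE_POSITIVE"
NEGATIVE = "PE_NEGATIVE"


def build_prompt(impression: str) -> str:
    """Wrap a radiology impression in a classification instruction.

    The wording is an assumption made for illustration only.
    """
    return (
        "You are reviewing the impression section of a radiology report.\n"
        f"Answer with exactly one label: {POSITIVE} if the impression states "
        f"that pulmonary embolism is present, otherwise {NEGATIVE}.\n\n"
        f"Impression:\n{impression.strip()}\n\nLabel:"
    )


def parse_label(response: str) -> bool:
    """Map the model's free-text reply to a boolean PE flag.

    Raises ValueError on anything unexpected, so that ambiguous replies
    can be routed to manual review instead of being silently guessed.
    """
    text = response.strip().upper()
    if text.startswith(POSITIVE):
        return True
    if text.startswith(NEGATIVE):
        return False
    raise ValueError(f"Unrecognized label in response: {response!r}")
```

Constraining the reply to a fixed label and rejecting anything else is one way to turn a generative model's output into a classification that can be scored against a reference standard.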
Results
The results were promising, with GPT-4o achieving a sensitivity of 1.0 and an F1-score of 0.975 during validation. In postdeployment evaluations, the model maintained a sensitivity of 1.0, a specificity of 0.94, and an F1-score of 0.97. These metrics indicate a high level of diagnostic accuracy, supporting the model’s potential to streamline clinical workflows and reduce the need for manual reviews.
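For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, precision, and F1-score follow from a confusion matrix, and how precision can be recovered from a reported F1-score and sensitivity. The `Confusion` class and the function names are illustrative choices, and the counts in the comments are made up; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predictions to the reference standard."""
    tp: int  # PE present, predicted present
    fp: int  # PE absent, predicted present
    tn: int  # PE absent, predicted absent
    fn: int  # PE present, predicted absent


def sensitivity(c: Confusion) -> float:
    """Recall: share of true PE cases that were flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of PE-negative reports correctly left unflagged."""
    return c.tn / (c.tn + c.fp)


def precision(c: Confusion) -> float:
    """Share of flagged reports that truly contain PE."""
    return c.tp / (c.tp + c.fp)


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    return f1 * recall / (2 * recall - f1)


# Made-up counts, purely to exercise the functions:
# example = Confusion(tp=8, fp=2, tn=9, fn=1)
```

Applying `implied_precision(0.975, 1.0)` to the validation figures gives a precision of roughly 0.951, which indicates that the remaining errors were false positives rather than missed PE cases.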
Impact and Implications
The findings from this study could have a transformative impact on the management of pulmonary embolism. By leveraging GPT-4o for automatic extraction of diagnoses, healthcare providers can expect improved efficiency in clinical workflows, leading to quicker diagnosis and treatment for patients. This advancement not only enhances patient care but also underscores the growing role of artificial intelligence in healthcare settings.
Conclusion
This study highlights the significant potential of GPT-4o in automating the extraction of PE diagnoses from radiology reports. With its impressive performance metrics, GPT-4o stands as a reliable tool for enhancing clinical decision-making and improving patient outcomes. As we continue to explore the integration of AI in healthcare, the future looks promising for technologies that can expedite diagnosis and treatment pathways for critical conditions like PE.
Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study.
Abstract
BACKGROUND: Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings.
OBJECTIVE: This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency.
METHODS: In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o’s ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting.
RESULTS: GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision.
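The abstract reports 95% CIs without stating how they were computed. One common option is a percentile bootstrap over the evaluated records; the sketch below assumes that approach for illustration only, and `bootstrap_ci`, its parameters, and the F1 helper are hypothetical names rather than the authors' code.

```python
import random
from typing import Callable, Sequence, Tuple


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for binary labels (1 = PE present, 0 = absent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample records with replacement,
    recompute the metric each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

Under a resampling scheme like this, a metric that equals 1.0 on every resample, as sensitivity does when no positive case is missed, yields a degenerate interval of 1.0 to 1.0, which is consistent with the interval reported above.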
CONCLUSIONS: The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.
Authors: Mahyoub M, Dougherty K, Shukla A
Journal: JMIR Med Inform
Citation: Mahyoub M, et al. Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study. JMIR Med Inform. 2025;13:e67706. doi: 10.2196/67706