Quick Summary
This study evaluated the diagnostic accuracy of ChatGPT-4V, an AI model with visual capabilities, in detecting sacroiliitis on MRI scans. The model showed high sensitivity for identifying active inflammatory changes, particularly bone marrow edema, but struggled with chronic structural abnormalities, highlighting both its potential and limitations in clinical settings.
Key Details
- Dataset: 125 patients, 250 sacroiliac joint images
- Imaging Techniques: Coronal T1-weighted and semicoronal STIR MRI sequences
- Technology: ChatGPT-4V AI model
- Performance Metrics: Sensitivity, specificity, precision, and area under the curve (AUC)
Key Takeaways
- ChatGPT-4V demonstrated a sensitivity of 0.955 for detecting bone marrow edema.
- Lower sensitivity was observed for sclerosis (0.211), joint space narrowing (0.298), and joint surface irregularities (0.433).
- Overall accuracy of the model was 0.624, with a weighted-average AUC of 0.62.
- The model excels at identifying active inflammatory changes but underperforms on chronic structural abnormalities.
- Future improvements are needed for better detection of chronic structural abnormalities.
- Clinical integration requires fine-tuning with specialist-labeled datasets.
- This study highlights the potential of AI in radiology, particularly for inflammatory conditions.

Background
Sacroiliitis, an inflammation of the sacroiliac joints, can significantly impact patient quality of life. Accurate diagnosis through MRI is crucial for effective treatment. The integration of artificial intelligence, particularly models like ChatGPT-4V, offers a promising avenue for enhancing diagnostic accuracy and efficiency in radiology.
Study
Conducted at a tertiary hospital, this retrospective study analyzed MRI scans from 125 patients to evaluate the performance of ChatGPT-4V in detecting signs of sacroiliitis. Two experienced radiologists assessed the images, and ChatGPT-4V was prompted with standardized queries to analyze the same images for signs of active or chronic sacroiliitis. The model's outputs were then compared with the radiologists' assessments.
Results
The results indicated that ChatGPT-4V achieved a high sensitivity of 0.955 for detecting bone marrow edema, with an AUC of 0.84. However, its sensitivity for chronic findings such as sclerosis and joint irregularities was notably lower, with AUC values ranging from 0.55 to 0.59. The overall accuracy of the model was measured at 0.624, suggesting room for improvement.
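To make the reported metrics concrete, the following is a minimal sketch of how sensitivity, specificity, precision, and accuracy can be derived for a single finding when the radiologists' reading is treated as the reference. The function names and the ten example labels are hypothetical and are not taken from the study; they only illustrate the arithmetic.

```python
def confusion_counts(reference, predicted):
    """Count TP, FP, TN, FN for one binary finding (1 = present, 0 = absent)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(reference, predicted):
    """Return the four metrics as a dict; undefined ratios become NaN."""
    tp, fp, tn, fn = confusion_counts(reference, predicted)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true findings detected
        "specificity": ratio(tn, tn + fp),  # share of normal joints cleared
        "precision": ratio(tp, tp + fp),    # share of positive calls that are correct
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical labels for ten joints (NOT study data).
radiologist = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
model = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1]

metrics = diagnostic_metrics(radiologist, model)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can score very high on sensitivity while its accuracy stays modest if it also produces many false positives or misses other findings, which is why the per-finding figures and the overall accuracy are reported separately.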
Impact and Implications
The findings from this study underscore the potential of AI technologies like ChatGPT-4V in enhancing the diagnostic process for inflammatory conditions such as sacroiliitis. While the model shows promise in identifying active inflammation, its limitations in detecting chronic changes highlight the need for further development. As AI continues to evolve, its integration into clinical practice could lead to more accurate and timely diagnoses, ultimately improving patient outcomes.
Conclusion
This study illustrates the significant potential of AI in the field of radiology, particularly for detecting inflammatory conditions. While ChatGPT-4V has demonstrated impressive capabilities in identifying active sacroiliitis, its current limitations in chronic detection must be addressed for it to be effectively utilized in clinical settings. Continued research and model refinement are essential for maximizing the benefits of AI in healthcare.
Evaluating the Performance of ChatGPT-4V in Detecting Inflammatory Magnetic Resonance Imaging Findings of Sacroiliitis: Potentials, Challenges, and Limitations.
Abstract
This study aimed to evaluate the diagnostic accuracy of ChatGPT-4V, an AI model with visual capabilities, in detecting sacroiliitis on MRI and to compare its performance with that of expert radiologists. This retrospective study included 125 patients (250 sacroiliac joint images) from a tertiary hospital’s Picture Archiving and Communication System. MRI scans, including coronal T1-weighted and semicoronal STIR sequences, were assessed by two experienced radiologists. ChatGPT-4V was prompted with standardized queries to analyze the images for signs of active or chronic sacroiliitis. Its diagnostic outputs were compared with the radiologists’ assessments. Performance metrics, including sensitivity, specificity, precision, and area under the curve (AUC), were calculated. ChatGPT-4V demonstrated high sensitivity for detecting bone marrow edema (0.955; AUC, 0.84) but lower sensitivity for sclerosis (0.211; AUC, 0.55), joint space narrowing (0.298; AUC, 0.59), and joint surface irregularities (0.433; AUC, 0.59). The overall accuracy of the model was 0.624, with a weighted-average AUC of 0.62. ChatGPT-4V excelled in identifying active inflammatory changes but underperformed in detecting chronic structural abnormalities. ChatGPT-4V shows promise in detecting active inflammatory sacroiliitis, particularly bone marrow edema, but its current inability to reliably identify chronic structural abnormalities limits its standalone clinical utility. To achieve enhanced diagnostic capability and enable clinical integration, future efforts must focus on model fine-tuning using specialist-labeled radiological datasets.
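The abstract reports a weighted-average AUC but does not say how the per-finding AUCs were weighted. As a rough sketch, one common convention is to weight each finding by its number of reference-positive cases. The per-finding AUCs below come from the abstract; the support counts are hypothetical placeholders, so the printed weighted value is illustrative and is not expected to reproduce the published 0.62.

```python
def weighted_average(values, weights):
    """Weighted mean of `values`; `weights` must be non-negative with a positive sum."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Per-finding AUCs as reported in the abstract.
auc_by_finding = {
    "bone marrow edema": 0.84,
    "sclerosis": 0.55,
    "joint space narrowing": 0.59,
    "joint surface irregularities": 0.59,
}

# HYPOTHETICAL positive-case counts per finding (assumption, not study data).
support_by_finding = {
    "bone marrow edema": 40,
    "sclerosis": 60,
    "joint space narrowing": 50,
    "joint surface irregularities": 50,
}

findings = list(auc_by_finding)
aucs = [auc_by_finding[f] for f in findings]
supports = [support_by_finding[f] for f in findings]

unweighted_auc = weighted_average(aucs, [1] * len(aucs))
support_weighted_auc = weighted_average(aucs, supports)
print(f"unweighted mean AUC: {unweighted_auc:.4f}")
print(f"support-weighted AUC (hypothetical weights): {support_weighted_auc:.4f}")
```

The unweighted mean of the four reported AUCs is about 0.64, so a weighted average of 0.62 is consistent with the lower-scoring chronic findings carrying more weight than bone marrow edema, whatever the exact scheme was.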
Authors: Erden Y, Dilek G, Temel MH, Soylu HH, Kalfaoğlu ME, Bağcıer F
Journal: J Imaging Inform Med
Citation: Erden Y, et al. Evaluating the Performance of ChatGPT-4V in Detecting Inflammatory Magnetic Resonance Imaging Findings of Sacroiliitis: Potentials, Challenges, and Limitations. J Imaging Inform Med. 2025. doi: 10.1007/s10278-025-01742-w