๐Ÿง‘๐Ÿผโ€๐Ÿ’ป Research - March 19, 2026

Insufficient reporting quality in large language model studies in the field of radiology.

🌟 Stay Updated!
Join AI Health Hub to receive the latest insights in health and AI.

⚡ Quick Summary

A recent systematic review found that most studies of large language models (LLMs) in radiology fail to report key elements required for LLM research. This lack of adherence to reporting standards hampers transparency and reproducibility in the field.

๐Ÿ” Key Details

  • 📊 Dataset: 246 studies published between November 30, 2022, and December 31, 2024
  • 🧩 Focus: performance evaluation of LLMs and radiology reporting
  • ⚙️ Standards used: MI-CLEAR-LLM and TRIPOD-LLM checklists
  • 📉 Key metrics: only 27.6% specified the model version; 35.8% mentioned the access date

🔑 Key Takeaways

  • 📊 The majority of studies lack standardized methodologies, leading to high variability.
  • 💡 Key elements, particularly model details and output probability, are insufficiently reported.
  • 📝 Only 41.1% of studies provided their full prompts.
  • 📉 Output probability-related issues were under-reported; only 22.8% stated the number of attempts.
  • 🔄 Reporting insufficiencies persisted regardless of publication date.
  • 🌍 An emphasis on transparency is crucial for improving reproducibility in future LLM research.
  • 🔍 The study highlights the need for adherence to reporting standards.

📚 Background

The integration of large language models (LLMs) in radiology has the potential to transform diagnostic processes and reporting. However, the effectiveness of these models is contingent upon the quality of research reporting. Inconsistent methodologies and lack of transparency can lead to challenges in replicating results and applying findings in clinical settings.

๐Ÿ—’๏ธ Study

This systematic review aimed to evaluate the reporting quality of studies involving LLMs in radiology. Researchers conducted a comprehensive search of the PubMed-MEDLINE and EMBASE databases, ultimately including 246 eligible studies. The analysis focused on adherence to the MI-CLEAR-LLM and TRIPOD-LLM checklists, which outline essential reporting elements for LLM research.

📈 Results

The findings indicated that while all studies reported the name of the LLM, only 27.6% specified the model version, and 35.8% mentioned the access date. Furthermore, only 41.1% provided full prompts, and issues related to output probability were notably under-reported. These reporting deficiencies were consistent across studies published before and after July 25, 2024.
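The before/after comparison was done with a chi-square test (per the abstract's methods). A minimal sketch of such a test on a 2×2 contingency table, using hypothetical per-period counts, since the summary reports only overall percentages:

```python
# Sketch of the paper's before/after July 25, 2024 comparison: a Pearson
# chi-square test of independence on a 2x2 contingency table. The counts
# below are hypothetical; the review reports only overall percentages.

def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]
    (rows: publication period; columns: reported / not reported)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical split of the 68/246 studies reporting the model version:
# 40/150 before the cutoff vs. 28/96 after.
stat = chi2_2x2(40, 110, 28, 68)
print(round(stat, 3))  # 0.183 -- far below 3.84 (df=1, alpha=0.05),
# i.e. no significant change in reporting between the two periods.
```

With df=1, a statistic this small corresponds to a non-significant p-value, which is the shape of result that would support the review's finding that deficiencies persisted across both periods.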

๐ŸŒ Impact and Implications

The implications of this study are significant for the field of radiology. By highlighting the need for improved reporting quality, the authors advocate for greater adherence to established standards. This could enhance the transparency and reproducibility of future studies, ultimately leading to better integration of LLMs in clinical practice and improved patient outcomes.

🔮 Conclusion

This review underscores the critical importance of adhering to reporting standards in LLM research within radiology. By addressing the identified insufficiencies, researchers can contribute to a more reliable and transparent body of work, paving the way for advancements in the application of AI technologies in healthcare. The future of LLMs in radiology holds great promise, but it is essential to prioritize quality reporting to realize their full potential.

💬 Your comments

What are your thoughts on the reporting quality of studies involving large language models in radiology? Let’s engage in a discussion! 💬 Share your insights in the comments below or connect with us on social media.

Insufficient reporting quality in large language model studies in the field of radiology.

Abstract

OBJECTIVES: Our systematic review aimed to evaluate the quality of reporting in research articles involving LLMs in the radiology field.
MATERIALS AND METHODS: After searching the PubMed-MEDLINE and EMBASE databases, a total of 246 eligible studies published between November 30, 2022, and December 31, 2024, were included. The analysis assessed the percentage of studies adhering to key elements required for LLM research, based on the MInimum reporting items for CLear Evaluation of Accuracy Reports of Large Language Models in healthcare (MI-CLEAR-LLM) and the Transparent Reporting of a Multivariable Model for Individual Prognosis Or Diagnosis-large language models (TRIPOD-LLM) checklists. Studies published before and after July 25, 2024, were compared using a chi-square test.
RESULTS: The most common topic was performance evaluation of LLMs using radiologic cases (44.3%, 109/246), followed by radiology reporting (37.8%, 93/246). Although all studies reported the LLM’s name, only 27.6% (68/246) specified the model version, 35.8% (88/246) mentioned the access date, and 25.2% (62/246) mentioned application programming interface usage. Full prompts were provided in 41.1% (101/246) of studies. Output probability-related issues, including the number of attempts (22.8%, 56/246) and factors such as temperature (16.7%, 41/246), were under-reported. These reporting insufficiencies persisted in studies published before and after July 25, 2024.
CONCLUSION: Most studies assessing large language models in radiology lacked sufficient reporting of key elements required for large language model research. We recommend that authors strive to adhere to these elements to ensure transparency and improve the reproducibility of future studies.
CRITICAL RELEVANCE STATEMENT: Our study highlighted the need for improved reporting quality and adherence to key elements to ensure transparent reporting and improve the reproducibility of future studies using large language models.
KEY POINTS: Numerous studies on large language models (LLMs) in radiology lack standardized methodologies, leading to high variability and inconsistent reporting. Our review demonstrated insufficiency in key elements for LLM research, particularly in model details and output probability. Better reporting and adherence to key elements are essential for enhancing transparency and reproducibility in future LLM research.

Authors: Suh PS, Jeong SY, Ueda D, Shim WH, Heo H, Woo CY, Park H, Suh CH

Journal: Insights Imaging

Citation: Suh PS, et al. Insufficient reporting quality in large language model studies in the field of radiology. Insights Imaging. 2026; 17:(unknown pages). doi: 10.1186/s13244-026-02236-1
