Quick Summary
This study evaluated the safety and utility of MedAgentBrief, an AI-driven tool for generating hospital course summaries, during its clinical deployment. The findings indicated a significant reduction in physician burnout and minimal risk of harm associated with AI-generated summaries.
Key Details
- Participants: 384 hospital discharges
- AI technology: Gemini 2.5 Pro for generating summaries
- Study design: single-arm prospective pilot quality improvement study
- Primary outcome: physician-reported potential for harm from unedited summaries
Key Takeaways
- AI-generated summaries were used in 57% of discharges, indicating strong physician acceptance.
- Among the 100 summaries with physician feedback, 25% had omissions and 20% had inaccuracies, but hallucinations were rare (2%).
- Physicians rated 88% of unedited summaries as having no harm potential; only 1% were judged likely to cause moderate harm.
- Burnout scores decreased significantly from 1.75 to 1.20 after the intervention (P = .03).
- Documentation time savings were heterogeneous: 5 of 7 physicians with matched baseline data saw reductions in median documentation time, of up to 2.9 minutes.
- The study supports the viability of AI summarization to alleviate clinician documentation burden.
- Study period: August 1 to October 11, 2025, with baseline data from April 9 to July 31, 2025.

Background
High-quality discharge summaries are crucial for ensuring safe transitions of care. However, they often contribute to the documentation burden faced by clinicians, leading to increased rates of burnout. Recent advancements in large language models (LLMs) suggest that AI can generate clinical summaries that match the quality of those produced by physicians, yet there has been a lack of prospective data on their safety and utility in real-world clinical settings.
Study
This study was conducted in an academic inpatient medicine unit, focusing on the deployment of MedAgentBrief, an LLM-based workflow that generates hospital course summaries. The AI system utilized patient history and daily progress notes to create draft summaries, which were then reviewed by physicians. The aim was to assess the safety, utility, and impact on clinician well-being during this prospective pilot study.
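The paper does not publish MedAgentBrief's code, so the snippet below is only a minimal sketch of the nightly workflow the study describes: draft summaries are generated from the admission history and physical plus daily progress notes, then routed to physicians for review rather than auto-filed. All names here (`PatientRecord`, `build_prompt`, `nightly_drafts`, the stub model) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PatientRecord:
    mrn: str                    # de-identified patient identifier (hypothetical field)
    history_and_physical: str   # admission H&P note
    progress_notes: List[str]   # daily progress notes


def build_prompt(record: PatientRecord) -> str:
    """Assemble the model input from the H&P and daily progress notes,
    mirroring the inputs the study says the workflow used."""
    notes = "\n\n".join(record.progress_notes)
    return (
        "Draft a concise hospital course summary from these notes.\n\n"
        f"History and Physical:\n{record.history_and_physical}\n\n"
        f"Progress notes:\n{notes}"
    )


def nightly_drafts(records: List[PatientRecord],
                   model: Callable[[str], str]) -> Dict[str, str]:
    """Generate one draft per active patient; in the study these drafts
    were securely emailed to physicians for review and optional use."""
    return {r.mrn: model(build_prompt(r)) for r in records}


def stub_model(prompt: str) -> str:
    """Placeholder for the LLM call; the study used Gemini 2.5 Pro behind
    an institution-approved interface, which is not reproduced here."""
    return f"[DRAFT SUMMARY generated from {len(prompt)} characters of notes]"
```

A real deployment would replace `stub_model` with an authenticated call to the hospital's approved LLM endpoint and keep the physician-review step, which is central to the study's safety claims.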
Results
The AI system generated a total of 1,274 summaries across the 384 hospital discharges. Physicians utilized the AI-generated content in 219 cases (57.0%). Feedback indicated that while there were some omissions and inaccuracies, the majority of summaries were deemed safe for use. Notably, the intervention led to a significant decrease in physician burnout scores, highlighting the potential benefits of AI in reducing clinician workload.
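The headline proportions can be re-derived from the raw counts reported in the abstract; the quick check below confirms that the published percentages follow from the counts (use rate, feedback sampling rates, and error rates among the 100 summaries with feedback).

```python
# Re-derive the reported percentages from the raw counts in the abstract.
used, discharges = 219, 384                 # summaries used / total discharges
feedback_used, feedback_unused = 88, 12     # feedback on used vs unused summaries
omissions, feedback_total = 25, 100         # omissions among summaries with feedback

use_rate = round(100 * used / discharges, 1)                            # 57.0%
fb_used_rate = round(100 * feedback_used / used, 1)                     # 40.2%
fb_unused_rate = round(100 * feedback_unused / (discharges - used), 1)  # 7.3%
omission_rate = round(100 * omissions / feedback_total, 1)              # 25.0%

print(use_rate, fb_used_rate, fb_unused_rate, omission_rate)
```

Each value matches the figure reported in the abstract, so the percentages are internally consistent with the stated denominators (219 used of 384 discharges, 165 unused, 100 summaries with feedback).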
Impact and Implications
The findings from this study suggest that AI-generated hospital course summaries can play a vital role in enhancing clinician efficiency and reducing burnout. By integrating AI into clinical workflows, healthcare institutions can potentially improve the quality of care provided to patients while alleviating the documentation burden on physicians. This could pave the way for broader applications of AI technologies in various healthcare settings.
Conclusion
This study underscores the promising potential of AI in generating clinical documentation. The successful implementation of MedAgentBrief not only demonstrated minimal risk of harm but also contributed to a significant reduction in physician burnout. As healthcare continues to evolve, embracing AI technologies like this could lead to improved outcomes for both clinicians and patients alike. Further research is encouraged to explore the full capabilities of AI in clinical documentation.
Your comments
What are your thoughts on the integration of AI in clinical documentation? We would love to hear your insights! Leave your comments below or connect with us on social media.
Physician-Reported Safety Outcomes of AI-Generated Hospital Course Summaries.
Abstract
IMPORTANCE: High-quality discharge summaries are essential for safe care transitions but contribute substantially to clinician documentation burden and burnout. While retrospective studies suggest that large language models (LLMs) can generate clinical summaries of comparable quality to those by physicians, prospective data on their safety, utility, and association with clinician well-being in clinical environments are lacking.
OBJECTIVE: To evaluate the safety, use, and association with clinician burden of MedAgentBrief, an LLM-based agentic workflow for generating hospital course summaries, during prospective clinical deployment.
DESIGN, SETTING, AND PARTICIPANTS: This single-arm prospective pilot quality improvement study encompassed hospital discharges at 1 academic inpatient medicine unit from August 1 to October 11, 2025, with baseline comparisons drawn from April 9 to July 31, 2025.
INTERVENTION: A custom agentic LLM workflow using Gemini 2.5 Pro generated draft hospital course summaries nightly using patient history and physical and daily progress notes. Drafts were securely emailed to physicians daily for review and optional use.
MAIN OUTCOMES AND MEASURES: The primary outcome was physician-reported potential for and severity of harm from unedited summaries (Agency for Healthcare Research and Quality Common Format Harm Scale). Secondary outcomes included use rate, error types (omissions, inaccuracies, and hallucinations), time spent in discharge summaries (electronic health record logs), and changes in cognitive burden (NASA Task Load Index; score range, 0-100, with higher scores indicating greater cognitive burden) and burnout (Stanford Professional Fulfillment Index Work Exhaustion Scale; score range, 0-4, with higher scores indicating greater burnout).
RESULTS: Among 384 hospital discharges, the system generated 1274 summaries. Physicians used artificial intelligence (AI) content in 219 cases (57.0%). Feedback on 100 summaries (88 of 219 used summaries [40.2%] and 12 of 165 unused summaries [7.3%]) noted omissions (25 summaries [25.0%]) and inaccuracies (20 summaries [20.0%]) but rare hallucinations (2 summaries [2.0%]). Physicians rated 88 unedited summaries (88.0%) as having no harm potential and 1 (1.0%) as likely to cause moderate harm; no severe harm was reported. Mean physician burnout scores decreased significantly from before to after the intervention (1.75; 95% CI, 1.16-2.34 vs 1.20; 95% CI, 0.71-1.69; P = .03). Time savings were heterogeneous, with 5 of 7 physicians with matched baseline data (71.4%) seeing reductions in median documentation time; changes from baseline to pilot were up to 2.9 minutes, which was a nonsignificant difference (10.7 minutes; 95% CI, 7.4-13.3 minutes vs 7.8 minutes; 95% CI, 5.1-11.7 minutes; P = .13).
CONCLUSIONS AND RELEVANCE: In this study, an LLM-based agentic workflow produced hospital course summaries that were frequently used with minimal risk of harm identified. The intervention was associated with a reduction in physician burnout, supporting the viability of AI summarization to mitigate documentation burden.
Authors: Grolleau F, Liang AS, Keyes T, Ma SP, Lew T, Huynh TR, Steele N, Chung P, Qin P, Chandra G, Wang SF, Mullen E, Carpenter L, Hoppenfeld M, Morrin M, Kyerematen BA, Ambers N, Kotecha N, Alsentzer E, Hom J, Shah NH, Schulman K, Chen JH
Journal: JAMA Netw Open
Citation: Grolleau F, et al. Physician-Reported Safety Outcomes of AI-Generated Hospital Course Summaries. JAMA Netw Open. 2026;9:e2616556. doi: 10.1001/jamanetworkopen.2026.16556