AI spots blood clot risks better than doctors

🧑🏼‍💻 Research - June 23, 2026

Baughman, D. J., Liu, S., Jee, S., Young, C., Knight, A. M., Davis, A., Yegnasubramanian, S., Najjar, P., Whitbread, J. J., Ahumada, L., Chused, A., Haut, E. R., Lau, B. D., Sridharan, A., Streiff, M., Aziz, K. B.

A new multisite validation study shows that generative AI catches more blood clot risks than human doctors, but its tendency to miss narrative details means it cannot work unsupervised.

How much risk is a hospital willing to tolerate to avoid overwhelming its staff with false alarms? When it comes to preventing deadly blood clots, clinical teams constantly walk this tightrope. Traditional checklists are often too rigid to catch every vulnerable patient, while relying purely on human instinct leads to highly variable care.

A multisite validation of an EHR-integrated generative AI tool, called inHealth General Reasoner (iHGR), exposes the sharp trade-offs of clinical automation. The findings challenge the industry push for fully autonomous clinical diagnostics. While the software proved highly capable as a safety net, it also highlighted a persistent vulnerability in machine learning: the inability to read between the lines of human clinical notes.

How the tools compared

Researchers evaluated the iHGR system using real-world data from Johns Hopkins Medicine collected between June 21, 2025, and December 18, 2025. Out of 758 eligible adult inpatient admissions, they randomly sampled 500 cases. This sample was carefully balanced across different hospital sites and two distinct clinical eras. The first era used a checklist-based order set from June 21 to November 19, while the second relied on clinician judgment from November 29 to December 18.

The study compared the AI and human workflows against a physician-adjudicated reference standard. The results reveal a clear trade-off between sensitivity (catching the risk) and specificity (avoiding false alarms).

The iHGR tool achieved 81.8% sensitivity and 70.9% specificity.
The checklist-based workflow managed just 61.3% sensitivity but reached 86.2% specificity.
Clinician judgment alone yielded 78.1% sensitivity and 65.4% specificity.
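
To make the trade-off concrete, the short Python sketch below restates the three reported figures as miss rates (100% minus sensitivity) and false-alarm rates (100% minus specificity). It is only an illustration: the `Workflow` class and the helper function names are hypothetical and do not come from the study, and the only data used are the percentages quoted above.

```python
from dataclasses import dataclass


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly at-risk patients that a workflow flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of not-at-risk patients that a workflow correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


@dataclass(frozen=True)
class Workflow:
    # Hypothetical container for the figures reported in the article.
    name: str
    sensitivity: float
    specificity: float

    @property
    def miss_rate(self) -> float:
        # At-risk patients the workflow failed to flag.
        return 1.0 - self.sensitivity

    @property
    def false_alarm_rate(self) -> float:
        # Not-at-risk patients the workflow flagged anyway.
        return 1.0 - self.specificity


# Percentages as reported above, expressed as fractions.
WORKFLOWS = [
    Workflow("iHGR", 0.818, 0.709),
    Workflow("Checklist", 0.613, 0.862),
    Workflow("Clinician judgment", 0.781, 0.654),
]


def summarize(workflows: list[Workflow]) -> list[str]:
    """One readable line per workflow, rounded to one decimal place."""
    return [
        f"{w.name}: misses {w.miss_rate:.1%} of at-risk patients, "
        f"raises false alarms for {w.false_alarm_rate:.1%} of the rest"
        for w in workflows
    ]


if __name__ == "__main__":
    for line in summarize(WORKFLOWS):
        print(line)
```

Read this way, the checklist misses the most at-risk patients but raises the fewest false alarms, while iHGR and clinician judgment accept more false alarms in exchange for fewer misses.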

The narrative blind spot

These numbers show that the checklist workflow flags too few patients. With 61.3% sensitivity, it missed nearly 40% of the patients who needed preventative therapy according to the reference standard, a dangerous gap in patient safety. The AI closed much of this gap, posting higher sensitivity than both the checklist (81.8% versus 61.3%) and standalone clinician judgment (78.1%), although the checklist still produced the fewest false alarms.

Yet the AI still stumbled on unstructured data. When the iHGR system failed, its false-negative classifications were consistently linked to missed narrative risk factors buried in free-text clinical notes.

This limitation is the real story for health system leaders. If an AI cannot reliably parse the nuanced stories doctors write in charts, it cannot safely replace human oversight. This finding suggests that the immediate value of generative AI is not in replacing doctors, but in acting as an automated double-check. It should support, rather than supplant, clinician judgment to keep patients safe without triggering a wave of unnecessary treatments.

Read the full study in medRxiv.
