A new study reveals that neurologists struggle to accurately predict stroke recovery because of systematic optimism and poor visual assessments, but AI models can correct these human errors.
When a patient suffers a major stroke, families ask one urgent question: will they ever walk or speak again? Neurologists routinely make these predictions, but their clinical intuition is surprisingly flawed. A new study challenges the long-held belief that human clinical judgment is the gold standard for complex stroke prognosis, revealing that doctors are systematically overoptimistic and struggle to read critical brain scans accurately.
Researchers tested six neurologists against two AI systems using data from 500 patients in the MR CLEAN trial, validating the models on a cohort of 404 patients. The doctors predicted three-month recovery for 40 patients, scored on the modified Rankin Scale (mRS), which runs from 0 (no symptoms) to 6 (death). The results expose a massive gap between human intuition and computational precision.
Where human intuition fails
When predicting the full range of recovery outcomes, the AI models easily outperformed the doctors. Why did the humans fail? The data points to two specific blind spots. First, doctors were inaccurate when manually extracting imaging features from scans under pressure. Second, they showed a systematic optimism bias, predicting better recoveries than actually occurred.
- MR PREDICTS achieved a quadratic weighted kappa (QWK) of 0.51 for full outcome prediction.
- The deep learning model scored 0.49 on the same scale.
- Unaided neurologists managed a QWK of only 0.27.
- Doctors' estimates of brain damage deviated by 3.4 points from the true scores, and they graded collateral blood flow correctly in just 44.6% of cases.
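For readers unfamiliar with the metric, quadratic weighted kappa measures agreement between predicted and actual scores on an ordinal scale, penalising a miss by the square of its distance. The sketch below is a minimal illustration, not the study's code: the function names and the sample scores are made up, and it assumes mRS values are integers from 0 to 6. The second helper shows one simple way optimism could be quantified, as a negative mean signed error (lower mRS means a better outcome).

```python
from collections import Counter


def quadratic_weighted_kappa(actual, predicted, n_categories=7):
    """Agreement between two ordinal ratings; 1.0 is perfect, 0 is chance level."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(actual)
    observed = Counter(zip(actual, predicted))   # joint counts (actual, predicted)
    actual_counts = Counter(actual)              # marginal counts
    predicted_counts = Counter(predicted)
    max_sq_dist = (n_categories - 1) ** 2

    observed_penalty = 0.0
    expected_penalty = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_sq_dist  # quadratic penalty for distance
            observed_penalty += weight * observed[(i, j)]
            # Count expected by chance if the two ratings were independent.
            expected_penalty += weight * actual_counts[i] * predicted_counts[j] / n
    if expected_penalty == 0:
        return 1.0  # both lists contain one identical constant score
    return 1.0 - observed_penalty / expected_penalty


def mean_signed_error(actual, predicted):
    """Average of (predicted - actual); negative values mean optimistic mRS predictions."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical scores for illustration only; not data from the study.
true_mrs = [0, 2, 3, 4, 6, 5, 1]
guess_mrs = [0, 1, 2, 3, 4, 4, 1]
print(quadratic_weighted_kappa(true_mrs, guess_mrs))
print(mean_signed_error(true_mrs, guess_mrs))
```

Because the penalty grows with the square of the miss, predicting mRS 1 for a patient who ends up at mRS 5 costs far more than being off by one point, which is why the metric suits the full-range comparison reported above.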
This disconnect is the real story. The medical field has long assumed that years of bedside experience translate to accurate visual reads of CT scans. This study suggests that human eyes are highly inconsistent at grading brain tissue damage in acute settings.
Where humans still compete
Interestingly, when the task was simplified to a binary “good versus bad” outcome, the gap closed. Unaided neurologists achieved 64.17% accuracy, which was statistically comparable to MR PREDICTS at 67.50% and the deep learning model at 63.16%.
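One way to see why the gap narrows is to look at how a binary score is computed. The sketch below is an assumption-laden illustration, not the study's method: it treats mRS 0 to 2 as a "good" outcome, a common convention in stroke research that the article does not confirm, and the function names and sample scores are hypothetical.

```python
def is_good_outcome(mrs, threshold=2):
    """Assumed cut-off: mRS at or below the threshold counts as a good outcome."""
    return mrs <= threshold


def binary_accuracy(actual, predicted, threshold=2):
    """Share of patients whose predicted and actual outcomes fall on the same side."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(
        is_good_outcome(a, threshold) == is_good_outcome(p, threshold)
        for a, p in zip(actual, predicted)
    )
    return hits / len(actual)


# Hypothetical scores for illustration only; not data from the study.
true_mrs = [3, 0, 5, 2]
guess_mrs = [6, 2, 3, 3]
print(binary_accuracy(true_mrs, guess_mrs))
```

Under this scoring, predicting mRS 6 for a patient who reaches mRS 3 still counts as correct, because both fall on the "bad" side of the cut-off. Large misses that a full-range metric punishes can vanish once outcomes are collapsed into two bins.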
But a binary view is too simplistic for real-world care. Families need nuanced expectations, not a coin-flip prediction. When doctors were assisted by the MR PREDICTS model, their performance improved to a QWK of 0.41 and their binary accuracy rose to 68.75%. The AI acted as an anchor, dragging overoptimistic doctors back to reality.
The limits of prediction
This does not mean algorithms should replace physicians. The study is limited by its small sample of six raters and its reliance on retrospective trial data. A model cannot comfort a family or understand a patient’s personal values.
However, these findings suggest that relying solely on clinical intuition in the acute stroke bay is a liability. By automating the extraction of imaging features, deep learning models can eliminate human-input variability and provide an objective second opinion.
Read the full study on medRxiv.
