Smartwatches and rings are marketed as lifesaving heart monitors, but new data shows their single-lead AI systems fail the very patients who need them most.
We trust wearables to watch over our grandparents. But stripping a clinical 12-lead ECG down to a single-lead wearable sensor does not affect everyone equally. For young, healthy users, the loss of data is a minor hiccup. For the elderly, it is a diagnostic blind spot.
This disconnect is the real story.
For years, the digital health industry has assumed that consumer-grade hardware is “good enough” for general screening. This study challenges that complacency. It reveals a massive age bias built not into the code, but into the physical limitations of single-lead devices. When we shrink the hardware, we shrink the safety net for those most at risk.
The forty-fold drop
Researchers trained a neural network on 21,091 PTB-XL ECG recordings covering five cardiac disease categories. They then tested how the AI performed as the input shrank from a clinical 12-lead setup to six, two, and finally one lead. Even under the best conditions, with the full 12-lead setup, model accuracy was 84.5% in patients under 40 but only 66.2% in those aged 75 and older.
The real crisis happens when you strip away the leads. Going down to a single lead caused the AI’s accuracy to plunge by 14.1 percentage points in the 75-and-older group. For patients under 40, the drop was a mere 0.4 percentage points. The result is a reported 40-fold difference in how sharply performance degrades between the two groups, a gap confirmed by three independent statistical tests (all p < 0.0001).
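For readers who want to check the arithmetic, here is a minimal sketch that uses only the rounded figures quoted in this article. The variable names are illustrative, not taken from the study. Note that the rounded inputs give a ratio of about 35; the reported 40-fold figure may rest on unrounded values, which is an assumption here.

```python
# Accuracy figures as quoted in the article (rounded, in percent).
acc_12lead_under40 = 84.5
acc_12lead_75plus = 66.2

# Accuracy lost when going from 12 leads to 1 lead (percentage points).
drop_1lead_75plus = 14.1
drop_1lead_under40 = 0.4

# Gap between age groups before any leads are removed.
baseline_gap = acc_12lead_under40 - acc_12lead_75plus

# How many times larger the single-lead drop is for the oldest group.
fold = drop_1lead_75plus / drop_1lead_under40

print(f"12-lead gap between age groups: {baseline_gap:.1f} percentage points")
print(f"single-lead drop ratio (75+ vs under 40): about {fold:.0f}x")
```

The first number is the gap that exists even with full clinical hardware; the second is the extra penalty that comes from removing leads.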
Older hearts are complicated. They rarely present with just one clean arrhythmia. Instead, they exhibit multi-condition diagnostic complexity that a single-lead wearable simply cannot capture. When we reduce the physical sensors, we starve the AI of the spatial data it needs to untangle these overlapping conditions.
A new regulatory standard
These findings mean tech companies cannot rely on average accuracy metrics to clear regulatory hurdles. If an AI-ECG tool boasts high average accuracy, but that number is propped up by young users with simple heart rhythms, it is functionally useless for an 80-year-old. It might even be dangerous by providing false reassurance.
- Model accuracy on 12-lead ECGs is already 18.3 percentage points lower for seniors than for young adults.
- Reducing to a single lead costs 14.1 percentage points of accuracy in patients 75 and older, compared with 0.4 percentage points in patients under 40.
- External validation on the MIT-BIH Arrhythmia Database confirmed these patterns are highly stable across datasets.
We must stop evaluating clinical AI on aggregate populations. If a device degrades so severely for the demographic most at risk of stroke and heart failure, its clinical utility is compromised. Age-stratified reporting must become the baseline for validation.
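What age-stratified reporting could look like in practice is sketched below. This is an illustration, not the study's evaluation code: the function names, the middle age band, and the toy records are all assumptions made up for the example, and the labels are placeholders.

```python
from collections import defaultdict


def age_band(age):
    """Map an age to a reporting band.

    The under-40 and 75+ bands mirror the groups named in the article;
    the middle band is an assumption for this sketch.
    """
    if age < 40:
        return "<40"
    if age >= 75:
        return "75+"
    return "40-74"


def overall_accuracy(records):
    """Fraction of correct predictions across all records."""
    records = list(records)
    return sum(truth == pred for _, truth, pred in records) / len(records)


def stratified_accuracy(records):
    """Fraction of correct predictions within each age band.

    records: iterable of (age, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for age, truth, pred in records:
        band = age_band(age)
        total[band] += 1
        correct[band] += int(truth == pred)
    return {band: correct[band] / total[band] for band in total}


# Toy records invented purely to show the mechanics (not study data).
toy = [
    (25, "NORM", "NORM"),
    (31, "NORM", "NORM"),
    (35, "NORM", "NORM"),
    (38, "MI", "MI"),
    (79, "MI", "NORM"),  # missed diagnosis in an older patient
    (82, "CD", "CD"),
]

print("overall:", round(overall_accuracy(toy), 3))
print("by age band:", stratified_accuracy(toy))
```

In this toy set the aggregate figure looks respectable because four of the six records come from young patients, while the 75+ band is right only half the time. That is the masking effect an aggregate-only report would hide.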
Read the full study on medRxiv.
