🧑🏼‍💻 Research - June 25, 2026

Smartphones and LLMs Detect Huntington’s Disease

By feeding smartphone-derived eye-movement metrics into off-the-shelf language models, researchers detected signs of Huntington’s disease without building a custom medical AI model.

Can an AI diagnose a rare genetic disease without ever being trained to recognize it? The dominant medical software playbook demands massive, labeled datasets to train specialized diagnostic models. This approach is slow, expensive, and nearly impossible for rare conditions where patient data is scarce.

A new study challenges this paradigm. Using general-purpose large language models as zero-shot reasoning engines, the researchers showed that the clinical knowledge already embedded in these models can interpret structured physiological measurements. If the result holds up in larger cohorts, the bottleneck in digital health shifts from expensive model training to high-quality data collection.

Eye tracking on consumer phones

The study evaluated a small cohort of 26 participants: 13 patients with genetically confirmed Huntington’s disease and 13 age-matched healthy controls. Each participant completed a standardized ocular motor assessment using a basic smartphone application. The app’s quantitative eye-movement metrics closely matched expert neurologist ratings, with Spearman correlations of 0.76 to 0.95. This suggests that consumer hardware can capture clinically meaningful eye-movement signals.
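For readers unfamiliar with the statistic, here is a minimal sketch of how a Spearman correlation between a phone-derived metric and a clinician rating could be computed. The function names and the sample numbers are made up for illustration; nothing here is taken from the study's actual analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, NOT data from the study.
phone_metric = [12.0, 15.5, 9.8, 20.1, 17.3]
neurologist_rating = [1, 2, 1, 4, 3]
print(round(spearman(phone_metric, neurologist_rating), 3))
```

Because Spearman works on ranks rather than raw values, it only asks whether the phone and the neurologist order participants the same way, which suits ordinal clinical ratings.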

Instead of training a custom algorithm, the researchers fed these structured eye-movement metrics directly to four general-purpose LLMs. The models received no task-specific training and no diagnostic labels. Yet they generated an AI-Assigned HD Probability Score whose accuracy was comparable to that of traditional supervised machine learning.
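To make the zero-shot idea concrete, the sketch below turns a set of structured metrics into a text prompt and parses a probability from a model's reply. The article does not give the study's prompt, so the wording, the metric names, and the reply format are all assumptions, and the call to an actual LLM is deliberately left out.

```python
import re


def build_zero_shot_prompt(metrics):
    """Format structured eye-movement metrics as a zero-shot prompt.

    The prompt wording is hypothetical, not the prompt used in the study.
    """
    metric_lines = [f"- {name}: {value}" for name, value in metrics.items()]
    return (
        "Below are eye-movement metrics from a smartphone ocular motor "
        "assessment.\n"
        + "\n".join(metric_lines)
        + "\nEstimate the probability (0 to 1) that this pattern is "
        "consistent with Huntington's disease. "
        "Reply as 'Probability: <number>'."
    )


def parse_probability(reply):
    """Extract a probability in [0, 1] from a reply, or None if absent."""
    match = re.search(r"Probability:\s*([01](?:\.\d+)?)", reply)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 1.0 else None


# Hypothetical metric names and values, for illustration only.
example_metrics = {"saccade_latency_ms": 312, "antisaccade_error_rate": 0.41}
print(build_zero_shot_prompt(example_metrics))
print(parse_probability("Probability: 0.83"))
```

Nothing in this pipeline is fitted to patient data: the only inputs are the measured metrics and a fixed instruction, which is what distinguishes it from the supervised approach.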

How the models performed

  • The LLMs distinguished Huntington’s patients from controls with high accuracy, yielding an AUC of 0.879 to 0.944.
  • The zero-shot LLM performance was statistically equivalent to a supervised logistic regression model trained specifically on the same data.
  • The generated probability scores correlated strongly with clinical severity, with correlation coefficients of -0.86 for cognitive impairment, -0.74 for functional decline, and 0.85 for motor symptoms.
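The AUC figures above have a simple interpretation: the chance that a randomly chosen patient receives a higher probability score than a randomly chosen control. A minimal sketch of that pairwise calculation follows; the scores are invented for illustration and are not the study's data.

```python
def auc(patient_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (patient, control) pairs in which the patient
    has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical probability scores, NOT results from the study.
patients = [0.9, 0.8, 0.4]
controls = [0.5, 0.2, 0.1]
print(auc(patients, controls))
```

An AUC of 0.5 means the scores separate the groups no better than chance, and 1.0 means every patient outranks every control.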

This is a significant shift in how we think about diagnostic software. For years, the industry assumed that clinical AI required bespoke architectures for every disease. This study suggests the active ingredient is not a custom algorithm, but rather the structured clinical reasoning already latent in large models.

The limits of zero-shot medicine

We must look at the limitations honestly. The sample size of 26 people is tiny. While the physiological signal is strong, a larger cohort is required to prove these models do not stumble on confounding ocular conditions like cataracts or unrelated neurological issues. Furthermore, relying on commercial LLMs introduces a black-box element to clinical decision-making that regulators will view with skepticism.

Even with these caveats, the implications are significant. If a general-purpose model can interpret structured physiological data as accurately as a bespoke model, the economics of digital diagnostics could change substantially. We may soon see a world where the diagnostic tool is simply a smartphone camera and a standard prompt.

Read the full study in AI.
