🧑🏼‍💻 Research - June 29, 2026

AI models struggle with precise cancer registry dates

Automating cancer registries with artificial intelligence sounds like an easy win, but new data shows these models often miss the precise dates that are crucial for tracking patient care.

Can we trust algorithms to build the databases that track public cancer trends? Cancer registries currently rely on human registrars reading hundreds of complex clinical reports. If we automate this process, we risk injecting invisible errors into our most critical oncology datasets.

This is not just a speed issue. A new study reveals that while AI can easily spot simple facts like which breast had a tumor, it stumbles badly when tracking when things actually happened. In oncology, a month’s delay in treatment can be the difference between success and failure, making the AI’s temporal blindness a major liability.

Researchers tested five foundation models on real-world records from 5,939 patients across seven different cancer types. They used zero-shot prompting, meaning the models received instructions but no worked examples, to extract eight key variables from unstructured clinical reports. The lineup included heavyweights like Claude Sonnet 4.5, GPT-OSS-120b, and GPT-OSS-20b, alongside smaller models like Gemma 12b and LLaMA 3.1 8b.

The accuracy gap

The results expose a stark divide between simple categorization and complex timeline tracking. Larger models generally outperformed their smaller peers, but none could reliably pin down dates.

  • The best models achieved F1 scores of around 0.8 for low-cardinality variables like tumor grade, summary stage, and laterality.
  • Performance remained only slightly lower for high-cardinality variables such as primary site, regional nodes examined, and positive regional nodes.
  • Exact-match accuracy for diagnosis and treatment dates fell to just 0.55, even for the best-performing models.
  • Allowing a 30-day margin of error pushed date accuracy up to 0.85.
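
The gap between those last two figures comes down to how a "correct" date is scored. The sketch below is not the study's evaluation code. The function name `date_accuracy`, the sample dates, and the rule that a missing prediction counts as a miss are all assumptions made for illustration. It shows how the same set of extracted dates can earn a low exact-match score and a much higher one once a 30-day window is allowed.

```python
from datetime import date


def date_accuracy(predicted, reference, tolerance_days=0):
    """Share of predicted dates that fall within tolerance_days of the reference.

    tolerance_days=0 gives exact-match accuracy.
    Assumption: a missing prediction (None) is scored as a miss.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one record is required")

    hits = 0
    for pred, ref in zip(predicted, reference):
        if pred is not None and abs((pred - ref).days) <= tolerance_days:
            hits += 1
    return hits / len(reference)


# Made-up example records, not data from the study.
reference = [date(2024, 3, 1), date(2024, 5, 10), date(2024, 7, 20), date(2024, 9, 2)]
predicted = [date(2024, 3, 1), date(2024, 5, 24), None, date(2024, 11, 15)]

exact = date_accuracy(predicted, reference)
within_30 = date_accuracy(predicted, reference, tolerance_days=30)
print(exact, within_30)  # prints: 0.25 0.5
```

In this toy example the second prediction is 14 days off. It fails the exact-match test but passes the 30-day one, which is the kind of near miss that lifts the tolerant score.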

Why dates defeat AI

Why do dates break these models? Clinical notes are messy narratives filled with historical references, scheduled future appointments, and hypothetical timelines. To record the right date, a model has to distinguish between when a biopsy was planned and when it actually occurred, and the results suggest current LLMs often cannot.

A 30-day margin of error is a lifetime in oncology.

Relying on these models without strict human oversight means polluting registries with fuzzy timelines. If researchers use this flawed data to evaluate treatment delays, their conclusions could be badly skewed. AI cannot yet replace the human eye for the chronological details that define a patient’s journey.

Read the full study in medRxiv.
