🗞️ News - December 18, 2025

Study Reveals AI Tools Can Misinterpret Patient Data in Electronic Records

AI tools may misinterpret patient data in electronic records, leading to inaccuracies. Human oversight remains essential for data integrity. 🏥🔍

Key Findings from the Research

A recent study conducted by researchers at the University of Oxford has highlighted issues with AI tools, particularly large language models (LLMs), when tasked with removing personal patient information from electronic patient records (EPRs). The study indicates that these tools can sometimes produce hallucinations, which are erroneous outputs that do not reflect the original data.
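
The study's own evaluation code is not reproduced here, but the core idea behind flagging a hallucination can be sketched: faithful de-identification should only remove text, so any output token that is absent from the source record (and is not a redaction placeholder) is suspect. Everything below, including the placeholder format, the function name, and the sample note, is an illustrative assumption rather than the paper's method.

```python
import re

# The "[REDACTED:...]" placeholder format is an assumption for illustration.
REDACTION_TAG = re.compile(r"\[REDACTED(?::[A-Z]+)?\]")

def hallucinated_tokens(original: str, deidentified: str) -> list[str]:
    """Return output tokens that never appear in the source record.

    Faithful de-identification only removes text, so any non-placeholder
    token in the output that is missing from the original is a candidate
    hallucination.
    """
    source_vocab = set(re.findall(r"\w+", original.lower()))
    cleaned = REDACTION_TAG.sub(" ", deidentified)
    return [tok for tok in re.findall(r"\w+", cleaned)
            if tok.lower() not in source_vocab]

# Hypothetical example: the model invents a prescription absent from the note.
note = "Patient John Smith, MRN 4471923, seen on 12 March for chest pain."
output = ("Patient [REDACTED:NAME], MRN [REDACTED:ID], seen on "
          "[REDACTED:DATE] for chest pain; prescribed aspirin.")
print(hallucinated_tokens(note, output))  # ['prescribed', 'aspirin']
```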

Study Overview
  • The research assessed the effectiveness of LLMs and specialized software in identifying and removing sensitive patient identifiers such as names, dates, and medical record numbers (a minimal sketch of this kind of redaction follows this list).
  • Published in iScience on December 9, 2025, the study found that smaller LLMs often over-redacted information or generated fictitious content.
  • Such hallucinations can compromise the integrity of clinical research and patient safety.
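
As a concrete, deliberately naive illustration of what identifying and removing identifiers involves, the sketch below redacts dates and medical record numbers with hand-written patterns. The regexes, placeholder format, and sample text are assumptions; the tools in the study rely on trained models rather than regexes, and names in free text cannot be reliably caught this way.

```python
import re

# Illustrative patterns only; production de-identification tools use
# trained named-entity recognition models, not hand-written regexes.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

def redact(record: str) -> str:
    """Replace each matched identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        record = pattern.sub(f"[REDACTED:{label}]", record)
    return record

print(redact("Seen 03/12/2025, MRN 4471923, reports improving symptoms."))
# Seen [REDACTED:DATE], [REDACTED:MRN], reports improving symptoms.
```
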
Methodology

The researchers began by having a human reviewer manually redact 3,650 medical records to establish a benchmark. They then compared the performance of the following against that benchmark (an evaluation sketch follows the list):

  1. Two task-specific de-identification tools: Microsoft's Azure de-identification service and AnonCAT.
  2. Five general-purpose LLMs: GPT-4, GPT-3.5, Llama-3, Phi-3, and Gemma.
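
The paper's exact scoring is not reproduced here, but comparisons against a human benchmark typically reduce to span-level precision (did the tool redact only genuine identifiers?) and recall (did it find them all?); over-redaction lowers precision, while missed identifiers lower recall. A minimal sketch with hypothetical spans:

```python
def precision_recall(gold: set[tuple[int, int]],
                     predicted: set[tuple[int, int]]) -> tuple[float, float]:
    """Exact-match precision/recall of predicted redaction spans against a
    human benchmark. Spans are (start, end) character offsets; real
    evaluations often also credit partial overlaps.
    """
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall

# Hypothetical spans for one record: the tool misses one identifier
# (hurting recall) and over-redacts a non-identifier (hurting precision).
gold = {(8, 18), (25, 32), (40, 50)}
pred = {(8, 18), (25, 32), (60, 66)}
print(precision_recall(gold, pred))  # (0.666..., 0.666...)
```
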
Expert Insights

Dr. Andrew Soltan, an academic clinical lecturer at the University of Oxford, noted:

“While some large language models perform impressively, others can generate false or misleading text. This behavior poses a risk in clinical contexts, and careful validation is critical before deployment.”

Conclusions and Recommendations

The study concluded that automating the de-identification process could significantly reduce the time and costs associated with preparing clinical data for research while ensuring compliance with data protection regulations. Key findings include:

  • Microsoft’s Azure de-identification service demonstrated the highest accuracy, closely aligning with human reviewers.
  • GPT-4 also showed strong performance, indicating that modern language models can effectively remove identifiers with minimal adjustments.
  • Some models performed well without extensive retraining, suggesting a practical approach for hospitals to implement these technologies.

Importance of Human Oversight

Professor David Eyre from Oxford Population Health emphasized the necessity of human judgment in managing patient data:

“This work shows that AI can be a powerful ally in protecting patient confidentiality. But human judgment and strong governance must remain at the center of any system that handles patient data.”

Support for the Study

This research was supported by various organizations, including the National Institute for Health and Care Research (NIHR), Microsoft Research UK, and Cancer Research UK.
