A new study reveals that while AI can draft highly informative medical answers, patients still prefer the simpler, clearer touch of a human clinician.
Patients facing nuclear medicine scans are often anxious. They ask complex questions about radiation, drug interactions, and scheduling. Clinicians rarely have the time to write exhaustive, personalized essays for every inbox message.
A prospective study published in npj Digital Medicine tested ChatGPT v4.1 against nuclear medicine physicians and administrative staff. The researchers analyzed 339 drug interaction queries, 42 medical queries, and 76 administrative queries. Experts and non-experts then graded the responses using the QUEST evaluation framework.
This trial challenges the assumption that a more detailed answer is a better answer for patients. It reveals a sharp trade-off between informational depth and the plain, empathetic communication patients value.
Where the machine wins
For administrative tasks, the AI was the clear victor. Non-expert raters judged the AI-generated responses to be more informative 97% of the time. They also preferred the AI responses in 86% of cases.
The AI also proved highly consistent. Agreement scores, measured with the prevalence-adjusted bias-adjusted kappa (PABAK), were much higher for the AI responses than for those from human staff. For administrative queries, AI agreement ranged from 0.92 to 1.00, while human agreement fell between -0.63 and -0.13. The machine delivers a predictable, high-quality baseline that human teams can struggle to match.
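To see why a negative PABAK is so striking, it helps to look at how the statistic is built. The sketch below uses the standard formula for two raters choosing among k categories, PABAK = (k·Po − 1)/(k − 1), where Po is the share of items on which the raters agree. The study's exact rating scheme isn't described here, so the two-rater, fixed-category setup and the function name `pabak` are illustrative assumptions, not the authors' code.

```python
def pabak(rater_a, rater_b, k=2):
    """Prevalence-adjusted bias-adjusted kappa for two raters (illustrative sketch).

    rater_a, rater_b: equal-length sequences of category labels.
    k: number of possible categories (2 for a yes/no rating).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    if k < 2:
        raise ValueError("k must be at least 2")
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
    # PABAK assumes chance agreement is uniform (1/k) and rescales p_o accordingly.
    return (k * p_o - 1) / (k - 1)


# Two raters agree on 3 of 4 binary judgments: Po = 0.75, so PABAK = 0.5.
print(pabak(["yes", "yes", "no", "yes"], ["yes", "no", "no", "yes"]))
```

With two categories the formula reduces to 2·Po − 1, so a score of 1.00 means the raters always agreed, 0 is what coin-flipping would produce, and a value like -0.63 implies the raters agreed on fewer than one item in five.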
The clarity trade-off
When the questions turned to complex medical issues, the dynamics shifted. The AI still performed remarkably well on paper. Medical experts rated the AI responses as equivalent to or better than human answers in 8 out of 10 dimensions, across 76% to 98% of the queries depending on the dimension.
However, the human element proved irreplaceable for actual communication. Key findings from the medical query evaluation include:
- The AI responses were rated as more informative 67% of the time.
- Human responses were judged easier to understand in 62% of cases.
- Raters split sharply on preference, with a 60% disagreement rate on which response was better overall.
This disconnect is the real story.
The AI wrote comprehensive, accurate paragraphs, but they often overwhelmed the reader. Humans naturally simplify their language to comfort a worried patient.
Rethinking the patient portal
This finding complicates the push to automate clinical inboxes. If we deploy these models directly to patients, we risk burying them in accurate but dense medical jargon. The goal of patient communication is comprehension, not just information dumping.
The study has clear limitations. It was a single-center evaluation focused on nuclear medicine, which is a highly specialized field. The researchers also noted that further validation is required before these models are used in live clinical workflows.
Instead of replacing staff, the immediate path forward is hybrid. Clinics should use AI to draft the dense, accurate foundation of a response. Then, a human must edit it down to make it gentle and readable. Accuracy is a machine task, but clarity remains a human skill.
Read the full analysis in npj Digital Medicine.
