⚡ Quick Summary
This study evaluated the readability of informed consent forms used in endocrine surgery and demonstrated that using a large language model (LLM) significantly improved readability from an average of 14.1 to 8.8 grade levels. However, this improvement came at the cost of content fidelity, highlighting the need for careful human review.
📌 Key Details
- 📄 Consent Forms Analyzed: Eight forms (two institutional procedural forms and six prospective trial documents).
- 🤖 LLM Rewriting: Each form was processed by the LLM in two separate, independent sessions.
- ⚖️ Evaluation Metrics: Precision, recall, F1 scores, and inter-rater reliability.
- 📊 Key Findings: Significant readability improvement but notable content omissions.
📋 Key Takeaways
- 📚 Original readability averaged a 14.1 grade level, far exceeding recommended levels.
- ✏️ The first LLM revision reduced readability to an 8.8 grade level (P < 0.01).
- 🔁 The second LLM revision showed no further improvement (9.9 grade level; P = 0.87).
- 📈 F1 scores averaged 0.71, with high precision (0.95) but lower recall (0.62): few hallucinations, frequent omissions.
- 📉 Greater reductions in reading level were associated with decreased content fidelity (r = 0.73, P < 0.01).
- 🤝 Inter-rater agreement was excellent (ICC = 0.99, P < 0.01).
- 💡 Human review is essential to ensure the completeness and fidelity of consent forms.
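The fidelity metrics above can be made concrete with a small sketch. This is not the authors' scoring procedure; it assumes (hypothetically) that reviewers reduce each consent form to a set of discrete content elements and compare the revised form's elements against the original's.

```python
def fidelity_scores(original_elements, revised_elements):
    """Per-form precision, recall, and F1 over discrete content elements.

    original_elements: content items reviewers identified in the source form.
    revised_elements: content items found in the LLM-revised form.
    """
    original = set(original_elements)
    revised = set(revised_elements)
    preserved = len(original & revised)  # content carried over faithfully
    # High precision = the revision adds little that wasn't in the original
    # (few hallucinations); low recall = the revision drops source content
    # (omissions). F1 is their harmonic mean.
    precision = preserved / len(revised) if revised else 0.0
    recall = preserved / len(original) if original else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical form with 8 source elements; the revision keeps 5, adds nothing new.
p, r, f1 = fidelity_scores(range(8), range(5))
print(round(p, 2), round(r, 2), round(f1, 2))  # 1.0 0.62 0.77
```

This mirrors the study's pattern: a revision can score near-perfect precision while still losing a substantial share of the original content, which only recall (and hence F1) exposes.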

📖 Background
Informed consent is a fundamental aspect of patient care, requiring that patients fully understand the risks, benefits, and alternatives associated with medical interventions. The American Medical Association and the National Institutes of Health recommend that patient-facing materials be written at a sixth-grade reading level or lower to enhance comprehension and ensure informed decision-making.
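The summary reports readability as US school grade levels but does not name the index used; the Flesch-Kincaid Grade Level is one widely used choice, so a minimal sketch of that formula (an assumption, not the study's stated method) illustrates how such scores are produced:

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level:
    0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    Longer sentences and longer words both push the grade level up.
    """
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# A document averaging 20 words per sentence and 1.8 syllables per word
# lands in the low-teens grade range, near the study's pre-edit average.
grade = flesch_kincaid_grade(total_words=2000, total_sentences=100, total_syllables=3600)
print(round(grade, 1))
```

In practice a tool would tokenize the text and estimate syllable counts; the formula itself is the whole scoring step.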
🏛️ Study
This study was conducted within the endocrine surgery division of a tertiary care center, where eight informed consent forms were assessed for readability. The researchers employed a large language model (LLM) to rewrite these forms, aiming to meet the recommended readability standards while maintaining the integrity of the content.
📊 Results
The first revision achieved a significant reduction in readability, to an average 8.8 grade level; a second revision yielded no further improvement. The fidelity analysis showed high precision but substantial content omissions, indicating that while the LLM enhanced readability, it also compromised the completeness of the information provided.
🌍 Impact and Implications
The findings of this study underscore the potential of LLMs to improve health literacy by making consent forms more accessible to patients. However, the trade-off between readability and content fidelity raises important questions about the role of technology in healthcare communication. It emphasizes the necessity for human oversight to ensure that essential information is not lost in the pursuit of simplicity.
🔮 Conclusion
This research highlights the dual-edged nature of using artificial intelligence in healthcare documentation. While LLMs can significantly enhance the readability of informed consent forms, the potential for content omissions necessitates careful human review. As we continue to integrate AI technologies into healthcare, it is crucial to balance accessibility with the integrity of information to ensure that patients are truly informed.
💬 Your comments
What are your thoughts on the use of AI in improving patient communication? Do you believe the benefits outweigh the risks? Share your insights in the comments below! 💬
Speaking Patient’s Language: Assessment of Readability and Fidelity of Artificial Intelligence-Optimized Consent Forms.
Abstract
INTRODUCTION: Informed consent requires patients to fully comprehend the risks, benefits, and alternatives of an intervention. The American Medical Association and the National Institutes of Health recommend patient-facing materials be written at a sixth-grade level or lesser. We evaluated baseline readability of informed consents used within the endocrine surgery division of a tertiary care center and determined whether rewriting them with a large language model (LLM)-based chatbot can bring the text to the recommended level while preserving fidelity.
METHODS: Eight consent forms (two institutional procedural forms and six prospective trial documents) underwent readability assessment. Each form was processed by the LLM in two separate, independent sessions. Pre- and postedit readability scores were compared. Three independent reviewers assessed content fidelity by calculating precision, recall, and F1 scores (harmonic mean balancing precision and recall). Inter-rater reliability was evaluated using the intraclass correlation coefficient.
RESULTS: Original forms averaged 14.1 ± 1.3 grade levels. First LLM revision significantly improved readability to an 8.8 ± 1.2 grade level (P < 0.01), a five-grade reduction. Second LLM revision showed no further improvement (9.9 ± 1.2; P = 0.87). The mean F1 score was 0.71 ± 0.26, with high precision (0.95 ± 0.06) but lower recall (0.62 ± 0.16), indicating few hallucinations but frequent content omissions. Greater reductions in reading level were significantly associated with decreased content fidelity (r = 0.73, P < 0.01). Inter-rater agreement was excellent (K = 0.99, P < 0.01).
CONCLUSIONS: LLM-based editing significantly improved consent form readability but resulted in substantial content omissions. These findings demonstrate LLM's potential for advancing health literacy while highlighting the critical need for human review to ensure completeness and fidelity.
Authors: Gomez-Carrillo D, Izhar A, Arain H, Swaminathan N, Caretti R, Roy R, Kasmirski J, Gillis A, Dream S, Chen H, Lindeman B
Journal: J Surg Res
Citation: Gomez-Carrillo D, et al. Speaking Patient’s Language: Assessment of Readability and Fidelity of Artificial Intelligence-Optimized Consent Forms. J Surg Res. 2026;322:317-325. doi: 10.1016/j.jss.2026.03.079