Quick Summary
This study introduces a novel Clinician Turing Test to evaluate the effectiveness of the AI ventilator assistant (AVA) in providing clinical decision support for patients with sepsis and acute respiratory distress syndrome (ARDS). By assessing whether clinicians can distinguish between AI-generated and human-generated treatment recommendations, this research aims to establish a preclinical signal of safety and appropriateness for AI deployment in critical care settings.
Key Details
- Participants: 350 critical care clinicians from six US hospitals
- Vignettes: Nine clinical scenarios involving sepsis and ARDS
- Methodology: Randomized, electronic, vignette-based study
- Primary Endpoint: Accuracy in identifying treatment profiles as AI-generated or human-generated
- Expected Results: 2026
Key Takeaways
- AI CDSS: The AI ventilator assistant (AVA) aims to support treatment decisions in critical care.
- Clinician Turing Test: A unique approach to validate AI recommendations against human clinicians.
- Study Design: Multisite, randomized, and vignette-based for robust evaluation.
- Primary Focus: Determine if clinicians can distinguish AI-generated recommendations from human ones.
- Secondary Insights: Clinicians’ perceptions of safety, appropriateness, and interest in AI CDSSs will also be assessed.
- Ethical Approval: The study has received Institutional Review Board approval.
- Broader Implications: Findings could inform future clinical implementation of AI in critical care.
- Trial Registration: NCT07025096.

Background
The integration of artificial intelligence (AI) into clinical decision-making has the potential to transform patient care, particularly in high-stakes environments like the intensive care unit (ICU). Conditions such as sepsis and ARDS require rapid and accurate decision-making, making the evaluation of AI-based clinical decision support systems (CDSSs) crucial. However, traditional methods of assessing AI effectiveness often lack the necessary clinical context, leading to a gap in understanding their real-world applicability.
Study
This study proposes a Phase 1b design to evaluate the AI ventilator assistant (AVA) through a Clinician Turing Test. The researchers aim to recruit 350 critical care clinicians, each of whom will review nine clinical vignettes derived from real patients in the Molecular Epidemiology of Severe Sepsis in the ICU cohort. Each vignette is paired with a treatment profile whose source is randomly assigned in a 1:1 allocation: either AI-generated by AVA or the treatment actually enacted by human clinicians.
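The 1:1 allocation for each participant-vignette combination can be pictured with a short Python sketch. It assumes simple, independent randomization with a fixed seed; the protocol summary does not say whether blocking or stratification is used, and the function name `assign_sources` and the integer IDs are hypothetical.

```python
import random

def assign_sources(participant_ids, vignette_ids, seed=0):
    """Assign each participant-vignette pair to 'AI' or 'human'.

    Assumption: simple (unblocked) randomization with probability 1/2 each.
    The actual study may use a different allocation scheme.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    return {
        (p, v): rng.choice(("AI", "human"))
        for p in participant_ids
        for v in vignette_ids
    }

# 350 clinicians x 9 vignettes, as in the planned study.
assignments = assign_sources(range(350), range(9))
n_ai = sum(1 for source in assignments.values() if source == "AI")
print(len(assignments), "pairs,", n_ai, "assigned to AI")
```

Because the source is drawn per pair, a single participant will usually see a mix of AI-generated and human-enacted profiles across the nine vignettes.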
Results
No results are available yet; they are expected in 2026. The primary outcome will be clinicians’ accuracy in identifying the source of each treatment profile, assessed with equivalence testing using a mixed-effects logistic regression model with random effects for participants and vignettes. Secondary outcomes include clinicians’ perceptions of the safety and appropriateness of the treatment profiles, their confidence in distinguishing AI-generated from human-generated recommendations, their interest in AI CDSSs for sepsis and ventilator management, and the time to complete the survey.
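To give a feel for what equivalence testing means here, the sketch below runs two one-sided tests (TOST) on pooled identification accuracy against chance (0.5). This is a deliberate simplification: it ignores the participant and vignette random effects of the planned mixed-effects model, and the equivalence margin of 0.10 and the response counts are hypothetical values chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tost_accuracy(n_correct, n_total, margin, chance=0.5):
    """Two one-sided z-tests: is accuracy within chance +/- margin?

    Returns (observed accuracy, TOST p-value). A small p-value supports
    equivalence to chance. Treats responses as independent, which the
    real analysis does not assume.
    """
    p_hat = n_correct / n_total
    if not 0.0 < p_hat < 1.0:
        raise ValueError("normal approximation needs 0 < accuracy < 1")
    se = math.sqrt(p_hat * (1.0 - p_hat) / n_total)
    # Null A: accuracy <= chance - margin (rejected when p_hat is well above it)
    p_lower = 1.0 - normal_cdf((p_hat - (chance - margin)) / se)
    # Null B: accuracy >= chance + margin (rejected when p_hat is well below it)
    p_upper = normal_cdf((p_hat - (chance + margin)) / se)
    return p_hat, max(p_lower, p_upper)

# Hypothetical counts: 1600 correct guesses out of 3150 responses.
accuracy, p_value = tost_accuracy(1600, 3150, margin=0.10)
print(f"accuracy={accuracy:.3f}, equivalent to chance at 5% level: {p_value < 0.05}")
```

Under this framing, "passing" the test corresponds to rejecting both one-sided nulls, that is, showing that accuracy lies inside the margin around chance rather than merely failing to show a difference.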
Impact and Implications
The findings from this study could significantly influence the future of AI integration in critical care settings. By establishing whether AI-generated recommendations can be indistinguishable from those made by human clinicians, we can gain valuable insights into the clinical safety and appropriateness of AI systems. This research not only aims to validate AVA but also sets a precedent for evaluating other AI CDSSs in healthcare, potentially leading to improved patient outcomes and more efficient clinical workflows.
Conclusion
This protocol describes a Clinician Turing Test for evaluating AI decision support in critical conditions like sepsis and ARDS. The design is intended to provide preliminary data on the safety and appropriateness of an AI CDSS before deployment, without the risk or cost of testing it in real clinical environments. Results are expected in 2026.
Evaluating AI-based comprehensive clinical decision support for sepsis and ARDS: protocol for a Clinician Turing Test.
Abstract
INTRODUCTION: Few artificial intelligence (AI) clinical decision support systems (CDSSs) are ever evaluated in practice. Although some signal of clinical effectiveness may be needed to justify AI deployment and testing, such data are typically unavailable in early-stage research. This conundrum is especially relevant in the intensive care unit (ICU), where conditions like sepsis and acute respiratory distress syndrome (ARDS) require high-stakes decisions. Our group developed the AI ventilator assistant (AVA), a novel AI CDSS for patients with sepsis and ARDS receiving invasive mechanical ventilation. But the promising results of predictive performance estimates are not sufficient to assess AVA’s clinical safety and appropriateness prior to future evaluation and deployment. Therefore, we propose a Clinician Turing Test as a novel validation approach to determine whether clinicians can distinguish AVA-generated treatment recommendations from those enacted by real human clinicians. If AVA’s recommendations are consistently indistinguishable from those of real clinicians, thereby ‘passing’ this Turing test, this would provide a strong preclinical signal of safety and appropriateness.
METHODS AND ANALYSIS: This multisite, randomised, electronic, vignette-based Phase 1b study will use a Clinician Turing Test design. We aim to recruit 350 critical care clinicians, including physicians and advanced practice providers from six US hospitals. Participants will review nine clinical vignettes of patients with sepsis and ARDS derived from the Molecular Epidemiology of Severe Sepsis in the ICU cohort and an associated profile of a suggested treatment plan. For each participant-vignette combination, the source of the treatment profile will be randomly assigned (AI-generated by AVA vs the actually enacted treatment from real human clinicians) in a 1:1 allocation. The primary endpoint is the participants’ accuracy in identifying whether a treatment profile was AI-generated or human-generated, assessed using equivalence testing through a mixed-effects logistic regression model with random effects for participants and vignettes. Secondarily, a fitted binary classifier will assess discrimination ability using the C-statistic. Secondary endpoints include clinicians’ perceptions of the safety and appropriateness of the treatment profiles, confidence in distinguishing AI-generated and human-generated recommendations, interest in AI CDSSs for sepsis and ventilator management and the time to complete the survey. This novel Phase 1b design provides preliminary but essential information about an AI CDSS’s clinical appropriateness without the risk or cost of actual deployment, thereby informing decisions about future clinical implementation and evaluation in real clinical environments.
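As a rough illustration of the C-statistic mentioned above, the following Python sketch computes it by pairwise comparison: the fraction of (AI-generated, human-generated) pairs in which the classifier scores the AI-generated profile higher, with ties counted as one half. The scores and labels are hypothetical; the abstract does not specify the classifier or its inputs.

```python
def c_statistic(scores, labels):
    """Concordance statistic (equivalent to the area under the ROC curve).

    scores: predicted probability that a profile is AI-generated (hypothetical).
    labels: 1 if the profile was truly AI-generated, 0 if human-generated.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    concordant = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                concordant += 1.0   # AI-generated profile ranked higher
            elif sp == sn:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))

# Made-up scores for six treatment profiles.
example = c_statistic([0.9, 0.4, 0.6, 0.5, 0.3, 0.7], [1, 1, 1, 0, 0, 0])
print(round(example, 3))
```

A C-statistic near 0.5 would mean the classifier cannot tell the two sources apart, which is the pattern consistent with AVA's recommendations being indistinguishable from those of clinicians; values near 1.0 would indicate clear separability.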
ETHICS AND DISSEMINATION: This protocol was approved by the Institutional Review Board of the University of Pennsylvania (Protocol #858201). Results are expected in 2026 and will be submitted for publication in peer-reviewed journals and presented at scientific conferences.
TRIAL REGISTRATION NUMBER: NCT07025096.
Authors: Angeli Gazola A, Bishop NS, Schmid BE, Pirracchio R, Valley TS, Bhavani SV, Krutsinger DC, Giannini HM, Lu Y, Ungar LH, Meyer NJ, Kerlin MP, Weissman GE
Journal: BMJ Open
Citation: Angeli Gazola A, et al. Evaluating AI-based comprehensive clinical decision support for sepsis and ARDS: protocol for a Clinician Turing Test. BMJ Open. 2025; 15:e106757. doi: 10.1136/bmjopen-2025-106757