๐Ÿง‘๐Ÿผโ€๐Ÿ’ป Research - March 10, 2025

Human Reviewers’ Ability to Differentiate Human-Authored or Artificial Intelligence-Generated Medical Manuscripts: A Randomized Survey Study.

🌟 Stay Updated!
Join Dr. Ailexa’s channels to receive the latest insights in health and AI.

⚡ Quick Summary

A recent study evaluated the ability of human reviewers to distinguish between human-authored and AI-generated medical manuscripts. The findings revealed a specificity of 55.6% and a sensitivity of 31.2%, indicating that generative AI can produce manuscripts that are challenging to differentiate from those written by humans.

๐Ÿ” Key Details

  • ๐Ÿ“… Study Duration: October 1, 2023, to December 1, 2023
  • ๐Ÿ‘ฉโ€โš•๏ธ Participants: 51 physicians, including post-doctorates, assistant professors, and full professors
  • ๐Ÿ“ Manuscripts: AI-generated using ChatGPT 3.5 and randomly selected human-authored manuscripts
  • ๐Ÿ” Evaluation Method: Blinded survey questionnaire after manuscript review

🔑 Key Takeaways

  • 📊 Overall accuracy of human reviewers was low, with a specificity of 55.6% and a sensitivity of 31.2%.
  • 💡 Among human-authored manuscripts, high-impact-factor papers were identified more accurately than low-impact-factor ones (P=.037).
  • 👩‍🎓 Neither academic rank nor prior manuscript review experience significantly predicted accuracy.
  • 🤖 Frequency of AI interaction significantly predicted correct identification, with occasional, fairly frequent, and very frequent users all showing odds ratios above 7.
  • 🌐 Generative AI such as ChatGPT can produce manuscripts indistinguishable from human-authored ones.

📚 Background

The rise of artificial intelligence in various fields has sparked discussions about its implications, particularly in academia and medical writing. As AI technologies advance, understanding their capabilities and limitations becomes crucial, especially in maintaining the integrity of scientific literature. This study aims to explore how well human reviewers can discern between AI-generated and human-authored medical manuscripts.

🗒️ Study

Conducted at a single academic center, this prospective randomized survey study involved 51 physicians who were tasked with reviewing a selection of manuscripts. The manuscripts included both AI-generated texts created using ChatGPT 3.5 and randomly selected human-authored works. Participants were blinded to the authorship and were asked to identify the authors after reviewing the manuscripts.

📈 Results

Human reviewers achieved a specificity of 55.6% and a sensitivity of 31.2% in distinguishing between the two types of manuscripts. The positive predictive value was 38.5%, and the negative predictive value was 47.6%. Notably, among human-authored manuscripts, high-impact-factor papers were identified more accurately than low-impact-factor ones (P=.037), suggesting that manuscript quality influences reviewer judgments.
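As a reference for how these four figures relate, here is a minimal sketch of the standard 2×2 confusion-matrix definitions. The counts below are not reported in the paper; they are one hypothetical matrix (16 AI-generated and 18 human-authored review decisions) chosen because it happens to reproduce the stated percentages.

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix.
# Here a "positive" is an AI-generated manuscript, so:
#   tp = AI-generated, correctly flagged as AI
#   fn = AI-generated, mistaken for human-authored
#   tn = human-authored, correctly identified as human
#   fp = human-authored, mistaken for AI

def reviewer_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),  # share of AI manuscripts caught
        "specificity": tn / (tn + fp),  # share of human manuscripts recognized
        "ppv": tp / (tp + fp),          # flagged-as-AI that really were AI
        "npv": tn / (tn + fn),          # judged-human that really were human
    }

# Hypothetical counts consistent with the study's reported percentages:
m = reviewer_metrics(tp=5, fp=8, tn=10, fn=11)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.312, 'specificity': 0.556, 'ppv': 0.385, 'npv': 0.476}
```

Note how a sensitivity this low means most AI-generated manuscripts were mistaken for human work, while the near-chance specificity means human manuscripts fared only slightly better.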

๐ŸŒ Impact and Implications

The findings of this study raise important questions about the future of medical writing and the role of AI in academia. As generative AI continues to evolve, the potential for creating high-quality manuscripts that are indistinguishable from human writing could challenge traditional notions of authorship and peer review. This could lead to a need for new guidelines and standards in academic publishing to ensure the integrity and credibility of scientific literature.

🔮 Conclusion

This study highlights the growing capabilities of generative AI in producing medical manuscripts that are difficult for human reviewers to differentiate from those authored by humans. As AI technologies advance, it is essential for the academic community to adapt and establish frameworks that address the implications of AI-generated content in scientific publishing. Continued research in this area will be vital to navigate the evolving landscape of medical literature.

💬 Your comments

What are your thoughts on the implications of AI in medical writing? Do you believe that AI-generated manuscripts could enhance or undermine the quality of scientific literature? Let’s discuss! 💬 Leave your thoughts in the comments below.


Abstract

OBJECTIVE: To assess the ability of humans to differentiate human-authored vs artificial intelligence (AI)-generated medical manuscripts.
METHODS: This is a prospective randomized survey study conducted from October 1, 2023, to December 1, 2023, at a single academic center. Artificial intelligence-generated medical manuscripts were created using ChatGPT 3.5 and were evaluated alongside randomly selected human-authored manuscripts. Participants, who were blinded to manuscript selection and creation, were randomized to receive three manuscripts that were either human-authored or AI-generated and completed a survey questionnaire after review indicating who they believed authored each manuscript. The primary outcome was the accuracy of human reviewers in differentiating manuscript authors. Secondary outcomes were factors that influenced prediction accuracy.
RESULTS: Fifty-one physicians were included in the study, comprising 12 post-doctorates, 19 assistant professors, and 20 associate or full professors. The overall specificity was 55.6% (95% CI, 30.8% to 78.5%), sensitivity 31.2% (95% CI, 11.0% to 58.7%), positive predictive value 38.5% (95% CI, 13.9% to 68.4%), and negative predictive value 47.6% (95% CI, 25.7% to 70.2%). A stratified analysis of human-authored manuscripts indicated that high-impact-factor manuscripts were identified with higher accuracy than low-impact-factor ones (P=.037). For individual-level data, neither academic rank nor prior manuscript review experience significantly predicted accuracy. The frequency of AI interaction was a significant factor, with occasional (odds ratio [OR], 8.20; P=.016), fairly frequent (OR, 7.13; P=.033), and very frequent (OR, 8.36; P=.030) use associated with correct identification. Further analysis revealed no significant predictors among the papers’ qualities.
CONCLUSION: Generative AI such as ChatGPT could create medical manuscripts that could not be differentiated from human-authored manuscripts.

Authors: Helgeson SA, Johnson PW, Gopikrishnan N, Koirala T, Moreno-Franco P, Carter RE, Quicksall ZS, Burger CD

Journal: Mayo Clin Proc

Citation: Helgeson SA, et al. Human Reviewers’ Ability to Differentiate Human-Authored or Artificial Intelligence-Generated Medical Manuscripts: A Randomized Survey Study. Mayo Clin Proc. 2025;(unknown volume):(unknown pages). doi: 10.1016/j.mayocp.2024.08.029

