Research - May 16, 2026

Standardization of surgical gesture taxonomy: a SAGES Delphi consensus study.


Quick Summary

This study standardizes surgical gesture terminology through a SAGES-led Delphi consensus process, producing a hierarchical taxonomy of 10 clusters, 24 gestures, and 46 sub-gestures. The framework is intended to support surgical workflow analysis and improve the interoperability of datasets in surgical data science.

Key Details

  • Initial Terms: 270 literature-derived gesture terms
  • Process: Two Delphi surveys and a pilot video-based validation task
  • Final Taxonomy: 10 clusters, 24 gestures, 46 sub-gestures
  • Expert Panel: Multi-round expert review with ≥ 80% agreement threshold

Key Takeaways

  • Gesture-level tokenization provides greater specificity than phase- or step-level labels.
  • Multi-instrument annotation is crucial for capturing assisting actions in surgery.
  • Video-based validation showed high agreement for key gestures such as coagulate and suction.
  • Ambiguities were identified among semantically similar actions, guiding final revisions.
  • Standardization aims to reduce annotation variability and enhance model reproducibility.
  • Future Steps: Defining temporal boundaries for gestures is the next critical step.

Background

The field of surgical workflow analysis has been hindered by a lack of standardized terminology for surgical gestures. Traditional methods often rely on broad phase- or step-level labels, which can obscure the nuances of surgical actions. The introduction of gesture-level tokenization aims to capture the intricate interactions between instruments and tissues, providing a more detailed and functional representation of surgical actions.

Study

The study was conducted by a panel of experts through a SAGES-led, accelerated Delphi consensus process. Starting with 270 literature-derived gesture terms, the researchers used a hybrid pipeline that combined large language model (LLM)-assisted semantic clustering with multi-round expert review. The process included two Delphi surveys (open-ended, then structured agreement), a pilot video-based validation task in which participants labeled 30 surgical clips, and a final in-person consensus meeting with live anonymous polling.
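As a rough illustration of how a predefined agreement threshold can be applied to Delphi survey items, the sketch below checks each item against the ≥ 80% cutoff reported for this study. The function name, item names, and vote counts are hypothetical and are not taken from the paper; this is a minimal sketch under those assumptions, not the authors' procedure.

```python
from typing import Dict, List

AGREEMENT_THRESHOLD_PCT = 80  # predefined >= 80% threshold reported in the study


def reaches_consensus(votes: List[bool], threshold_pct: int = AGREEMENT_THRESHOLD_PCT) -> bool:
    """Return True when the share of agreeing votes meets the threshold."""
    if not votes:
        raise ValueError("at least one vote is required")
    agree = sum(votes)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return agree * 100 >= threshold_pct * len(votes)


# Hypothetical votes for illustration only (True = panelist agrees).
panel_votes: Dict[str, List[bool]] = {
    "item_a": [True] * 17 + [False] * 3,  # 17 of 20 agree
    "item_b": [True] * 16 + [False] * 4,  # 16 of 20 agree, exactly 80%
    "item_c": [True] * 15 + [False] * 5,  # 15 of 20 agree
}

accepted = [item for item, votes in panel_votes.items() if reaches_consensus(votes)]
print(accepted)  # prints ['item_a', 'item_b']
```

Items that fall short of the cutoff would, in a typical Delphi design, be revised and carried into the next round rather than discarded; how this study handled them is not detailed in the summary above.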

Results

The iterative refinement of the taxonomy led to a final consensus that includes 10 clusters, 24 gestures, and 46 sub-gestures. The panel emphasized the importance of multi-instrument annotation, rejecting the notion of dominant-instrument-only labeling. Video-based validation demonstrated high agreement for several gestures, while also revealing ambiguities that informed the final revisions of the taxonomy.
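One way to picture multi-instrument annotation over a three-level taxonomy is a per-clip record keyed by instrument, so that assisting actions are kept alongside the primary one. The sketch below is an assumption-laden illustration: the class names, instrument names, and cluster placeholders are hypothetical, and only the gesture names "grasp" and "coagulate" come from the study. It is not the annotation schema used by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GestureLabel:
    """One label at the three taxonomy levels: cluster > gesture > sub-gesture."""
    cluster: str
    gesture: str
    sub_gesture: Optional[str] = None  # treated as optional here (assumption)


@dataclass
class ClipAnnotation:
    """Labels for one video clip, keyed by instrument rather than one dominant tool."""
    clip_id: str
    labels: Dict[str, GestureLabel] = field(default_factory=dict)

    def add(self, instrument: str, label: GestureLabel) -> None:
        self.labels[instrument] = label

    def gestures(self) -> List[str]:
        """Distinct gesture names present in the clip, sorted alphabetically."""
        return sorted({label.gesture for label in self.labels.values()})


# Hypothetical clip: one instrument assists while the other performs the main action.
clip = ClipAnnotation(clip_id="clip-001")
clip.add("left_grasper", GestureLabel("placeholder-cluster-A", "grasp"))
clip.add("right_energy_device", GestureLabel("placeholder-cluster-B", "coagulate"))
print(clip.gestures())  # prints ['coagulate', 'grasp']
```

A dominant-instrument-only scheme would store a single label per clip and lose the assisting "grasp" in this example, which is the kind of information the panel chose to keep.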

Impact and Implications

This standardized surgical gesture taxonomy is poised to significantly impact surgical data science. By providing a common language for surgical actions, it aims to enhance the reliability of cross-study comparisons and facilitate the development of scalable video-based assessments and autonomous systems. The implications extend beyond research, potentially improving surgical training and patient outcomes through better data analysis.

Conclusion

The establishment of a standardized surgical gesture taxonomy marks a significant advancement in the field of surgical workflow analysis. By reducing variability in gesture annotation and enhancing dataset interoperability, this framework lays the groundwork for future innovations in surgical data science. As the field evolves, defining temporal boundaries for gestures will be essential for further progress.



Abstract

INTRODUCTION: Artificial intelligence (AI) for surgical workflow analysis often fails to generalize because surgical actions lack a standardized, fine-grained representation. Gesture-level “tokenization” of surgery, capturing instrument-tissue interactions as the smallest intentional functional units, offers greater technical specificity than phase- or step-level labels and has demonstrated associations with proficiency and clinical outcomes. However, the field remains fragmented by heterogeneous gesture terminology, limiting dataset interoperability and model reproducibility.
METHODS: We conducted a SAGES-led, accelerated Delphi consensus process to establish a standardized surgical gesture taxonomy. Starting with 270 literature-derived gesture terms, we employed a novel hybrid pipeline combining large language model (LLM)-assisted semantic clustering with multi-round expert review. The process involved two Delphi surveys (open-ended, then structured agreement) with a predefined ≥ 80% agreement threshold, a pilot interactive video-based validation task where participants labeled 30 surgical clips, and a final in-person consensus meeting with live anonymous polling.
RESULTS: Across iterative refinement, the taxonomy evolved from 106 gestures in 11 clusters to a hierarchical framework of Clusters, Gestures, and Sub-gestures, which, after consolidation and pilot annotation, reached a final consensus taxonomy comprising 10 clusters, 24 gestures, and 46 sub-gestures. The panel rejected dominant-instrument-only labeling, supporting multi-instrument annotation to capture assisting actions critical to surgical quality. Video-based validation demonstrated high agreement for multiple gestures (e.g., coagulate, suction, irrigate, staple, clip, needle drive), while identifying predictable ambiguities among semantically proximate actions (e.g., cut vs seal; grasp vs clamp; dissect vs spread), informing final revisions.
CONCLUSION: This work establishes a standardized, hierarchical taxonomy for surgical gestures, providing a foundational language for surgical data science. This framework is designed to reduce annotation variability, enable reliable cross-study comparisons, and accelerate the development of scalable video-based assessment, computer vision, and autonomous systems. Defining temporal boundaries for these gestures was identified as the next critical step.

Authors: Morais MC, Godbole AA, Iqbal E, Ballo M, Jarc A, Van Amsterdam B, Matthews B, Schlachta CM, Donoho DA, Hashimoto DA, Redan JA, Marwaha J, Gould J, Feldman LS, Meireles O, De Backer P, Mascagni P, LaPlante S, Sarin A, Shchatsko A, Walsh D, Fer DM, Funes DR, Sasaki K, Szoka N, Lazzaretti SS, Ross SB, Schnelldorfer T, Krieger A, Hung AJ, Filicori F

Journal: Surg Endosc

Citation: Morais MC, et al. Standardization of surgical gesture taxonomy: a SAGES Delphi consensus study. Surg Endosc. 2026. doi: 10.1007/s00464-026-12906-2
