⚡ Quick Summary
This study reports a SAGES-led Delphi consensus effort to standardize surgical gesture terminology. It produced a hierarchical taxonomy of 10 clusters, 24 gestures, and 46 sub-gestures. The framework is intended to support surgical workflow analysis and to improve the interoperability of datasets in surgical data science.
Key Details
- Initial Terms: 270 literature-derived gesture terms
- Process: LLM-assisted semantic clustering, two Delphi surveys, a pilot video-based validation task (30 clips), and a final in-person consensus meeting
- Final Taxonomy: 10 clusters, 24 gestures, 46 sub-gestures
- Expert Panel: Multi-round expert review with a predefined ≥ 80% agreement threshold
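The ≥ 80% agreement threshold used in the structured Delphi round reduces to a per-item tally. Below is a minimal Python sketch, assuming each panelist gives a binary agree/disagree response per item (the summary does not say how responses were scored). The names `tally` and `ItemResult`, the panel size, and the votes are hypothetical, not study data.

```python
from dataclasses import dataclass

# Consensus cut-off reported for the structured Delphi round.
CONSENSUS_THRESHOLD = 0.80


@dataclass
class ItemResult:
    item: str
    agree: int
    total: int

    @property
    def agreement(self) -> float:
        """Fraction of panelists who agreed with the item (0.0 if no votes)."""
        return self.agree / self.total if self.total else 0.0

    @property
    def reached_consensus(self) -> bool:
        return self.agreement >= CONSENSUS_THRESHOLD


def tally(votes: dict[str, list[bool]]) -> list[ItemResult]:
    """Convert per-item agree/disagree votes into ItemResult records."""
    return [ItemResult(item, sum(v), len(v)) for item, v in votes.items()]


# Hypothetical votes from a 10-member panel (not study data).
votes = {
    "coagulate": [True] * 9 + [False],
    "spread": [True] * 7 + [False] * 3,
}

for r in tally(votes):
    status = "consensus" if r.reached_consensus else "below threshold"
    print(f"{r.item}: {r.agreement:.0%} -> {status}")
```

With these made-up votes the script prints `coagulate: 90% -> consensus` and `spread: 70% -> below threshold`. Exactly 8 of 10 agreeing yields 0.8, which counts as consensus because the comparison uses `>=`.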
Key Takeaways
- Gesture-level tokenization provides greater specificity than phase- or step-level labels.
- 💡 Multi-instrument annotation is crucial for capturing assisting actions in surgery.
- 🎥 Video-based validation showed high agreement for key gestures such as coagulate, suction, irrigate, staple, clip, and needle drive.
- Ambiguities surfaced among semantically similar actions (e.g., cut vs seal, grasp vs clamp), guiding the final revisions.
- Standardization aims to reduce annotation variability and improve model reproducibility.
- Future Steps: Defining temporal boundaries for gestures is the next critical step.

Background
Surgical workflow analysis has been hindered by the lack of a standardized terminology for surgical gestures, and AI models for this task often fail to generalize as a result. Traditional approaches rely on broad phase- or step-level labels, which obscure the details of individual surgical actions. Gesture-level tokenization instead treats each instrument-tissue interaction as the smallest intentional functional unit of surgery. This gives a finer-grained, functional representation of surgical actions, one that has been associated with surgeon proficiency and clinical outcomes.
Study
The study was a SAGES-led, accelerated Delphi consensus process carried out by an expert panel. Starting from 270 literature-derived gesture terms, the researchers used a novel hybrid approach that combined large language model (LLM)-assisted semantic clustering with multi-round expert review. The process comprised two Delphi surveys (open-ended, then structured agreement), a pilot video-based task in which participants labeled 30 surgical clips, and a final in-person consensus meeting with live anonymous polling.
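The pilot video task can be summarized with a per-clip agreement score. The abstract does not name the agreement statistic the panel used, so this sketch assumes simple percent agreement: the share of raters who chose a clip's most common label, averaged per gesture. The function names, clip IDs, and labels are hypothetical.

```python
from collections import Counter, defaultdict


def clip_agreement(labels: list[str]) -> tuple[str, float]:
    """Return a clip's modal label and the share of raters who chose it.

    Ties resolve to the label encountered first (Counter.most_common order).
    """
    label, n = Counter(labels).most_common(1)[0]
    return label, n / len(labels)


def agreement_by_gesture(clips: dict[str, list[str]]) -> dict[str, float]:
    """Average per-clip agreement, grouped by each clip's modal label."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for labels in clips.values():
        label, share = clip_agreement(labels)
        grouped[label].append(share)
    return {g: sum(s) / len(s) for g, s in grouped.items()}


# Hypothetical labels from five raters per clip (not study data).
clips = {
    "clip_01": ["suction"] * 5,
    "clip_02": ["coagulate"] * 4 + ["seal"],
    "clip_03": ["cut", "cut", "seal", "seal", "cut"],
}

for gesture, score in sorted(agreement_by_gesture(clips).items()):
    print(f"{gesture}: {score:.0%}")
```

With these made-up labels the script prints `coagulate: 80%`, `cut: 60%`, and `suction: 100%`. A low score on a clip like `clip_03` flags the kind of cut-vs-seal ambiguity described in the study. Percent agreement ignores agreement by chance; a chance-corrected statistic such as Fleiss' kappa would be a stricter alternative.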
Results
Through iterative refinement, the taxonomy evolved from 106 gestures in 11 clusters into a three-level hierarchy of clusters, gestures, and sub-gestures, reaching a final consensus of 10 clusters, 24 gestures, and 46 sub-gestures. The panel rejected dominant-instrument-only labeling in favor of multi-instrument annotation, which captures assisting actions critical to surgical quality. Video-based validation showed high agreement for several gestures, while also revealing ambiguities between semantically close actions that informed the final revisions.
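The three-level hierarchy and the multi-instrument rule map naturally onto a small data model. The Python sketch below is illustrative only: the cluster and sub-gesture identifiers (`cluster_a`, `coagulate.sub_1`, and so on) are hypothetical placeholders, not the published taxonomy, and only the gesture verbs come from the abstract. Each clip stores one action per active instrument, so an assisting instrument's gesture is recorded alongside the primary one rather than dropped.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fragment of a Cluster -> Gesture -> Sub-gesture hierarchy.
# Cluster and sub-gesture names are placeholders, not the published taxonomy.
TAXONOMY: dict[str, dict[str, list[str]]] = {
    "cluster_a": {"coagulate": ["coagulate.sub_1", "coagulate.sub_2"]},
    "cluster_b": {"grasp": ["grasp.sub_1"], "clamp": []},
}


@dataclass
class InstrumentAction:
    instrument: str                     # free-text instrument label
    cluster: str
    gesture: str
    sub_gesture: Optional[str] = None   # sub-gesture level is optional


@dataclass
class ClipAnnotation:
    clip_id: str
    # One entry per active instrument, so assisting actions are kept
    # instead of labeling only the dominant instrument.
    actions: list[InstrumentAction] = field(default_factory=list)


def validate(action: InstrumentAction) -> bool:
    """Return True if the cluster/gesture/sub-gesture path exists in TAXONOMY."""
    gestures = TAXONOMY.get(action.cluster, {})
    if action.gesture not in gestures:
        return False
    return action.sub_gesture is None or action.sub_gesture in gestures[action.gesture]


ann = ClipAnnotation(
    "clip_07",
    [
        InstrumentAction("right instrument", "cluster_a", "coagulate"),
        InstrumentAction("left instrument", "cluster_b", "grasp", "grasp.sub_1"),
    ],
)
print(all(validate(a) for a in ann.actions))  # True
```

A clip in which one instrument coagulates while another grasps yields two records, each checked against the hierarchy independently.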
Impact and Implications
By giving surgical data science a common language for surgical actions, this taxonomy aims to make cross-study comparisons more reliable and to support the development of scalable video-based assessment, computer vision, and autonomous systems. Beyond research, more consistent data analysis could potentially benefit surgical training and patient outcomes.
🔮 Conclusion
A standardized, hierarchical surgical gesture taxonomy gives surgical workflow analysis a shared vocabulary. By reducing variability in gesture annotation and improving dataset interoperability, the framework lays the groundwork for future work in surgical data science. The panel identified defining temporal boundaries for each gesture as the next critical step.
💬 Your comments
What are your thoughts on the importance of standardizing surgical gesture terminology? Share your insights and join the discussion! 💬 Leave your comments below.
Standardization of surgical gesture taxonomy: a SAGES Delphi consensus study.
Abstract
INTRODUCTION: Artificial intelligence (AI) for surgical workflow analysis often fails to generalize because surgical actions lack a standardized, fine-grained representation. Gesture-level “tokenization” of surgery, capturing instrument-tissue interactions as the smallest intentional functional units, offers greater technical specificity than phase- or step-level labels and has demonstrated associations with proficiency and clinical outcomes. However, the field remains fragmented by heterogeneous gesture terminology, limiting dataset interoperability and model reproducibility.
METHODS: We conducted a SAGES-led, accelerated Delphi consensus process to establish a standardized surgical gesture taxonomy. Starting with 270 literature-derived gesture terms, we employed a novel hybrid pipeline combining large language model (LLM)-assisted semantic clustering with multi-round expert review. The process involved two Delphi surveys (open-ended, then structured agreement) with a predefined ≥ 80% agreement threshold, a pilot interactive video-based validation task where participants labeled 30 surgical clips, and a final in-person consensus meeting with live anonymous polling.
RESULTS: Across iterative refinement, the taxonomy evolved from 106 gestures in 11 clusters to a hierarchical framework of Clusters, Gestures, and Sub-gestures, which, after consolidation and pilot annotation, reached a final consensus taxonomy comprising 10 clusters, 24 gestures, and 46 sub-gestures. The panel rejected dominant-instrument-only labeling, supporting multi-instrument annotation to capture assisting actions critical to surgical quality. Video-based validation demonstrated high agreement for multiple gestures (e.g., coagulate, suction, irrigate, staple, clip, needle drive), while identifying predictable ambiguities among semantically proximate actions (e.g., cut vs seal; grasp vs clamp; dissect vs spread), informing final revisions.
CONCLUSION: This work establishes a standardized, hierarchical taxonomy for surgical gestures, providing a foundational language for surgical data science. This framework is designed to reduce annotation variability, enable reliable cross-study comparisons, and accelerate the development of scalable video-based assessment, computer vision, and autonomous systems. Defining temporal boundaries for these gestures was identified as the next critical step.
Authors: Morais MC, Godbole AA, Iqbal E, Ballo M, Jarc A, Van Amsterdam B, Matthews B, Schlachta CM, Donoho DA, Hashimoto DA, Redan JA, Marwaha J, Gould J, Feldman LS, Meireles O, De Backer P, Mascagni P, LaPlante S, Sarin A, Shchatsko A, Walsh D, Fer DM, Funes DR, Sasaki K, Szoka N, Lazzaretti SS, Ross SB, Schnelldorfer T, Krieger A, Hung AJ, Filicori F
Journal: Surg Endosc
Citation: Morais MC, et al. Standardization of surgical gesture taxonomy: a SAGES Delphi consensus study. Surg Endosc. 2026. doi: 10.1007/s00464-026-12906-2