Quick Summary
This study developed and validated a machine learning model for automated evaluation during simulation-based thyroid ultrasound (US) training. On 50-second video sequences, the model reached 70% accuracy and an F1 score of 0.76, suggesting it could support skill assessment in medical training.
Key Details
- Dataset: Videos from 8 experts and 21 novices performing thyroid US on a simulator
- Features used: Video frames processed into sequences of 1, 10, and 50 seconds
- Technology: Convolutional neural network with a pre-trained ResNet-50 base and a long short-term memory layer
- Performance: 50-second sequences achieved 70% accuracy and an F1 score of 0.76
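To make the sequence lengths concrete, here is a minimal sketch of how a frame stream could be cut into 1-, 10-, and 50-second sequences. The frame rate, the non-overlapping windows, the dropped trailing remainder, and the name `split_into_windows` are all assumptions for illustration; the source does not describe these details.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_windows(frames: Sequence[T], fps: int, window_seconds: int) -> List[List[T]]:
    """Split frames into consecutive, non-overlapping windows of fixed duration.

    A trailing partial window is dropped so that every sequence has the same
    length (an assumption; the study may have handled remainders differently).
    """
    if fps <= 0 or window_seconds <= 0:
        raise ValueError("fps and window_seconds must be positive")
    window_len = fps * window_seconds
    n_windows = len(frames) // window_len
    return [list(frames[i * window_len:(i + 1) * window_len]) for i in range(n_windows)]


# Hypothetical 120-second clip at 2 frames per second; integers stand in for images.
frames = list(range(240))
windows_by_length = {
    seconds: split_into_windows(frames, fps=2, window_seconds=seconds)
    for seconds in (1, 10, 50)
}
```

Longer windows give the temporal model more context per prediction but yield fewer training sequences per video, which is the trade-off behind comparing the three lengths.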
Key Takeaways
- The AI model differentiated between expert and novice performance in ultrasound training.
- 50-second video sequences provided the best performance metrics.
- Experts demonstrated significantly longer durations above the competence threshold compared to novices.
- Bayesian updating and adaptive thresholding were used to assess performance over time.
- Potential applications across various procedural domains beyond ultrasound training.
- Study published in Med Teach, highlighting the importance of AI in medical education.
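The Bayesian updating mentioned in the takeaways can be illustrated with a minimal sketch. It assumes that each sequence yields a binary "competent" prediction and that the classifier's sensitivity and specificity are known; the values used here are hypothetical, and the source does not describe the authors' actual likelihood model or their adaptive thresholding.

```python
from typing import Iterable, List


def bayes_update(prior: float, predicted_competent: bool,
                 sensitivity: float, specificity: float) -> float:
    """Posterior P(competent) after one binary sequence-level prediction."""
    if predicted_competent:
        like_competent = sensitivity            # P(pred = 1 | competent)
        like_not_competent = 1.0 - specificity  # P(pred = 1 | not competent)
    else:
        like_competent = 1.0 - sensitivity      # P(pred = 0 | competent)
        like_not_competent = specificity        # P(pred = 0 | not competent)
    evidence = like_competent * prior + like_not_competent * (1.0 - prior)
    return like_competent * prior / evidence


def running_belief(predictions: Iterable[bool], prior: float = 0.5,
                   sensitivity: float = 0.8, specificity: float = 0.7) -> List[float]:
    """Update the belief after each prediction and return the whole trajectory."""
    beliefs = []
    belief = prior
    for predicted_competent in predictions:
        belief = bayes_update(belief, predicted_competent, sensitivity, specificity)
        beliefs.append(belief)
    return beliefs


# Hypothetical stream of sequence-level predictions for one trainee.
beliefs = running_belief([True, True, False, True])
```

Each posterior becomes the prior for the next sequence, so the belief rises with consistent "competent" predictions and falls after a contradicting one instead of flipping with every single output.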
Background
Assessing skills in simulated settings has traditionally been resource-intensive and often lacks validated metrics. The integration of artificial intelligence into medical training offers a promising solution to automate competence assessments, thereby enhancing the training process and ensuring better preparedness for real-world scenarios.
Study
The study focused on developing a machine learning AI model to evaluate performance during thyroid ultrasound training. By analyzing video data from both experts and novices, the researchers aimed to create a system that could provide automated, near real-time assessments of competence, ultimately improving the training experience for medical professionals.
Results
The AI model achieved 70% accuracy and an F1 score of 0.76 when analyzing 50-second video sequences, the best of the three sequence lengths tested. Experts also stayed above the competence threshold significantly longer than novices (15.71 s versus 9.31 s, p = .030), indicating that the model separates the two skill levels.
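As an illustration of what "duration above the competence threshold" could mean in practice, the following sketch measures it on a series of per-second competence scores. The scores, the threshold of 0.5, the one-second step, and both function names are hypothetical; the source does not specify how the durations were computed.

```python
from typing import Sequence


def seconds_above_threshold(scores: Sequence[float], threshold: float,
                            step_seconds: float = 1.0) -> float:
    """Total time during which the score is strictly above the threshold."""
    return sum(step_seconds for score in scores if score > threshold)


def longest_run_above(scores: Sequence[float], threshold: float,
                      step_seconds: float = 1.0) -> float:
    """Longest uninterrupted stretch of time spent above the threshold."""
    longest = current = 0
    for score in scores:
        current = current + 1 if score > threshold else 0
        longest = max(longest, current)
    return longest * step_seconds


# Hypothetical per-second competence scores for one performance.
scores = [0.2, 0.6, 0.7, 0.4, 0.8, 0.9, 0.95, 0.3]
total_time = seconds_above_threshold(scores, threshold=0.5)
longest_stretch = longest_run_above(scores, threshold=0.5)
```

Comparing such durations between groups is one way to turn a stream of model outputs into a single number per participant.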
Impact and Implications
The findings suggest that AI could take over part of the assessment workload in simulation-based training. Because the model works on temporal video data, educators can obtain detailed micro-assessments of complex procedures. The authors note that this approach may make skill evaluations more interpretable and may transfer to other procedural domains.
Conclusion
This research shows that a long short-term memory-based AI model can provide near real-time, automated assessments of competence in thyroid ultrasound training, a step toward automating skill assessment. As AI is integrated further into healthcare education, approaches like this may improve training methods and, ultimately, patient care.
Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models.
Abstract
BACKGROUND: Assessing skills in simulated settings is resource-intensive and lacks validated metrics. Advances in AI offer the potential for automated competence assessment, addressing these limitations. This study aimed to develop and validate a machine learning AI model for automated evaluation during simulation-based thyroid ultrasound (US) training.
METHODS: Videos from eight experts and 21 novices performing thyroid US on a simulator were analyzed. Frames were processed into sequences of 1, 10, and 50 seconds. A convolutional neural network with a pre-trained ResNet-50 base and a long short-term memory layer analyzed these sequences. The model was trained to distinguish competence levels (competent=1, not competent=0) using fourfold cross-validation, with performance metrics including precision, recall, F1 score, and accuracy. Bayesian updating and adaptive thresholding assessed performance over time.
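The fourfold cross-validation described above could be set up along the following lines. This is a minimal sketch that assumes folds are formed at the participant level and stratified by group, so that no person's video appears in both training and test data. The source does not state how the folds were built, and the participant IDs, the label mapping (expert = 1, novice = 0), and the function names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple


def stratified_participant_folds(labels: Dict[str, int], k: int = 4,
                                 seed: int = 0) -> List[List[str]]:
    """Deal participants into k folds, keeping each label spread evenly."""
    rng = random.Random(seed)
    folds: List[List[str]] = [[] for _ in range(k)]
    for label in sorted(set(labels.values())):
        group = sorted(pid for pid, lab in labels.items() if lab == label)
        rng.shuffle(group)
        for i, pid in enumerate(group):
            folds[i % k].append(pid)  # round-robin assignment within the group
    return folds


def train_test_splits(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) participant lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, test


# Hypothetical IDs for the 8 experts and 21 novices reported in the study.
labels = {f"expert_{i}": 1 for i in range(1, 9)}
labels.update({f"novice_{i}": 0 for i in range(1, 22)})
folds = stratified_participant_folds(labels)
```

Splitting by participant rather than by sequence avoids leakage, since sequences cut from the same video are highly correlated.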
RESULTS: The AI model effectively differentiated expert and novice US performance. The 50-second sequences achieved the highest accuracy (70%) and F1 score (0.76). Experts showed significantly longer durations above the threshold (15.71 s) compared to novices (9.31 s, p = .030).
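For readers less familiar with the reported metrics, the sketch below computes precision, recall, F1 score, and accuracy from binary labels using their standard definitions. The labels and predictions are hypothetical and are not the study's data; the function name is likewise illustrative.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy for binary labels (1 = competent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical sequence-level labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

F1 and accuracy can diverge when the classes are imbalanced, which is why an F1 score of 0.76 can accompany 70% accuracy.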
CONCLUSIONS: A long short-term memory-based AI model provides near real-time, automated assessments of competence in US training. Utilizing temporal video data enables detailed micro-assessments of complex procedures, which may enhance interpretability and be applied across various procedural domains.
Authors: Bang Andersen I, Søndergaard Svendsen MB, Risgaard AL, Sander Danstrup C, Todsen T, Tolsgaard MG, Friis ML
Journal: Med Teach
Citation: Bang Andersen I, et al. Enabling micro-assessments of skills in the simulated setting using temporal artificial intelligence-models. Med Teach. 2025:1-10. doi: 10.1080/0142159X.2025.2555353