Quick Summary
This study introduces a clinician-driven automated data preprocessing approach for nuclear medicine AI environments, in which clinician input shapes how data are prepared for machine learning. Adding a rule set table (RST) to manual preprocessing increased balanced accuracy by up to 18% in oncology-specific models.
Key Details
- Cohorts analyzed: Prostate, glioma, and diffuse large B-cell lymphoma (DLBCL)
- Technology used: XGBoost algorithm for classification tasks
- Evaluation method: 100-fold Monte Carlo cross-validation with an 80-20% training-testing split (prostate and glioma, single center); Center 1 for training and Center 2 for testing (DLBCL)
- Performance metrics: Balanced accuracy (BACC) increase of up to 18%
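To make the single-center evaluation scheme concrete, here is a minimal sketch of 100-fold Monte Carlo cross-validation with an 80-20 split, using only the Python standard library. The function name `monte_carlo_splits`, the seed, and the sample count of 50 are illustrative assumptions, not details from the study. The abstract does not say whether the splits were stratified, so this sketch uses plain random shuffling.

```python
import random


def monte_carlo_splits(n_samples, n_folds=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Unlike k-fold cross-validation, every fold is an independent random
    draw, so a sample may land in the test set of several folds.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First n_test shuffled indices form the test set, the rest train.
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Hypothetical cohort of 50 patients.
splits = list(monte_carlo_splits(n_samples=50))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

In a setup like the one summarized here, each fold would train a classifier on the training indices and score it on the test indices, with the metric averaged over all folds.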
Key Takeaways
- Clinician involvement in data preprocessing is crucial for improving AI model outcomes.
- Rule set table (RST) allows clinicians to input explicit and non-explicit rules for data preprocessing.
- Models with “exp-keep” and “pref-keep” instructions showed the highest performance increases across datasets.
- Performance improvements were noted in glioma (+18% BACC), prostate (+6% BACC), and DLBCL (+3% BACC).
- Manual vs. automated preprocessing setups were compared, highlighting the effectiveness of clinician-driven approaches.
- This study serves as a proof of concept for more inclusive data preprocessing processes in future research.
Background
AI is playing a growing role in clinical science, but AI models depend on extensive data preprocessing (DP) steps, which are traditionally designed by data scientists using purely mathematical rules. This study argues for incorporating clinical domain knowledge into the DP process, so that clinicians help shape the algorithms that drive AI decision-making.
Study
The research proposed a novel approach to data preprocessing, utilizing a rule set table (RST) as an interface for clinicians to input their expertise in a human-readable format. This interface translates clinician input into machine-readable commands for preprocessing algorithms, facilitating a more collaborative environment between data scientists and clinicians. The study evaluated the impact of RST on various clinical cohorts, employing both manual and automated preprocessing setups.
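As a rough illustration of how a rule table could steer a preprocessing step, the sketch below applies the four RST actions to a feature selection step. Everything beyond the four action names is an assumption made for this example: the abstract does not define the semantics of explicit versus preference rules, so here explicit rules override the algorithm while preference rules only nudge its scores. The `Rule` class, `select_features`, the `bonus` parameter, and the feature names are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass

ACTIONS = ("exp-keep", "exp-remove", "pref-keep", "pref-remove")


@dataclass(frozen=True)
class Rule:
    target: str  # feature name (hypothetical)
    action: str  # one of ACTIONS


def select_features(scores, rules, k, bonus=0.1):
    """Pick up to k features from algorithm scores plus clinician rules.

    Assumed semantics (not taken from the paper):
    - exp-keep / exp-remove: hard constraints, always kept / always dropped.
    - pref-keep / pref-remove: soft hints that raise / lower a score.
    """
    adjusted = dict(scores)
    forced_in, forced_out = set(), set()
    for rule in rules:
        if rule.action not in ACTIONS:
            raise ValueError(f"unknown action: {rule.action}")
        if rule.target not in scores:
            raise KeyError(f"unknown feature: {rule.target}")
        if rule.action == "exp-keep":
            forced_in.add(rule.target)
        elif rule.action == "exp-remove":
            forced_out.add(rule.target)
        elif rule.action == "pref-keep":
            adjusted[rule.target] += bonus
        else:  # pref-remove
            adjusted[rule.target] -= bonus
    if forced_in & forced_out:
        raise ValueError("conflicting explicit rules")
    fixed = forced_in | forced_out
    # Features without an explicit rule compete on their adjusted score.
    free = sorted((f for f in scores if f not in fixed),
                  key=lambda f: adjusted[f], reverse=True)
    n_free = max(0, k - len(forced_in))
    return sorted(forced_in | set(free[:n_free]))


# Hypothetical relevance scores produced by some selection algorithm.
scores = {"suv_max": 0.30, "age": 0.05, "volume": 0.25,
          "scanner_id": 0.40, "psa": 0.22}
rules = [Rule("scanner_id", "exp-remove"),
         Rule("age", "exp-keep"),
         Rule("psa", "pref-keep")]
selected = select_features(scores, rules, k=3)
print(selected)
```

In this toy run, the highest-scoring feature is dropped because of an explicit rule, a low-scoring feature is retained for the same reason, and the preference rule lifts one borderline feature above its competitor.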
Results
Models built with clinician-driven preprocessing outperformed those without it. With manual preprocessing, adding the RST raised balanced accuracy by up to 18% compared with models without RST. The glioma cohort showed the largest gain (+18% BACC), followed by the prostate (+6% BACC) and DLBCL (+3% BACC) cohorts, indicating that the approach helps across different cancer types.
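For readers less familiar with the metric, balanced accuracy is the mean of the per-class recalls, which makes it more informative than plain accuracy when classes are imbalanced. The sketch below computes it from label lists with the Python standard library. The labels are invented for illustration and do not reproduce any data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred)
                   if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Invented example: 2 positive and 8 negative cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A classifier that always predicts the majority class.
y_majority = [0] * 10

print(accuracy(y_true, y_majority))           # 8 of 10 correct
print(balanced_accuracy(y_true, y_majority))  # recalls are 1.0 and 0.0
```

The always-negative classifier looks good on plain accuracy but scores no better than chance on balanced accuracy, which is why the metric suits clinical cohorts with uneven outcome groups.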
Impact and Implications
The results suggest that a more inclusive, clinician-driven data preprocessing process can enhance the predictive performance of AI models in oncology. Bridging the gap between clinical expertise and data science could lead to more accurate and reliable AI applications in nuclear medicine and beyond. The approach may also foster collaboration between clinicians and data scientists, which could ultimately benefit patient care.
Conclusion
This study underscores the importance of integrating clinician insights into the data preprocessing phase of AI model development. The demonstrated enhancements in predictive performance highlight the potential for a more collaborative approach in future research. As we continue to explore the intersection of AI and clinical practice, embracing clinician-driven methodologies could lead to significant advancements in patient outcomes and healthcare delivery.
Clinician-driven automated data preprocessing in nuclear medicine AI environments.
Abstract
BACKGROUND: Artificial Intelligence (AI) approaches in clinical science require extensive data preprocessing (DP) steps prior to building AI models. Establishing DP pipelines is a non-trivial task, mainly driven by purely mathematical rules and done by data scientists. Nevertheless, clinician presence shall be paramount at this step. The study proposes a data preprocessing approach driven by clinical domain knowledge, where clinician input, in form of explicit and non-explicit rules, directly impacts the algorithms’ decision-making processes, thus, making the DP planning phase more inclusive for clinicians.
METHODS: The rule set table (RST) was introduced as interface which accepts clinician’s input as formal rules (including four actions: exp-keep, exp-remove, pref-keep, pref-remove features or samples) in human-readable form and translates it to machine readable input for preprocessing algorithms. A collection of commonly used algorithms was incorporated for data preprocessing of various clinical cohorts in both single and multi-center scenarios. The impact of RST was evaluated by utilizing 100-fold Monte Carlo cross-validation scheme for prostate and glioma cohorts (single center) with 80-20% training-testing split. Furthermore, diffuse large B-cell lymphoma (DLBCL) cohort was evaluated by using Center 1 as training and Center 2 as testing cohort for clinical endpoint prediction. Both scenarios were investigated in manual and automated data preprocessing setups across all cohorts. The XGBoost algorithm was employed for classification tasks across all established models. Predictive performance was estimated by confusion matrix analysis in validation samples of all cohorts. The performance of RST across all actions as well as without RST were compared in both manual and automated settings for each respective cohort.
RESULTS: Performance increase of ML models with manual preprocessing combined with RST was up-to 18% balanced accuracy (BACC) compared to models without RST. The ML models with “exp-keep” and “pref-keep” instructions showed highest performance increase of +18% BACC (glioma), +6% BACC (prostate) and +3% BACC (DLBCL) compared to other models across all datasets.
CONCLUSION: The study demonstrated the added value of RST in predictive performance of oncology-specific ML models, hence, serving as proof of concept of a more inclusive clinician-driven DP process in future studies.
Authors: Krajnc D, Spielvogel CP, Ecsedi B, Ritter Z, Alizadeh H, Hacker M, Papp L
Journal: Eur J Nucl Med Mol Imaging
Citation: Krajnc D, et al. Clinician-driven automated data preprocessing in nuclear medicine AI environments. Eur J Nucl Med Mol Imaging. 2025; (unknown volume):(unknown pages). doi: 10.1007/s00259-025-07183-5