Quick Summary
This study introduces a novel machine learning (ML) stacking technique that combines Single Nucleotide Polymorphisms (SNPs) and inferred local ancestry (LA) to enhance the prediction of severe asthma risk. The integration of these data types significantly improved predictive performance, achieving an AUC of 0.729.
Key Details
- Dataset: 248 self-reported African American pediatric patients from the Biorepository and Integrative Genomics (BIG) Initiative
- Features used: SNPs and inferred local ancestry (LA) data
- Technology: Machine learning stacking technique with logistic regression and random forest models
- Performance: Stacked SNP pipeline AUC: 0.693, Stacked LA pipeline AUC: 0.625, Combined AUC: 0.729
Key Takeaways
- Novel ML approach integrates genotype and ancestry data for better asthma risk prediction.
- Adding LA to SNP data raised the AUC from 0.693 to 0.729, a statistically significant gain (paired t-test p-value = 0.005).
- Distinct features from SNP and LA data provide complementary insights into asthma risk.
- Potential for personalized medicine through the effective use of multifactorial data.
- Study highlights the importance of ancestry in understanding genetic predispositions.
- SNP and LA data capture different sources of variation in asthma response.
- Future research could expand this approach to other complex conditions.

Background
Asthma is a prevalent respiratory condition that can significantly impact quality of life, particularly in its severe form. Traditional methods of predicting asthma risk often fall short, especially in diverse populations. The integration of genetic data, such as SNPs, with ancestry information offers a promising avenue for enhancing predictive models, paving the way for more tailored treatment strategies.
Study
This study utilized data from the BIG Initiative, focusing on a cohort of 248 self-reported African American pediatric patients. The researchers developed a machine learning framework that employed a stacking technique to combine SNP and LA data, aiming to improve the prediction of response to inhaled corticosteroids (ICS) in severe asthma cases.
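To make the stacking idea concrete, here is a minimal sketch in Python. It is not the authors' pipeline: it assumes scikit-learn's StackingClassifier as a stand-in for their stacking technique, univariate feature selection (SelectKBest with k=10) as a stand-in for their feature selection, and fully synthetic SNP and LA matrices. Only the cohort size (N=248), the two model families, and the stratified 10-fold evaluation come from the paper; every other name and parameter, including the base_pipeline helper, is hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_snp, n_la = 248, 60, 30  # only N=248 comes from the study

# Synthetic stand-ins (assumption): both blocks coded as 0/1/2 counts.
y = rng.integers(0, 2, size=n_samples)
X_snp = rng.integers(0, 3, size=(n_samples, n_snp)).astype(float)
X_la = rng.integers(0, 3, size=(n_samples, n_la)).astype(float)
X_snp[:, 0] += y  # plant a signal so the toy example has something to learn
X_la[:, 0] += y
X = np.hstack([X_snp, X_la])

snp_cols = list(range(n_snp))
la_cols = list(range(n_snp, n_snp + n_la))


def base_pipeline(cols, model, k=10):
    """Hypothetical helper: keep one data block, select k features, fit a model."""
    select_block = ColumnTransformer([("block", "passthrough", cols)])
    return make_pipeline(select_block, SelectKBest(f_classif, k=k), model)


def new_lr():
    return LogisticRegression(max_iter=1000)


def new_rf():
    return RandomForestClassifier(n_estimators=50, random_state=0)


# Base pipelines for each data type feed a logistic regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("snp_lr", base_pipeline(snp_cols, new_lr())),
        ("snp_rf", base_pipeline(snp_cols, new_rf())),
        ("la_lr", base_pipeline(la_cols, new_lr())),
        ("la_rf", base_pipeline(la_cols, new_rf())),
    ],
    final_estimator=new_lr(),
    stack_method="predict_proba",
    cv=5,  # inner folds that produce out-of-fold inputs for the meta-learner
)

# Stratified 10-fold cross-validation, scored by AUC.
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_auc = cross_val_score(stack, X, y, cv=outer_cv, scoring="roc_auc")
print(f"AUC: {fold_auc.mean():.3f} +/- {fold_auc.std(ddof=1):.3f}")
```

Because feature selection sits inside each base pipeline, it is refit on every training fold, which keeps the held-out fold from leaking into the choice of features. Dropping the two LA entries from the estimator list gives an SNP-only stack to compare against.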
Results
The stacked SNP pipeline achieved an AUC of 0.693 and the stacked LA pipeline an AUC of 0.625. Integrating the two data types raised the AUC to 0.729, a statistically significant improvement (paired t-test p-value = 0.005). This indicates that SNP and LA data carry complementary signal for predicting ICS response in severe asthma.
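For readers unfamiliar with the paired t-test reported here, the sketch below shows how such a comparison can be run on per-fold AUCs from two pipelines evaluated on the same cross-validation splits. That the study paired its scores fold by fold is an assumption, the compare_fold_aucs helper is hypothetical, and the fold scores are random placeholders rather than values from the paper.

```python
import numpy as np
from scipy.stats import ttest_rel


def compare_fold_aucs(auc_a, auc_b):
    """Hypothetical helper: paired t-test of pipeline B against pipeline A.

    Both inputs must hold one AUC per fold, computed on identical folds.
    """
    auc_a = np.asarray(auc_a, dtype=float)
    auc_b = np.asarray(auc_b, dtype=float)
    if auc_a.shape != auc_b.shape:
        raise ValueError("both pipelines need one AUC per shared fold")
    result = ttest_rel(auc_b, auc_a)  # tests whether the mean of (B - A) is zero
    return {
        "mean_a": float(auc_a.mean()),
        "std_a": float(auc_a.std(ddof=1)),
        "mean_b": float(auc_b.mean()),
        "std_b": float(auc_b.std(ddof=1)),
        "t_stat": float(result.statistic),
        "p_value": float(result.pvalue),
    }


# Placeholder fold scores (NOT the study's data), generated for illustration.
rng = np.random.default_rng(0)
snp_only = rng.uniform(0.55, 0.80, size=10)
combined = snp_only + rng.normal(0.03, 0.02, size=10)

summary = compare_fold_aucs(snp_only, combined)
print(f"SNP only: {summary['mean_a']:.3f} +/- {summary['std_a']:.3f}")
print(f"Combined: {summary['mean_b']:.3f} +/- {summary['std_b']:.3f}")
print(f"t = {summary['t_stat']:.2f}, p = {summary['p_value']:.4f}")
```

The test is paired because both pipelines are scored on the same folds, so fold-to-fold difficulty cancels out and only the per-fold difference is tested.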
Impact and Implications
The findings point to a practical route toward personalized medicine. By combining multifactorial data, models like this one may help healthcare providers anticipate medication response in complex conditions such as severe asthma, supporting the broader goal of tailoring treatment to individual genetic and ancestral backgrounds.
Conclusion
This study showcases the potential of integrating machine learning with genetic and ancestry data to improve asthma risk prediction. The significant improvement in predictive accuracy highlights the importance of considering diverse data sources in clinical settings. As we move forward, further research in this area could lead to groundbreaking advancements in personalized medicine, ultimately benefiting patients with severe asthma and other complex conditions.
Machine learning models incorporating genotype and ancestry improve severe asthma risk prediction.
Abstract
This study proposes a novel machine learning (ML)-based stacking technique that integrates Single Nucleotide Polymorphisms (SNPs) and inferred local ancestry (LA) to improve predictive accuracy in clinical outcomes. Asthma, particularly severe asthma (SA) with poor response to inhaled corticosteroids (ICS), serves as the case study to illustrate this approach. Using data from the Biorepository and Integrative Genomics (BIG) Initiative, which includes whole-exome sequenced data from a self-reported African American pediatric cohort (N=248), we develop an ML framework to predict ICS response. After SNP data preprocessing and LA estimation, we employ stratified 10-fold cross-validation, creating base pipelines for SNP and LA data, which are then combined in stacked pipelines to assess the effectiveness of integrating these distinct data types. The stacked SNP pipeline yields an AUC of 0.693 ± 0.066 and the stacked LA pipeline yields an AUC of 0.625 ± 0.103. The integration of LA with SNP data significantly improves predictive performance, boosting the AUC to 0.729 ± 0.048 (paired t-test p-value = 0.005). Pipelines using LA data alone show comparable performance to those using SNP data alone. However, the most important contributing features are distinct between LA and SNP data, demonstrating that these data types capture distinct sources of variation and could provide complementary insights. This study highlights the potential of stacking ML pipelines, based on feature selection techniques along with logistic regression and random forest predictive models, to integrate SNP and LA data. Such a holistic approach has the promise to improve predictive performance of medication response in complex conditions like SA. This approach has broader implications for advancing personalized medicine through the effective use of multifactorial data.
Authors: Tahmin N, Chinthala LK, Marsico FL, Buonaiuto S, Mohammed A, Carlisle A, Gautam Y, Colonna V, Mersha TB, Davis RL, Khojandi A
Journal: Sci Rep
Citation: Tahmin N, et al. Machine learning models incorporating genotype and ancestry improve severe asthma risk prediction. Sci Rep. 2025; 15:40243. doi: 10.1038/s41598-025-24080-x