Research - March 27, 2026

Explainable machine learning for long-term cardiovascular disease risk prediction in Chinese middle-aged and older adults: a 9-year longitudinal cohort study with web-based risk calculator.

Quick Summary

A recent study developed an explainable machine learning tool for predicting 9-year cardiovascular disease risk in Chinese adults aged 45 and older. Using data from 8,080 participants, the random forest model achieved a validation-set AUC of 0.829, and SHAP analysis identified waist circumference as the leading contributor to predicted risk.

๐Ÿ” Key Details

  • ๐Ÿ“Š Dataset: 8,080 participants aged โ‰ฅ 45 years from the CHARLS study (2011-2020)
  • ๐Ÿงฉ Features used: 77 candidate variables, including hypertension, dyslipidaemia, and waist circumference
  • โš™๏ธ Technology: Ten machine learning algorithms, with a focus on random forest
  • ๐Ÿ† Performance: Random forest model: AUC 0.829, accuracy 0.770, sensitivity 0.681, specificity 0.795

Key Takeaways

  • Machine learning offers a powerful approach to cardiovascular risk prediction tailored for the Chinese population.
  • Waist circumference emerged as the largest contributor to predicted cardiovascular disease risk in the SHAP analysis.
  • SHAP analysis provided transparent insights into feature contributions, enhancing model interpretability.
  • Psychobehavioural factors (depression and sleep duration) showed independent predictive value.
  • A web-based risk calculator was developed to provide real-time individual 9-year cardiovascular disease probability estimates.
  • The study highlights the need for localized risk prediction models to improve healthcare outcomes in specific populations.
  • The 9-year follow-up supports long-term rather than short-term risk assessment.
  • This research addresses a gap in cardiovascular risk prediction for the Chinese middle-aged and elderly demographic.

Background

Cardiovascular disease is the leading cause of mortality in China, accounting for over 40% of all deaths. Traditional risk prediction models have primarily been developed using data from Western populations, which may not accurately reflect the unique health profiles and risk factors present in the Chinese middle-aged and elderly demographic. This study aims to bridge that gap by utilizing machine learning techniques to create a more tailored and interpretable risk prediction tool.

๐Ÿ—’๏ธ Study

The study utilized data from the China Health and Retirement Longitudinal Study (CHARLS), enrolling 8,080 participants aged 45 years and older without baseline cardiovascular disease. The researchers aimed to compare the predictive performance of ten machine learning algorithms and identify the optimal model for long-term cardiovascular risk prediction. The study employed SHAP methodology to ensure transparent interpretation of the model’s predictions.
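
SHAP attributes each prediction to individual features using Shapley values, which average a feature's marginal contribution over all feature coalitions. As a minimal sketch of that idea, the code below computes exact Shapley values for a toy two-feature risk function by enumerating coalitions. The toy model, its coefficients, the feature names, and the baseline values are all hypothetical and unrelated to the study's fitted random forest. Real SHAP tooling uses faster, model-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def toy_risk(x: dict) -> float:
    """Hypothetical risk score with one interaction term (not the study's model)."""
    score = 0.02 * (x["waist_cm"] - 80) + 0.3 * x["hypertension"]
    score += 0.01 * (x["waist_cm"] - 80) * x["hypertension"]
    return score


def shapley_values(model, instance: dict, baseline: dict) -> dict:
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


instance = {"waist_cm": 100, "hypertension": 1}  # hypothetical individual
baseline = {"waist_cm": 80, "hypertension": 0}   # hypothetical reference values
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
# Additivity: contributions sum to the prediction minus the baseline prediction.
print(sum(phi.values()), toy_risk(instance) - toy_risk(baseline))
```

The additivity check in the last line is what makes per-feature contributions readable as a decomposition of a single individual's predicted risk.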

Results

Among the ten algorithms evaluated, the random forest model demonstrated superior performance with a validation set AUC of 0.829 (95% CI 0.809-0.848). The model exhibited excellent calibration and provided maximal net clinical benefit across risk thresholds from 10% to 85%. In the training cohort, 1,246 participants (22.0%) experienced incident cardiovascular disease. Multivariable analysis identified hypertension (adjusted OR 1.80), liver disease (adjusted OR 1.60), dyslipidaemia (adjusted OR 1.42), and waist circumference (adjusted OR 1.05 per 1-cm increment) as principal independent predictors.
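
Net clinical benefit, as used in decision curve analysis, weighs true positives against false positives at a chosen risk threshold p_t: net benefit = TP/n − (FP/n) × p_t / (1 − p_t). The sketch below applies that standard formula to a handful of made-up outcomes and predicted risks (not study data) and compares the model with a treat-everyone strategy. The function names are illustrative, and thresholds must lie strictly between 0 and 1.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every individual is classified as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Illustrative outcomes and predicted risks (invented for this example).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.80, 0.10, 0.35, 0.60, 0.05, 0.20, 0.45, 0.30, 0.15, 0.25]

for pt in (0.10, 0.30, 0.50):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is considered clinically useful at a threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).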

๐ŸŒ Impact and Implications

This study’s findings have significant implications for public health in China. By developing a machine learning-based risk prediction tool tailored to the local population, healthcare providers can implement more effective community-based screening and individualized prevention strategies. The web-based risk calculator offers a practical solution for resource-constrained settings, enabling real-time assessments of cardiovascular disease risk and promoting proactive health management.

Conclusion

The research highlights the potential of machine learning in cardiovascular risk prediction, particularly for the Chinese middle-aged and elderly population. By focusing on key predictors such as waist circumference and incorporating psychobehavioural factors, this study paves the way for more personalized and effective healthcare interventions. Translated into an online assessment tool, the model offers a pragmatic approach to risk stratification in resource-constrained settings.

Abstract

Cardiovascular disease represents the leading cause of mortality in China, accounting for over 40% of all deaths. Existing risk prediction models predominantly derive from Western populations, rendering them suboptimally calibrated for the Chinese middle-aged and elderly demographic. Conventional statistical approaches inadequately capture non-linear associations within high-dimensional data, whilst machine learning models, despite superior performance, suffer from insufficient interpretability. This study leveraged a nationally representative cohort to develop an interpretable machine learning-based tool for long-term cardiovascular risk prediction tailored to the Chinese population.

The objectives were to compare the predictive performance of ten machine learning algorithms using data from the China Health and Retirement Longitudinal Study (CHARLS), identify the optimal model, achieve transparent interpretation through SHapley Additive exPlanations (SHAP) methodology, and develop an individualized cardiovascular risk assessment tool for Chinese residents aged 45 years and above.

The study enrolled 8,080 participants aged ≥ 45 years without baseline cardiovascular disease from the CHARLS 2011-2020 longitudinal dataset, with 9-year follow-up. The primary outcome was incident cardiovascular disease. From 77 candidate variables, logistic regression analysis identified 11 predictors: geographical region, hypertension, dyslipidaemia, liver disease, asthma, depression score, age, sleep duration, triglycerides, high-density lipoprotein cholesterol, and waist circumference. The cohort was randomly partitioned into training (n = 5,657, 70%) and validation (n = 2,423, 30%) sets. Ten predictive models were constructed, including random forest, gradient boosting machine, and extreme gradient boosting. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), calibration plots, and decision curve analysis. Feature contributions were elucidated using SHAP values.

Incident cardiovascular disease occurred in 1,246 participants (22.0%) within the training cohort. Multivariable analysis identified hypertension (adjusted OR 1.80), waist circumference (adjusted OR 1.05 per 1-cm increment), dyslipidaemia (adjusted OR 1.42), and liver disease (adjusted OR 1.60) as principal independent predictors. Among ten algorithms evaluated, random forest demonstrated superior performance: validation set AUC 0.829 (95% CI 0.809-0.848), accuracy 0.770, sensitivity 0.681, specificity 0.795. The model exhibited excellent calibration and yielded maximal net clinical benefit across the 10%-85% risk threshold spectrum. SHAP analysis revealed waist circumference as the predominant contributor, followed by triglycerides, age, and hypertension. Psychobehavioural factors (depression, sleep duration) demonstrated independent predictive value. A web-based risk calculator was developed, providing real-time individual 9-year cardiovascular disease probability estimates.

The random forest model accurately predicts cardiovascular disease risk in the Chinese middle-aged and elderly population, with waist circumference emerging as the most critical predictor. Translated into an online assessment tool, this model facilitates community-based screening and individualized prevention, offering a pragmatic risk stratification approach for resource-constrained settings.
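
The adjusted odds ratio for waist circumference is reported per 1-cm increment, which makes it look small next to the binary predictors. Under the log-linear form that logistic regression assumes, per-unit odds ratios compound multiplicatively across larger differences. The sketch below works through that arithmetic. The 10-cm difference is an illustrative choice, not a figure from the study, and the function name is hypothetical.

```python
import math

# Adjusted odds ratio per 1-cm increase in waist circumference (from the abstract).
OR_PER_CM = 1.05


def odds_ratio_for_difference(or_per_unit: float, units: float) -> float:
    """Scale a per-unit odds ratio to a larger difference: OR ** units."""
    return math.exp(units * math.log(or_per_unit))


or_10cm = odds_ratio_for_difference(OR_PER_CM, 10)
print(f"Odds ratio for a 10-cm larger waist: {or_10cm:.2f}")
```

A 10-cm difference corresponds to an odds ratio of about 1.63, similar in size to the reported value for liver disease (1.60). This scaling applies to odds rather than probabilities, and it holds only insofar as the linear-in-log-odds assumption is reasonable across that range.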

Authors: Zhu XY, Li W, Pan XY, Li T, Yuan GL

Journal: Sci Rep

Citation: Zhu XY, et al. Explainable machine learning for long-term cardiovascular disease risk prediction in Chinese middle-aged and older adults: a 9-year longitudinal cohort study with web-based risk calculator. Sci Rep. 2026. doi: 10.1038/s41598-026-45297-4
