Quick Summary
This systematic review evaluated multimodal AI models for diagnosing Alzheimer disease (AD), analyzing 66 studies published over a 5-year period. Multimodal models consistently outperformed single-modal approaches: diagnosis models based on the Alzheimer’s Disease Neuroimaging Initiative dataset averaged 92.5% accuracy, and mild cognitive impairment conversion models averaged an area under the curve (AUC) of 0.922.
Key Details
- Dataset: 66 studies on multimodal AI for AD
- Features used: Clinical, imaging, genetic, and linguistic data
- Technology: Multimodal machine learning and deep learning
- Performance: Average accuracy of 92.5% for Alzheimer’s Disease Neuroimaging Initiative-based AD diagnosis; average AUC of 0.922 for mild cognitive impairment conversion
Key Takeaways
- Multimodal AI models consistently outperformed single-modal baselines in AD diagnosis.
- Alzheimer’s Disease Neuroimaging Initiative-based diagnosis models reached an average accuracy of 92.5%.
- Mild cognitive impairment conversion models achieved an average AUC of 0.922.
- Several fusion architectures reported AUCs above 0.95.
- UK Biobank risk-prediction studies, which use large population-based data, showed an average AUC of 0.84.
- DementiaBank speech-language studies achieved an average AUC of 0.813.
- Self-collected datasets demonstrated high accuracy (around 96% on average) but limited generalizability.
- The review highlights the need for standardized benchmarks and transparent evaluation protocols.

Background
Early detection of Alzheimer disease (AD) is crucial for effective intervention, yet diagnostic performance varies significantly across different modalities and datasets. Recent advancements in multimodal artificial intelligence (AI) have shown promise, but the evidence remains fragmented due to diverse datasets and modeling approaches. This review aims to consolidate findings and provide a clearer understanding of the current landscape in AD diagnosis.
Study
Following the PRISMA 2020 guidelines, this systematic review analyzed studies published over the last five years that applied multimodal machine learning or deep learning techniques to AD, mild cognitive impairment, and dementia outcomes. The review included studies that used multiple data modalities and excluded those that relied on a single modality or lacked sufficient methodological detail.
Results
The review identified a total of 66 studies that met the inclusion criteria. Multimodal models consistently outperformed single-modal baselines. For instance, diagnosis models based on the Alzheimer’s Disease Neuroimaging Initiative dataset achieved an average accuracy of 92.5% (SD 3.8%), while models predicting mild cognitive impairment conversion reached an average AUC of 0.922 (SD 0.045). In contrast, risk-prediction studies using the UK Biobank reported an average AUC of 0.84 (SD 0.056).
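The "average (SD)" figures above summarize one metric across the studies in a dataset family. The following is a minimal sketch of that kind of summary, assuming each study contributes a single reported value. The function name `summarize_metric` and the input values are hypothetical placeholders, not data from the review, and the review may have used a different SD convention.

```python
from statistics import mean, stdev


def summarize_metric(values):
    """Return (mean, sample SD) of one metric reported by several studies."""
    if len(values) < 2:
        raise ValueError("need at least two studies to compute an SD")
    return mean(values), stdev(values)


# Placeholder accuracies in percent, NOT taken from the review.
placeholder_accuracies = [90.0, 92.0, 94.0]
avg, sd = summarize_metric(placeholder_accuracies)
print(f"{avg:.1f}% (SD {sd:.1f}%)")
```

An unweighted summary like this treats a small single-center study the same as a large cohort, which is one reason averages from different dataset families are hard to compare directly.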
Impact and Implications
The findings from this systematic review underscore the potential of multimodal AI models to enhance the accuracy of AD diagnosis and risk prediction. By integrating diverse biological, clinical, and behavioral data, these models can provide a more comprehensive understanding of the disease. However, the review also highlights the need for standardized benchmarks and transparent evaluation protocols to ensure reliable real-world applications of these technologies.
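To make "integrating diverse data" concrete, here is a minimal sketch of late fusion, one simple way to combine modalities: each modality has its own model, and their predicted probabilities are averaged. This is an illustration under stated assumptions, not the method of any specific study in the review. The function `late_fusion`, the modality names, and the probabilities are all hypothetical.

```python
def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality probabilities for one subject.

    modality_probs: dict mapping a modality name to that modality's
        predicted probability of AD (each produced by its own model).
    weights: optional dict of non-negative weights keyed by modality name;
        equal weights are used if omitted.
    """
    if not modality_probs:
        raise ValueError("at least one modality is required")
    if weights is None:
        weights = {name: 1.0 for name in modality_probs}
    total = sum(weights[name] for name in modality_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * p for name, p in modality_probs.items())
    return weighted_sum / total


# Hypothetical per-modality outputs for a single subject (illustrative only).
subject = {"imaging": 0.80, "clinical": 0.60, "speech": 0.40}
fused = late_fusion(subject)
weighted = late_fusion(subject, {"imaging": 2.0, "clinical": 1.0, "speech": 1.0})
```

Other designs fuse raw features or learned representations before classification instead of averaging outputs. The averaging form is shown here only because it is the shortest to write down.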
Conclusion
This systematic review illustrates the significant advancements in multimodal AI for Alzheimer disease diagnosis. By framing these models as not just performance-driven tools but also as frameworks for equitable and interpretable diagnosis, the study paves the way for future research and clinical applications. The call for standardized practices is essential to maximize the impact of these technologies in real-world settings.
Multimodal AI for Alzheimer Disease Diagnosis: Systematic Review of Datasets, Models, and Modalities.
Abstract
BACKGROUND: Early detection of Alzheimer disease (AD) is essential for timely intervention; yet, diagnostic performance varies widely across modalities and datasets. Recent multimodal artificial intelligence (AI) models have made significant progress, but the evidence base remains fragmented due to heterogeneous datasets, modeling frameworks, and reporting quality.
OBJECTIVE: This systematic review aimed to analyze studies on multimodal AI models for AD diagnosis, prognosis, and risk prediction over 5 years. We evaluated dataset characteristics, modality combinations, modeling strategies, performance metrics, and methodological limitations. We further discuss real-world implications and translational pathways.
METHODS: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, we systematically searched PubMed, IEEE Xplore, Scopus, ACM Digital Library, Cochrane, and arXiv, with all databases last searched on November 15, 2025. Studies applying multimodal machine learning or deep learning to AD, mild cognitive impairment, and dementia outcomes were included, whereas studies using a single modality or lacking sufficient methodological detail were excluded. Risk of bias was assessed with QUADAS-2 (Revised Quality Assessment of Diagnostic Accuracy Studies tool). Extracted performance results were synthesized across 4 major multimodal dataset families.
RESULTS: A total of 66 studies met the inclusion criteria. Across datasets, multimodal models consistently outperformed single-modal baselines. Alzheimer’s Disease Neuroimaging Initiative-based diagnosis achieved an average accuracy of 92.5% (SD 3.8%), while mild cognitive impairment-conversion models achieved an average area under the curve (AUC) of 0.922 (SD 0.045), and several fusion architectures reported AUCs above 0.95. In contrast, UK Biobank risk-prediction studies reported an average AUC of 0.84 (SD 0.056), and this reflects performance in large, population-based datasets. DementiaBank speech-language studies achieved an average AUC of 0.813 (SD 0.042), and cross-lingual AD detection achieved an accuracy of 77% (SD 6.5%). Self-collected multimodal datasets demonstrated average accuracies around 96% (SD 2.4%), but their generalizability is limited due to small sample sizes and single-center designs.
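For readers less familiar with the AUC values reported above, the metric can be read as the probability that a randomly chosen positive case (for example, a patient who converts) receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the review.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    labels: 1 for positive (e.g., converter), 0 for negative.
    scores: model scores, higher meaning more likely positive.
    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only, not data from the review.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the function returns 8/9. The pairwise loop is quadratic in the number of cases and is meant only to show the definition.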
CONCLUSIONS: This systematic review demonstrates that multimodal AI models consistently outperform single-modal models for AD diagnosis, prognosis, and risk prediction by integrating complementary biological, clinical, and behavioral information. Unlike prior reviews, this review provides a unified synthesis across heterogeneous clinical, imaging, genetic, and linguistic datasets, enabling cross-domain comparison of modeling strategies and performance. However, the generalizability of reported performance was limited due to substantial heterogeneity in dataset composition, outcome definitions, and validation, and prevalent risks of bias. By evaluating these factors, this review clarifies where current evidence is robust and where caution is warranted. The findings highlight the need for standardized multimodal benchmarks, transparent evaluation protocols, and clinically grounded model design to enable reliable real-world deployment. Overall, this work advances the field by framing multimodal AI not only as a performance-driven tool but also as a translational framework for equitable, interpretable, and scalable AD diagnosis.
Authors: Yu Z, Mulholland A, Huang T, Liu Q
Journal: J Med Internet Res
Citation: Yu Z, et al. Multimodal AI for Alzheimer Disease Diagnosis: Systematic Review of Datasets, Models, and Modalities. J Med Internet Res. 2026;28:e85414. doi: 10.2196/85414