Quick Summary
This study introduces a Task-Optimized Vision Transformer (TOViT) for detecting diabetic retinopathy (DR) and classifying its severity. The authors report 99% classification accuracy and F1-scores exceeding 93% across all DR stages. The model is designed for real-time deployment on low-cost hardware, which makes it a promising tool for early diagnosis in resource-constrained settings.
Key Details
- Datasets Used: Three large-scale public datasets
- Technology: Task-Optimized Vision Transformer (TOViT)
- Performance: 99% classification accuracy, F1-scores > 93%
- Hardware Implementation: Raspberry Pi-4
- Processing Speed: 8 frames per second with 120 ms latency
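The two speed figures can be cross-checked with simple arithmetic. The sketch below assumes frames are processed strictly one at a time with no pipelining, which the source does not state; the function names are illustrative only.

```python
def max_fps_from_latency(latency_ms: float) -> float:
    """Upper bound on frames per second if each frame takes latency_ms end to end."""
    return 1000.0 / latency_ms


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Figures reported in the summary above.
reported_latency_ms = 120.0
reported_fps = 8.0

fps_bound = max_fps_from_latency(reported_latency_ms)
budget_ms = frame_budget_ms(reported_fps)
```

Under that assumption, a 120 ms latency allows at most about 8.3 frames per second, and 8 frames per second leaves a 125 ms budget per frame, so the two reported numbers are consistent with each other.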
Key Takeaways
- Diabetic retinopathy is a leading cause of preventable blindness globally.
- Early detection is crucial for timely intervention and better patient outcomes.
- The TOViT model integrates layer-wise learning rate scheduling, attention head tuning, and embedding dimension refinement to improve feature extraction.
- Structured pruning and 8-bit quantization improve computational efficiency.
- Potential for use in portable, point-of-care screening devices.
- Study conducted by Bhoopalan R et al. and published in Sci Rep.
- Implications for global healthcare systems in expanding access to retinal screening.
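
To make the compression step concrete, here is a minimal sketch of structured pruning followed by 8-bit quantization on a toy weight matrix. It is not the authors' implementation: the pruning criterion (row L2 norm), the symmetric per-tensor quantization scheme, the keep ratio, and all names and numbers are assumptions chosen for illustration.

```python
import math


def prune_rows(matrix, keep_ratio):
    """Structured pruning: drop whole rows (e.g. output channels) with the smallest L2 norm."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in matrix]
    n_keep = max(1, round(len(matrix) * keep_ratio))
    ranked = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # restore the original row order
    return [matrix[i] for i in keep]


def quantize_int8(matrix):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for row in matrix for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in matrix]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [[v * scale for v in row] for row in q]


# Toy weights, made up for illustration (not taken from the study).
weights = [[0.4, -1.0], [0.01, 0.02], [0.25, 0.0], [0.6, 0.3]]
pruned = prune_rows(weights, keep_ratio=0.5)   # keeps the two largest-norm rows
quantized, scale = quantize_int8(pruned)       # one shared scale for the tensor
restored = dequantize(quantized, scale)        # each value is off by at most about scale / 2
```

Pruning whole rows rather than individual weights keeps the remaining matrix dense, which is what lets small devices benefit without sparse-matrix support. Quantization then stores each surviving weight in one byte plus a single shared scale.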

Background
Diabetic retinopathy (DR) is a serious complication of diabetes that can lead to blindness if not detected early. Traditional methods of diagnosis often rely on expensive equipment and trained personnel, which can be a barrier in low-resource settings. The development of efficient, cost-effective diagnostic tools is essential to improve access to care and reduce the burden of preventable blindness.
Study
The study aimed to build a model that accurately detects DR and classifies its severity from retinal fundus images. The researchers designed TOViT to address two limitations of conventional CNN-based deep learning models: they often struggle to capture long-range dependencies in fundus images, and they typically require substantial computational resources, which limits their use on low-cost hardware.
Results
The authors report a classification accuracy of 99% and F1-scores exceeding 93% across all stages of DR. On a Raspberry Pi-4, the model processed 8 frames per second with a latency of 120 ms, which supports its feasibility for use in portable screening devices.
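Accuracy and per-stage F1 measure different things, which is why both are reported. The sketch below shows how each is derived from a confusion matrix. The three-class matrix is invented for illustration and is not data from the study; the function names are likewise hypothetical.

```python
def accuracy(confusion):
    """Share of all samples on the diagonal. confusion[i][j] = true class i predicted as j."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_f1(confusion):
    """F1 for each class: 2*TP / (2*TP + FP + FN)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true class k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Invented, imbalanced toy counts (rows = true class, columns = predicted class).
toy_confusion = [
    [90, 10, 0],
    [5, 40, 5],
    [0, 2, 8],
]

toy_accuracy = accuracy(toy_confusion)
toy_f1 = per_class_f1(toy_confusion)
```

In this toy matrix overall accuracy is about 86% while the smallest class scores an F1 near 70%, which shows why per-stage F1 figures are worth reading alongside the headline accuracy when classes are imbalanced.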
Impact and Implications
The introduction of the TOViT model has significant implications for global healthcare, particularly in resource-constrained environments. By enabling early and accurate diagnosis of diabetic retinopathy, this technology could help reduce the incidence of preventable blindness and improve patient outcomes. The scalability of this model could lead to broader access to retinal screening, ultimately enhancing the quality of care in underserved populations.
Conclusion
This study highlights the potential of the TOViT model for the early detection of diabetic retinopathy. By combining a task-optimized transformer with compression for low-cost hardware, the approach could widen access to essential eye care services. Continued research and development in this area remain important for expanding the reach of automated diagnostic tools in healthcare.
Task optimized vision transformer for diabetic retinopathy detection and classification in resource constrained early diagnosis settings.
Abstract
Diabetic Retinopathy (DR) is a progressive complication of diabetes and a leading cause of preventable blindness worldwide. Early detection and accurate classification of DR severity are critical for timely intervention but remain challenging, particularly in resource-constrained settings. While conventional deep learning (DL) models based on Convolutional Neural Networks (CNNs) have shown promising results, they often struggle to capture long-range dependencies in retinal fundus images and typically require substantial computational resources, limiting their utility on low-cost hardware. To address these challenges, this study introduces a Task-Optimized Vision Transformer (TOViT) model, specifically designed for DR detection and severity classification. The model integrates several optimization strategies, including layer-wise learning rate scheduling, attention head tuning, and embedding dimension refinement, to enhance feature extraction while maintaining computational efficiency. The model is further compressed through structured pruning and 8-bit quantization to support real-time deployment on Raspberry Pi-4 hardware. Evaluated on three large-scale public datasets, TOViT achieved a classification accuracy of 99%, with F1-scores exceeding 93% across all DR stages. Hardware implementation yielded real-time performance, processing at 8 frames per second with 120 ms latency, confirming its potential for use in portable, point-of-care screening devices. This work presents a scalable and clinically relevant approach for automated DR diagnosis, with promising implications for expanding access to early retinal screening in global healthcare systems.
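The abstract names layer-wise learning rate scheduling but does not give the schedule. One common form, assumed here purely for illustration, is layer-wise decay: the last layer trains at the base rate and each earlier layer at a geometrically smaller one. The function name, layer count, base rate, and decay factor below are all hypothetical.

```python
def layerwise_learning_rates(num_layers, base_lr, decay):
    """Return one learning rate per layer, ordered from first (input side) to last.

    The last layer gets base_lr; each earlier layer is scaled by a further factor of decay.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Made-up settings for a 4-layer example.
rates = layerwise_learning_rates(num_layers=4, base_lr=1e-3, decay=0.5)
```

With these made-up values the first layer trains at one eighth of the base rate, so early, more general features change slowly while later, task-specific layers adapt faster.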
Authors: Bhoopalan R, Sekar P, Nagaprasad N, Mamo TR, Krishnaraj R
Journal: Sci Rep
Citation: Bhoopalan R, et al. Task optimized vision transformer for diabetic retinopathy detection and classification in resource constrained early diagnosis settings. Sci Rep. 2025; 15:39047. doi: 10.1038/s41598-025-25399-1