A new foundation model bypasses cherry-picked images to evaluate gastric cancer risk using every photo taken during an endoscopy.
Why do medical AI models look so perfect in lab tests but stumble in real clinics? The answer lies in the curation. Most endoscopy AI models are trained on pristine, hand-selected images of lesions. In the real world, a doctor takes dozens of blurry, unaligned photos during a single procedure.
This disconnect makes standard frame-by-frame AI impractical.
A new model called GutCore challenges this paradigm by analyzing the entire messy folder of a patient’s endoscopy images at once. This shifts the question from “is this specific frame cancerous?” to “does this patient have invasive disease?” It brings AI closer to how clinical decisions are actually made.
Analyzing the whole case
Researchers pretrained the model on 5.6 million endoscopic images from more than ten hospitals. They then tested it on an internal cohort of 11,035 examinations from 2019 to 2023. This cohort included 8,049 cases of early or advanced gastric cancer and 2,986 cases of benign gastritis or intestinal metaplasia.
Instead of picking the best frame, the system aggregated every stored image from each patient’s examination. This approach mirrors a broader shift toward comprehensive digital diagnostics noted in recent digital health research.
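The article does not say how GutCore combines the images, so the following is only a minimal sketch of one common way to do case-level aggregation: attention-style pooling over per-image embeddings. The function names, the embedding size, and the random "learned" weights are hypothetical stand-ins, not the authors' method.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_case(image_embeddings, attention_vector):
    """Pool any number of per-image embeddings into one case-level vector.

    image_embeddings: list of equal-length float lists, one per stored image.
    attention_vector: hypothetical learned scoring vector of the same length.
    Returns (case_embedding, weights); the weights sum to 1.
    """
    if not image_embeddings:
        raise ValueError("an examination needs at least one image")
    # One relevance score per image (dot product with the scoring vector).
    scores = [sum(x * a for x, a in zip(emb, attention_vector))
              for emb in image_embeddings]
    # Turn scores into weights, so some frames can count more than others.
    weights = softmax(scores)
    dim = len(attention_vector)
    # Weighted average of the image embeddings, one value per dimension.
    case_embedding = [sum(w * emb[j] for w, emb in zip(weights, image_embeddings))
                      for j in range(dim)]
    return case_embedding, weights

# Toy examination: 40 images with made-up 8-dimensional embeddings.
random.seed(0)
embeddings = [[random.gauss(0, 1) for _ in range(8)] for _ in range(40)]
attention = [random.gauss(0, 1) for _ in range(8)]
case_vec, w = aggregate_case(embeddings, attention)
print(len(case_vec), len(w))
```

The point of the design is that the output has a fixed size no matter how many photos the examination contains, so a downstream classifier can make one prediction per patient rather than one per frame.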
Predicting invasion and survival
The model performed strongly on several diagnostic tasks. Beyond detecting cancer, it also estimated how deeply tumors had penetrated the stomach wall.
- Achieved an AUC of 0.995 for general cancer detection.
- Reached an AUC of 0.960 for detecting muscularis propria invasion and 0.804 for SM2-or-deeper invasion.
- Predicted molecular biomarkers, scoring an AUC of 0.831 for Epstein-Barr virus and 0.854 for MLH1 loss, though HER2 performance was weaker at 0.673.
- Separated advanced gastric cancer patients into high- and low-risk survival groups with a hazard ratio of 13.18.
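For readers unfamiliar with the AUC figures above, here is a minimal sketch of what the metric measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The scores and labels are invented for illustration and have nothing to do with the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs, higher meaning "more likely positive".
    labels: 1 for positive cases, 0 for negative cases.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented example: three cancer cases and three benign cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(scores, labels), 3))  # 8 of 9 positive/negative pairs are ordered correctly
```

On this scale 0.5 is chance and 1.0 is perfect ranking, which puts the 0.673 for HER2 much closer to guessing than the 0.995 for cancer detection.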
That survival stratification is particularly telling. The risk separation remained clear even when looking strictly within patients who had pathological stage II and III disease. This suggests the AI is picking up on subtle visual cues of tumor aggressiveness that traditional staging might overlook.
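The article does not state how the hazard ratio was estimated; a Cox proportional hazards model is the usual choice. As a rough illustration of what such a number means, the sketch below makes the simplifying assumption that each group has a constant hazard, in which case the hazard ratio reduces to a ratio of event rates per unit of follow-up time. All patient data here are made up, and the function names are hypothetical.

```python
def event_rate(follow_up_times, event_flags):
    """Events per unit of follow-up time (constant-hazard estimate).

    follow_up_times: time each patient was observed, e.g. in months.
    event_flags: 1 if the event (death) was observed, 0 if censored.
    """
    total_time = sum(follow_up_times)
    if total_time <= 0:
        raise ValueError("total follow-up time must be positive")
    return sum(event_flags) / total_time

def crude_hazard_ratio(high_times, high_events, low_times, low_events):
    """Ratio of event rates, high-risk group over low-risk group."""
    low_rate = event_rate(low_times, low_events)
    if low_rate == 0:
        raise ValueError("no events in the low-risk group; ratio is undefined")
    return event_rate(high_times, high_events) / low_rate

# Invented follow-up data (months) for two model-defined risk groups.
high_times, high_events = [6, 10, 14, 20], [1, 1, 1, 0]   # 3 deaths in 50 months
low_times, low_events = [30, 40, 50, 30], [0, 1, 0, 0]    # 1 death in 150 months
hr = crude_hazard_ratio(high_times, high_events, low_times, low_events)
print(round(hr, 2))
```

In this toy data the high-risk group dies at roughly nine times the rate of the low-risk group. A real analysis would use a Cox model with confidence intervals rather than this crude ratio.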
The clinical reality check
While these numbers are impressive, the model still faces a classic hurdle: generalization. It was evaluated on an internal cohort from a single tertiary center, so these results do not show how it performs at other hospitals. Other diagnostic tools have excelled in controlled settings only to falter when deployed across diverse clinical workflows, a challenge often discussed in the endoscopic ultrasound AI literature.
The weaker HER2 performance also shows that image analysis cannot yet replace tissue biopsy. Still, by showing that uncurated, messy clinical photo sets can support accurate case-level predictions, this work sets a new baseline for how endoscopic software is built and evaluated.
Read the full study on medRxiv.
