Quick Summary
A recent international study evaluated the performance of large language models in addressing patient queries about colorectal cancer (CRC) screening across 28 countries and 23 languages. The findings suggest that these models could meaningfully support patient education, achieving high scores for accuracy, completeness, and comprehensibility of responses.
Key Details
- Study Scope: Conducted in 28 countries, covering 23 languages.
- Questions: A standardized set of 15 CRC screening-related questions.
- Technology: Responses generated by ChatGPT (GPT-4o).
- Evaluation: 140 gastroenterologists assessed responses for accuracy, completeness, and comprehensibility.
- Statistical Methods: t-test, Chi-square, and two-way ANOVA used for analysis.
Key Takeaways
- Multilingual Capability: ChatGPT demonstrated strong performance in multiple languages.
- High Ratings: Scores of ≥4 were reached by 73.9% for accuracy, 86.9% for completeness, and 82.6% for comprehensibility.
- Variability: Lower scores were noted for Chinese, Dutch, and Greek.
- Context Matters: Performance varied between countries sharing the same language.
- Patient Education: The model shows promise as a tool for enhancing patient understanding of CRC screening.
- Caution Required: Regional variability necessitates careful validation before clinical use.

Background
Colorectal cancer (CRC) is a significant health concern worldwide, with screening proven to reduce both incidence and mortality rates. Despite its benefits, patient adherence to screening recommendations remains suboptimal. Addressing patient queries in their native languages could improve participation rates, yet the effectiveness of large language models in this context has not been thoroughly investigated until now.
Study
This cross-continental study was conducted from April to June 2025, involving a diverse group of participants from Europe, Asia, Africa, America, and Oceania. A set of 15 CRC screening-related questions was translated into 23 languages and submitted to ChatGPT (GPT-4o). The responses were then evaluated by a panel of 140 gastroenterologists, who rated them on a 5-point Likert scale for accuracy, completeness, and comprehensibility.
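To make the analysis concrete, here is a minimal sketch (not the authors' code) of how 5-point Likert ratings of this kind could be summarized and compared with a two-way ANOVA. The data are synthetic, the language subset is illustrative, and the rating distribution is a placeholder; only the general workflow (mean ± SD, share of scores ≥4, language × dimension ANOVA) mirrors the study design.

```python
# Sketch under stated assumptions: synthetic 1-5 Likert ratings by language and
# rating dimension, summarized and compared with a two-way ANOVA. Not the study's code.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
languages = ["English", "Spanish", "Chinese"]              # illustrative subset, not the full 23
dimensions = ["accuracy", "completeness", "comprehensibility"]

# Simulate 5 raters x 15 questions per language/dimension cell (placeholder ratings 3-5).
rows = []
for lang in languages:
    for dim in dimensions:
        for rating in rng.integers(3, 6, size=5 * 15):
            rows.append({"language": lang, "dimension": dim, "rating": int(rating)})
df = pd.DataFrame(rows)

# Mean +/- SD per dimension and share of ratings >= 4, mirroring the reported metrics.
summary = df.groupby("dimension")["rating"].agg(["mean", "std"])
share_ge4 = df.groupby("dimension")["rating"].apply(lambda s: (s >= 4).mean())
print(summary.round(2))
print((share_ge4 * 100).round(1))

# Two-way ANOVA: does the rating depend on language, dimension, or their interaction?
model = ols("rating ~ C(language) * C(dimension)", data=df).fit()
print(anova_lm(model, typ=2))
```

In a real analysis one would also account for rater and question effects, and nonparametric alternatives are often preferred for ordinal Likert data; the sketch only illustrates the reported summary statistics and the two-way ANOVA mentioned in the methods.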
Results
The study yielded impressive results, with mean scores for accuracy, completeness, and comprehensibility of 4.1 ± 1.0, 4.1 ± 1.0, and 4.2 ± 0.9, respectively. Most languages achieved high ratings, indicating that the model effectively addressed patient queries. However, notable exceptions were observed in certain languages, highlighting the need for tailored approaches in different linguistic contexts.
Impact and Implications
The findings from this study suggest that large language models like ChatGPT could serve as valuable tools for multilingual patient education regarding CRC screening. By providing accurate and comprehensible information in native languages, these models have the potential to enhance patient engagement and adherence to screening programs. However, the observed regional variability underscores the importance of validating these tools in specific contexts before widespread clinical integration.
Conclusion
This study highlights the significant potential of large language models in improving patient education about colorectal cancer screening. With their ability to generate accurate and comprehensible responses in multiple languages, these models could play a crucial role in enhancing patient participation in screening programs. Continued research and validation are essential to ensure their effective integration into clinical practice.
Your comments
What are your thoughts on the use of AI in patient education? Do you believe that large language models can truly enhance understanding and adherence to health screenings? Share your insights in the comments below or connect with us on social media.
Performance of large language models in addressing patient queries on colorectal cancer screening in different languages: An international study across 28 countries.
Abstract
BACKGROUND: Colorectal cancer (CRC) screening reduces incidence and mortality, yet patient adherence remains suboptimal. Large language models may improve participation by addressing patient questions in native languages, but their multilingual performance has not been systematically assessed.
METHODS: From April to June 2025, we conducted a cross-continental study involving 28 countries and 23 languages. A standardized set of 15 CRC screening-related questions was translated into each language and submitted to ChatGPT (GPT-4o). Responses were independently evaluated by 140 gastroenterologists (five per country) for accuracy, completeness, and comprehensibility on a 5-point Likert scale. Statistical analyses included t-test, Chi-square, and two-way ANOVA.
RESULTS: The study included experts and data from Europe, Asia, Africa, America, and Oceania. Mean scores (±SD) for accuracy, completeness, and comprehensibility were 4.1 ± 1.0, 4.1 ± 1.0, and 4.2 ± 0.9, respectively. Most languages achieved high ratings, with 73.9%, 86.9%, and 82.6% scoring ≥4 for accuracy, completeness, and comprehensibility. However, lower scores were observed in Chinese, Dutch, and Greek. Variability was also noted between countries sharing the same language, highlighting language- and context-dependent performance.
DISCUSSION: ChatGPT showed strong ability to answer CRC screening questions across multiple languages, supporting its promise as a multilingual patient education tool. Nonetheless, regional variability requires careful validation before clinical integration.
Authors: M M, A P, S G, T V, Lhs L, S B, P P, M M, T Z, H U, Ejt A, D B, H D, T D, A G, T K, S L, B L, H M, R N, Y O, A R, A T, I M; World Endoscopy Organization (WEO) Emerging Stars Program
Journal: Dig Liver Dis
Citation: M M, et al. Performance of large language models in addressing patient queries on colorectal cancer screening in different languages: An international study across 28 countries. Dig Liver Dis. 2025; (unknown volume):(unknown pages). doi: 10.1016/j.dld.2025.11.026