Quick Summary
A recent study evaluated the performance of three popular chatbots (ChatGPT, Google Gemini, and Microsoft Co-Pilot) in answering questions about cannabis and its use for cancer-related symptoms. While the chatbots demonstrated overall high accuracy, they also produced some misleading statements and omitted critical information.
Key Details
- Chatbots evaluated: ChatGPT, Google Gemini, Microsoft Co-Pilot
- Date of evaluation: February 6, 2025
- Expert scoring: Responses scored by six physicians for accuracy and comprehensiveness
- Scoring scale: 0-10 for accuracy, comprehensiveness, and reliability of references
- Readability assessment: Flesch-Kincaid Grade Level and Flesch Reading Ease scores
Key Takeaways
- Mean accuracy scores: ChatGPT 9.0, Gemini 8.8, Co-Pilot 8.3
- Co-Pilot significantly underperformed ChatGPT in accuracy (mean difference -0.62, p=0.008).
- Mean comprehensiveness scores: ChatGPT 8.1, Gemini 8.5, Co-Pilot 7.2
- ChatGPT and Gemini outperformed Co-Pilot in comprehensiveness (p<0.001).
- Inaccuracies identified: misleading statements about cannabis formulations and symptom benefits.
- Missing information: adverse effects and drug interactions were not adequately addressed.
- Gemini had the lowest reliability score: 4.1.
- Readability was poor across all chatbots.
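The two readability measures named above are simple formulas over sentence, word, and syllable counts. As a rough illustration only, the sketch below uses a crude vowel-group heuristic for syllables (an assumption of this example; validated readability tools use more careful syllable counting):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per contiguous vowel group.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    # Returns (Flesch Reading Ease, Flesch-Kincaid Grade Level)
    # using the standard published formula constants.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences        # average words per sentence
    spw = n_syllables / n_words      # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl
```

Higher Reading Ease means easier text; a higher Grade Level means more years of schooling are needed, which is how "poor readability" is typically operationalized in studies like this one.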

Background
The use of chatbots in healthcare is on the rise, providing users with quick access to information. However, the complexity of health topics, particularly regarding cannabis and its therapeutic applications, raises questions about the reliability of chatbot responses. This study aimed to assess how well these AI tools can provide accurate and comprehensive information on cannabis use for cancer symptoms.
Study
The study asked three popular chatbots to respond to questions derived from reputable sources, including the Centers for Disease Control and Prevention and the American Society of Clinical Oncology. Responses were evaluated by six physicians with expertise in the field for accuracy and comprehensiveness, with readability assessed separately.
Results
The results indicated that while all three chatbots performed well in terms of accuracy, with mean scores around 8.3 to 9.0, there were significant discrepancies in comprehensiveness and reliability. Notably, Co-Pilot lagged behind the others, particularly in providing comprehensive answers. The study also highlighted the presence of misleading information and a lack of critical details regarding adverse effects and drug interactions.
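The paper's exact statistical model is not described in this summary, but the reported comparisons take the familiar form of a mean difference with a 95% confidence interval. As a purely illustrative sketch with hypothetical rater scores (not the study's data), assuming independent samples and a normal approximation:

```python
import math
import statistics

def mean_diff_ci(a, b, z=1.96):
    """Mean difference (a - b) with an approximate normal 95% CI.

    Illustrative only: assumes independent samples and a normal
    approximation, which may differ from the study's actual model.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return diff, (diff - z * se, diff + z * se)

# Hypothetical physician scores for one question (NOT the study's data):
copilot_scores = [8, 7, 9, 8, 8, 9]
chatgpt_scores = [9, 9, 10, 8, 9, 9]
diff, (lo, hi) = mean_diff_ci(copilot_scores, chatgpt_scores)
```

If the resulting interval excludes zero, the difference is statistically significant at the 5% level, which is the pattern reported for Co-Pilot versus the other two chatbots.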
Impact and Implications
The findings of this study underscore the need for caution when relying on chatbot responses for complex health topics like cannabis use in cancer treatment. While these tools can provide quick information, healthcare professionals and patients alike should be aware of their limitations. The potential for misleading information necessitates further research and refinement of chatbot algorithms to enhance their reliability and comprehensiveness.
Conclusion
This study reveals the promising yet flawed nature of chatbot technology in addressing health-related inquiries. Although chatbots like ChatGPT and Google Gemini show high accuracy, the presence of inaccuracies and missing information calls for a cautious approach. As chatbot technology evolves, ongoing evaluation and improvement will be essential to ensure that these tools can effectively support patients and healthcare providers.
Your comments
What are your thoughts on the use of chatbots for health information? Do you think they can be reliable sources, or should we always consult a healthcare professional? Share your insights in the comments below.
Chatbot Responses to Frequently Asked Questions about Cannabis and Its Use for Cancer Symptoms.
Abstract
CONTEXT: Chatbots are increasingly used by the public, but their performance in answering questions about complex health topics such as cannabis is unknown.
OBJECTIVES: To evaluate responses of three popular chatbots regarding cannabis and its use for cancer-related symptoms.
METHODS: We asked ChatGPT, Google Gemini, and Microsoft Co-Pilot to answer questions about cannabis derived from the Centers for Disease Control and Prevention website and American Society of Clinical Oncology guidelines regarding cannabis. Responses were collected on February 6, 2025. Six physicians with expertise in this field scored responses for accuracy and comprehensiveness (0-10 scale). Reliability of references was scored separately (0-10 scale). Readability was assessed using Flesch-Kincaid Grade Level and Flesch Reading Ease scores.
RESULTS: Mean accuracy scores (SD) for ChatGPT, Gemini, Co-Pilot were 9.0 (1.8), 8.8 (2.3), 8.3 (2.3), respectively. Co-Pilot significantly underperformed in accuracy compared to ChatGPT (mean difference -0.62, 95% CI: -1.11, -0.14; p=0.008). Mean comprehensiveness scores (SD) for ChatGPT, Gemini, Co-Pilot were 8.1 (2.2), 8.5 (2.2), 7.2 (2.4), respectively. ChatGPT and Gemini performed better than Co-Pilot in comprehensiveness (mean difference Co-Pilot vs. ChatGPT: -0.88 [95% CI: -1.34, -0.42; p<0.001]; mean difference Co-Pilot vs. Gemini: -1.28 [95% CI: -1.74, -0.82; p<0.001]). Inaccurate or misleading statements regarding cannabis formulations and symptom benefits were identified, with missing information on adverse effects and drug interactions. Gemini had the lowest reliability (4.1). Readability among all chatbots was poor.
CONCLUSION: Despite overall high accuracy and comprehensiveness scores, chatbots made some misleading, inaccurate statements or missed information. For now, their answers should be interpreted with caution.
Authors: Kim MJ, Abrams DI, Braun IM, Case AA, Davis MP, Tanco K, Wallace MS, Manuel CM, Bruera E, Hui D
Journal: J Pain Symptom Manage
Citation: Kim MJ, et al. Chatbot Responses to Frequently Asked Questions about Cannabis and Its Use for Cancer Symptoms. J Pain Symptom Manage. 2026; (unknown volume):(unknown pages). doi: 10.1016/j.jpainsymman.2026.04.002