Quick Summary
This study utilized natural language processing (NLP) tools to analyze 4,592 posts from the Reddit forum r/PreCervicalCancer, revealing key insights into patient experiences with pre-cervical cancer. The findings indicate that automated methods can effectively uncover patient sentiments and concerns with minimal human oversight.
Key Details
- Dataset: 4,592 posts from r/PreCervicalCancer
- Features used: Posts and comments on social media
- Technology: BERTopic for topic clustering, GPT-4o mini for topic headings, VADER for sentiment analysis
- Performance: 88.0% accuracy in topic clustering
Key Takeaways
- NLP tools can effectively analyze patient experiences in non-traditional data sources.
- 10 distinct topics were identified from the posts, reflecting common patient concerns.
- Sentiment analysis showed that comments had less negative sentiment than posts (Cohen’s d = 0.46).
- Key concerns included lasting physical and psychological impacts of procedures like LEEP.
- Study validated the use of AI-driven methods for large-scale analysis of clinical content.
- Minimal human oversight was required for accurate analysis, showcasing the efficiency of AI.
- Community engagement was measured through post upvote scores and comment counts.
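
For readers unfamiliar with the effect size cited in the takeaways, Cohen's d is the difference between two group means divided by their pooled standard deviation. By Cohen's conventional benchmarks (0.2 small, 0.5 medium), a value of 0.46 sits just under medium. The sketch below is a minimal illustration, not the authors' code: the function name `cohens_d` and the sample scores are hypothetical, and the exact pooling choice used in the study is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical sentiment scores in [-1, 1]; NOT data from the study.
comment_scores = [0.40, 0.10, 0.30, -0.20, 0.50, 0.20]
post_scores = [0.10, -0.30, 0.20, -0.40, 0.30, -0.10]

d = cohens_d(comment_scores, post_scores)
print(f"Cohen's d = {d:.2f}")
```

A positive value here means the first group (comments) scored higher, that is, less negative, than the second group (posts).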

Background
Understanding patient experiences is crucial in the management of pre-cervical cancer. Traditional methods of gathering patient feedback can be limited and often fail to capture the breadth of experiences shared in online communities. The rise of social media platforms has opened new avenues for researchers to explore these experiences, particularly through the lens of natural language processing (NLP) technologies.
Study
This study focused on the Reddit forum r/PreCervicalCancer, where patients share their experiences and concerns regarding pre-cervical cancer. The researchers used BERTopic to cluster posts into topics by semantic similarity, GPT-4o mini to generate topic headings, and VADER to score post and comment sentiment. Clusters and headings were checked against manual review, and upvote scores and comment counts were used to measure community engagement.
Results
Posts clustered into 10 topics with 88.0% accuracy, and 80.0% of the topic headings generated by GPT-4o mini were deemed appropriate. Reassignment of clustering outliers was less reliable, at 52.8% accuracy for BERTopic and 41.1% for GPT-4o mini. Comments had less negative sentiment than posts (Cohen’s d = 0.46), which the authors interpret as a sign of community support.
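
An accuracy figure like the one above can be read as the share of posts whose automated topic agrees with a human reviewer's judgment. The sketch below illustrates that reading under stated assumptions: the study's exact scoring procedure is not described here, and the function name `clustering_accuracy` and the labels are hypothetical.

```python
def clustering_accuracy(assigned, reviewed):
    """Fraction of posts whose automated topic matches the reviewer's topic."""
    if len(assigned) != len(reviewed):
        raise ValueError("label lists must be the same length")
    if not assigned:
        raise ValueError("no posts to score")
    # Count positions where the automated and manual labels agree.
    matches = sum(a == r for a, r in zip(assigned, reviewed))
    return matches / len(assigned)


# Hypothetical labels for five posts; NOT data from the study.
assigned = [
    "LEEP recovery",
    "result anxiety",
    "LEEP recovery",
    "healthcare navigation",
    "result anxiety",
]
reviewed = [
    "LEEP recovery",
    "result anxiety",
    "result anxiety",
    "healthcare navigation",
    "result anxiety",
]

accuracy = clustering_accuracy(assigned, reviewed)
print(f"Clustering accuracy: {accuracy:.1%}")
```

The same comparison could be applied to outlier reassignment by scoring only the posts that the clustering step left unassigned.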
Impact and Implications
The findings from this study underscore the potential of AI-driven methods to enhance our understanding of patient experiences in healthcare. By leveraging social media data, healthcare providers can gain valuable insights into patient concerns and sentiments, ultimately leading to improved patient care and support systems. This approach could pave the way for more personalized healthcare strategies and better patient outcomes.
Conclusion
This study demonstrates the remarkable capabilities of NLP tools in analyzing patient experiences related to pre-cervical cancer. The ability to conduct large-scale analyses with minimal human oversight opens new doors for research and patient engagement. As we continue to explore the intersection of AI and healthcare, the insights gained from such studies will be invaluable in shaping future patient care strategies.
Application and validation of AI-driven methods to explore patient experiences of pre-cervical cancer.
Abstract
OBJECTIVE: We sought to apply novel natural language processing (NLP) tools to explore patient experiences of pre-cervical cancer on social media and validate the performance of these tools.
METHODS: All posts and comments were extracted from the forum r/PreCervicalCancer on social media platform Reddit. Using BERTopic, posts were clustered into topics according to their semantic similarity, which were manually reviewed. Topic headings were derived using a large language model (LLM) and compared to manually curated headings. Clustering outliers were reassigned by BERTopic, an LLM and by manual methods in parallel and compared. Post and comment sentiment were quantitatively analysed using VADER. Post upvote scores and comments counts were analysed to measure community engagement.
RESULTS: 4592 posts were extracted from r/PreCervicalCancer. Posts clustered into 10 different topics using BERTopic with 88.0% accuracy. 80.0% of topic headings generated by GPT-4o mini were deemed appropriate. Reassignment of clustering outliers by BERTopic and GPT-4o mini was limited, 52.8% and 41.1% accuracy, respectively. Key clinical findings reflect several common concerns among patients, particularly regarding specific lasting physical and psychological impact of procedures like LEEP, result anxiety, and challenges in healthcare navigation. Comments had less negative sentiment than posts (Cohen’s d = 0.46), suggesting support.
CONCLUSIONS: In this cross-sectional study, we validated NLP tools to analyse content, sentiment and reactions to 4592 posts on pre-cervical cancer. Our findings suggest that, with minimal human oversight, automated methods can accurately conduct large-scale analyses of similar clinical content, unlocking new insights of patient experiences using non-traditional data sources.
Authors: Luo MY, Williams CYK
Journal: Eur J Obstet Gynecol Reprod Biol
Citation: Luo MY and Williams CYK. Application and validation of AI-driven methods to explore patient experiences of pre-cervical cancer. Application and validation of AI-driven methods to explore patient experiences of pre-cervical cancer. 2026; 318:114953. doi: 10.1016/j.ejogrb.2026.114953