⚡ Quick Summary
This study explored the effectiveness of web search advertisements in crowdsourcing a diverse dermatology image dataset. It received 5,749 submissions, of which 5,631 (97.9%) were genuine images of dermatological conditions. The findings suggest that search ads could be a useful way to expand the supply of dermatological images for research and AI tool development.
🔍 Key Details
- 📊 Dataset: 5,749 submissions, 5,631 genuine dermatological images
- 🧩 Demographics: 66.7% female and 52.0% aged <40 years, among contributors who self-reported this information
- 🌍 Racial diversity: 32.6% of contributors who reported race and ethnicity identified as other than White
- ⚙️ Study duration: March to November 2023
- 📈 Analysis period: January to February 2024
🔑 Key Takeaways
- 📊 Crowdsourcing via search ads is an effective method for gathering dermatology images.
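- 💡 Submissions arrived steadily, with a median of 22 (14-30) per day.
- 👩‍🔬 Demographic representation was skewed towards younger and female contributors compared with the US population.
- 🏆 Of the 2,835 contributions that could be assigned a differential diagnosis, 89.0% were allergic, infectious, or inflammatory conditions.
- 🌍 Estimated skin type and skin tone distributions reflected the geographical origin of the dataset (the US).
- 🆔 Contributions were compiled into the Skin Condition Image Network (SCIN) open access dataset.
- 📈 Dermatologist confidence in assigning a differential diagnosis increased with the number of self-reported demographic and skin-condition-related variables (Spearman R = 0.1537; P < .001).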
📚 Background
Health datasets from clinical sources often fail to reflect the breadth and diversity of disease, which can hinder research, medical education, and the development of artificial intelligence tools. This study addresses the need to assess new methods for creating health datasets, particularly in dermatology, where images are central to diagnosis and treatment.
🗒️ Study
Conducted from March to November 2023, this prospective observational survey study used Google Search ads to invite internet users in the US to contribute images of dermatological conditions. Ads were shown against dermatology-related search queries on mobile devices, and adults could contribute after a digital informed consent process. Contributors supplied demographic and symptom information alongside their images, and the contributions were compiled into the SCIN open access dataset. Submissions were filtered for image safety, and measures were taken to protect contributor privacy.
📈 Results
The study received a total of 5,749 submissions, of which 5,631 (97.9%) were genuine dermatological images. Among contributors who self-reported demographic information, 66.7% were female and 52.0% were under 40 years old, a higher representation of both groups than in the US population. Of the 2,614 contributors who reported race and ethnicity, 32.6% reported a racial or ethnic identity other than White.
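The percentages above can be rechecked from the raw counts given in the abstract's Results. The snippet below is only an arithmetic sanity check, not part of the study's analysis. The dictionary name `REPORTED` and the helper `pct` are illustrative names introduced here, and rounding to one decimal place is assumed to match the paper's presentation.

```python
# Counts and percentages as reported in the abstract's Results:
# label -> (numerator, denominator, reported percentage)
REPORTED = {
    "genuine dermatological images": (5631, 5749, 97.9),
    "female contributors": (1732, 2596, 66.7),
    "contributors aged <40 years": (1329, 2556, 52.0),
    "identity other than White": (852, 2614, 32.6),
    "onset within <7 days": (2170, 4019, 54.0),
    "allergic/infectious/inflammatory": (2523, 2835, 89.0),
}


def pct(numerator: int, denominator: int) -> float:
    """Return the share as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


if __name__ == "__main__":
    for label, (num, den, reported) in REPORTED.items():
        status = "matches" if pct(num, den) == reported else "differs from"
        print(f"{label}: {num}/{den} = {pct(num, den):.1f}% ({status} reported {reported}%)")
```

Note that each percentage uses its own denominator (for example, 2,596 contributors reported sex, while 2,614 reported race and ethnicity), so the figures describe different subsets of the 5,749 submissions.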
🌍 Impact and Implications
The findings from this study highlight the potential of web search advertisements as a method for crowdsourcing health datasets. By bridging gaps in the availability of images of common, short-duration skin conditions, this approach could support research and the development of AI tools in healthcare across diverse populations.
🔮 Conclusion
This study suggests that crowdsourcing through search ads is an effective way to build a dermatology image dataset. Such methods can improve the representation of common skin conditions that clinical datasets tend to miss, which benefits research and clinical practice. Similar strategies may be worth exploring in other areas of healthcare to improve data diversity and accessibility.
Creating an Empirical Dermatology Dataset Through Crowdsourcing With Web Search Advertisements.
Abstract
IMPORTANCE: Health datasets from clinical sources do not reflect the breadth and diversity of disease, impacting research, medical education, and artificial intelligence tool development. Assessments of novel crowdsourcing methods to create health datasets are needed.
OBJECTIVE: To evaluate if web search advertisements (ads) are effective at creating a diverse and representative dermatology image dataset.
DESIGN, SETTING, AND PARTICIPANTS: This prospective observational survey study, conducted from March to November 2023, used Google Search ads to invite internet users in the US to contribute images of dermatology conditions with demographic and symptom information to the Skin Condition Image Network (SCIN) open access dataset. Ads were displayed against dermatology-related search queries on mobile devices, inviting contributions from adults after a digital informed consent process. Contributions were filtered for image safety and measures were taken to protect privacy. Data analysis occurred January to February 2024.
EXPOSURE: Dermatologist condition labels as well as estimated Fitzpatrick Skin Type (eFST) and estimated Monk Skin Tone (eMST) labels.
MAIN OUTCOMES AND MEASURES: The primary metrics of interest were the number, quality, demographic diversity, and distribution of clinical conditions in the crowdsourced contributions. Spearman rank order correlation was used for all correlation analyses, and the χ2 test was used to analyze differences between SCIN contributor demographics and the US census.
RESULTS: In total, 5749 submissions were received, with a median of 22 (14-30) per day. Of these, 5631 (97.9%) were genuine images of dermatological conditions. Among contributors with self-reported demographic information, female contributors (1732 of 2596 contributors [66.7%]) and younger contributors (1329 of 2556 contributors [52.0%] aged <40 years) had a higher representation in the dataset compared with the US population. Of 2614 contributors who reported race and ethnicity, 852 (32.6%) reported a racial or ethnic identity other than White. Dermatologist confidence in assigning a differential diagnosis increased with the number of self-reported demographic and skin-condition-related variables (Spearman R = 0.1537; P < .001). Of 4019 contributions reporting duration since onset, 2170 (54.0%) reported onset within less than 7 days of submission. Of the 2835 contributions that could be assigned a dermatological differential diagnosis, 2523 (89.0%) were allergic, infectious, or inflammatory conditions. eFST and eMST distributions reflected the geographical origin of the dataset.
CONCLUSIONS AND RELEVANCE: The findings of this survey study suggest that search ads are effective at crowdsourcing dermatology images and could therefore be a useful method to create health datasets. The SCIN dataset bridges important gaps in the availability of images of common, short-duration skin conditions.
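The abstract names Spearman rank order correlation as the method behind the reported R = 0.1537. As a rough illustration of what that statistic computes, here is a minimal pure-Python sketch: it ranks both variables (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the toy data are hypothetical and are not taken from the SCIN dataset or the authors' code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: number of self-reported variables per case
    # and a made-up dermatologist confidence score for the same cases.
    variables_reported = [1, 2, 2, 3, 4, 5]
    confidence_score = [2, 1, 3, 3, 4, 5]
    print(f"Spearman R = {spearman(variables_reported, confidence_score):.4f}")
```

Because the statistic depends only on ranks, it captures any monotonic relationship rather than a strictly linear one. A value near 0.15, as reported in the abstract, indicates a positive but weak association.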
Authors: Ward A, Li J, Wang J, Lakshminarasimhan S, Carrick A, Campana B, Hartford J, Sreenivasaiah PK, Tiyasirisokchai T, Virmani S, Wong R, Matias Y, Corrado GS, Webster DR, Smith MA, Siegel D, Lin S, Ko J, Karthikesalingam A, Semturs C, Rao P
Journal: JAMA Netw Open
Citation: Ward A, et al. Creating an Empirical Dermatology Dataset Through Crowdsourcing With Web Search Advertisements. JAMA Netw Open. 2024; 7:e2446615. doi: 10.1001/jamanetworkopen.2024.46615