Quick Summary
This study evaluated the content validity and inter-rater reliability of AI-generated stuttering assessment and intervention programs using GPT-4 in both Turkish and English. The findings indicate that while the majority of items were rated as appropriate, human validation is essential for refining culturally specific elements.
Key Details
- Participants: 12 certified speech-language pathologists
- Programs Reviewed: 12 AI-generated programs (6 in Turkish, 6 in English)
- Evaluation Method: 5-point Likert scale
- Reliability Metrics: Cronbach’s Alpha and Intraclass Correlation Coefficients (ICC); see the computation sketch after this list
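For readers who want to see what the Cronbach’s Alpha figure summarizes, here is a minimal sketch of the computation on a hypothetical items-by-raters matrix of 5-point Likert scores. The matrix orientation (raters treated as the columns whose consistency is measured), the simulated data, and the function name are illustrative assumptions, not the authors’ analysis code.

```python
# Minimal sketch: Cronbach's alpha for an items x raters matrix of Likert scores.
# Hypothetical simulated data; not the study's actual ratings or analysis code.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """ratings: shape (n_items, n_raters), scores on a 1-5 scale."""
    n_raters = ratings.shape[1]
    rater_variances = ratings.var(axis=0, ddof=1)      # variance of each rater's scores
    total_variance = ratings.sum(axis=1).var(ddof=1)   # variance of per-item totals
    return (n_raters / (n_raters - 1)) * (1 - rater_variances.sum() / total_variance)

# Simulate 20 items rated by 12 raters: shared item "appropriateness" plus rater noise.
rng = np.random.default_rng(0)
item_quality = rng.normal(4.5, 0.4, size=(20, 1))
toy_ratings = np.clip(np.round(item_quality + rng.normal(0, 0.5, size=(20, 12))), 1, 5)
print(f"Cronbach's alpha: {cronbach_alpha(toy_ratings):.2f}")
```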
Key Takeaways
- AI-generated materials can be clinically relevant and linguistically accurate when paired with expert review.
- Ratings: Mean item scores ranged from 4.6 to 4.9 on the 5-point appropriateness scale.
- Reliability: Overall inter-rater reliability was poor (ICC = 0.45), but single-rater reliability was higher (ICC = 0.65).
- Revision Needs: Only a few items were flagged for revision, mainly involving emotional or contextual components.
- Cultural Adaptation: Turkish versions required clearer cultural adaptation than the English versions.
- AI’s Role: The findings highlight the importance of human validation of AI-generated clinical content.
- Multilingual Development: The results support integrating AI-assisted tools into multilingual clinical content development.

Background
Stuttering is a complex fluency disorder that affects individuals across various age groups and cultural backgrounds. Traditional assessment and intervention methods often rely on human expertise, which can be time-consuming and limited by availability. The advent of artificial intelligence offers a promising avenue for developing tailored assessment and intervention programs that can be adapted to different languages and cultural contexts.
Study
This study aimed to assess the content validity of AI-generated stuttering programs created by GPT-4, focusing on both Turkish and English versions. Twelve certified speech-language pathologists specializing in fluency disorders reviewed the programs, rating each item on a 5-point Likert scale. The study also sought to determine if linguistic or cultural differences influenced expert evaluations.
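To make the comparative design concrete, here is a small sketch of the kind of descriptive summary the methods describe: mean ratings broken down by language version and age group. The column names and values are invented for illustration and do not come from the study’s data.

```python
# Illustrative sketch only: summarising hypothetical 5-point Likert ratings
# by language version and age group (mirrors the descriptive-statistics step).
import pandas as pd

ratings = pd.DataFrame({
    "language":  ["Turkish", "Turkish", "English", "English"] * 3,
    "age_group": ["preschool", "adult", "preschool", "adult"] * 3,
    "rater":     [f"SLP{i:02d}" for i in range(1, 13)],
    "score":     [5, 4, 5, 5, 4, 4, 5, 5, 5, 4, 5, 5],
})

summary = ratings.groupby(["language", "age_group"])["score"].agg(["mean", "std", "count"])
print(summary.round(2))
```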
Results
The results revealed that the majority of items were rated as appropriate or highly appropriate, with mean scores ranging from 4.6 to 4.9. However, the overall inter-rater reliability was found to be poor (ICC = 0.45), indicating variability among raters. In contrast, single-rater reliability showed a higher ICC of 0.65. Experts noted that while the English versions were generally more detailed, certain Turkish terms needed better cultural adaptation.
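The gap between the "overall" and "single-rater" figures is easier to interpret with the underlying formulas in view: overall reliability is typically an average-measures ICC and single-rater reliability a single-measures ICC. The sketch below computes both under a two-way random-effects model (Shrout and Fleiss ICC(2,1) and ICC(2,k)) on simulated data; the paper does not state which ICC form the authors used, so treat this purely as an illustration of the metric.

```python
# Minimal sketch: Shrout & Fleiss ICC(2,1) and ICC(2,k) for a complete
# items x raters matrix. Simulated data; not the study's analysis code.
import numpy as np

def icc_two_way_random(x: np.ndarray):
    """x: shape (n_items, n_raters). Returns (single-measures ICC, average-measures ICC)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-item means
    col_means = x.mean(axis=0)   # per-rater means
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between-item mean square
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between-rater mean square
    ss_err = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    ms_err = ss_err / ((n - 1) * (k - 1))                      # residual mean square
    icc_single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    icc_average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return icc_single, icc_average

# Simulate 20 items x 12 raters: shared item effect, systematic rater leniency, noise.
rng = np.random.default_rng(1)
toy = np.clip(np.round(
    rng.normal(4.5, 0.4, size=(20, 1))        # item appropriateness
    + rng.normal(0.0, 0.2, size=(1, 12))      # rater leniency
    + rng.normal(0.0, 0.5, size=(20, 12))     # residual noise
), 1, 5)
single, average = icc_two_way_random(toy)
print(f"ICC(2,1) single-rater: {single:.2f}   ICC(2,k) average-measures: {average:.2f}")
```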
Impact and Implications
The findings from this study underscore the potential of AI in generating clinically relevant materials for stuttering assessment and intervention. By integrating AI-assisted tools, clinicians can enhance the development of multilingual content, making it more accessible to diverse populations. However, the necessity for human validation remains critical to ensure that emotional and cultural nuances are appropriately addressed, ultimately improving patient outcomes.
Conclusion
This study highlights the significant role of AI in creating effective stuttering assessment and intervention programs. While GPT-4 demonstrates the ability to produce relevant materials, the importance of expert review cannot be overstated. As we move forward, embracing AI in clinical settings could lead to more personalized and effective treatment options for individuals who stutter. Continued research in this area is essential to refine these tools and enhance their applicability across different languages and cultures.
Your comments
What are your thoughts on the integration of AI in speech therapy? Do you believe it can enhance the quality of care for individuals who stutter? Let’s discuss in the comments below!
Content validity of AI-generated stuttering assessment and intervention programs based on expert review: A comparative analysis across age groups and language versions.
Abstract
PURPOSE: This study aimed to evaluate the content validity and inter-rater reliability of stuttering assessment and intervention programs generated by artificial intelligence (GPT-4) in both Turkish and English for preschool, school-age, and adult populations. It also examined whether linguistic or cultural differences affected expert evaluations.
METHODS: Twelve AI-generated programs (six in Turkish, six in English) were reviewed by twelve certified speech-language pathologists specializing in fluency disorders. Each item was rated using a 5-point Likert scale. Descriptive statistics, Cronbach’s Alpha, and Intraclass Correlation Coefficients (ICC) were calculated to assess consistency and reliability.
RESULTS: The majority of items were rated as appropriate or highly appropriate (M = 4.6-4.9). The overall reliability among raters was poor (ICC = 0.45), while single-rater reliability was higher (ICC = 0.65). Only a small number of items were flagged for revision, typically involving emotional or contextual components. Experts noted that English versions tended to be more detailed and literature-consistent, whereas certain Turkish terms required clearer cultural adaptation.
CONCLUSION: GPT-4 can produce clinically relevant and linguistically accurate stuttering materials when paired with expert review. However, human validation remains essential to refine affective and culture-specific elements. These findings support the integration of AI-assisted tools in multilingual clinical content development.
Authors: Koçak AN, Arslan MB
Journal: J Fluency Disord
Citation: Koçak AN and Arslan MB. Content validity of AI-generated stuttering assessment and intervention programs based on expert review: A comparative analysis across age groups and language versions. J Fluency Disord. 2025; 87:106186. doi: 10.1016/j.jfludis.2025.106186