⚡ Quick Summary
This study evaluated the ability of ChatGPT to answer frequently asked questions about stuttering, finding that a panel of certified speech and language pathologists (SLPs) misattributed approximately 45.50% of its responses to human clinicians. The findings suggest that while ChatGPT can provide largely accurate and harmless information, it should not replace professional diagnosis or treatment.
📌 Key Details
- 👥 Participants: Panel of five certified speech and language pathologists (SLPs)
- ❓ Questions: Eleven common questions about stuttering
- 📊 Evaluation Metrics: Accuracy, potential harm, professional alignment, and readability scores
- 📖 Readability Scores: Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), Dale-Chall Score (D-CS)
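These three scores are standard formulas over word, sentence, and syllable counts, so they can be reproduced from any response text. Below is a minimal sketch assuming the third-party Python package `textstat` (our choice for illustration; the study does not name its tooling):

```python
# pip install textstat
import textstat

# Any ChatGPT answer can stand in here; this sample text is invented.
response = (
    "Stuttering is a speech disorder characterized by repetitions, "
    "prolongations, and blocks that interrupt the flow of speech. "
    "Its causes involve genetic, neurological, and developmental factors."
)

# FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words);
# scores of 0-30 correspond to college-graduate reading level,
# the band containing the study's mean of 26.52.
fres = textstat.flesch_reading_ease(response)

# Gunning Fog = 0.4*[(words/sentences) + 100*(complex words/words)],
# an estimate of the years of schooling needed to follow the text.
gfsl = textstat.gunning_fog(response)

# Dale-Chall scores rise with the share of words missing from its
# list of familiar words; roughly 9.0-9.9 maps to 13th-15th grade.
dcs = textstat.dale_chall_readability_score(response)

# Surface features also reported in the study.
words = textstat.lexicon_count(response, removepunct=True)
sentences = textstat.sentence_count(response)
difficult = textstat.difficult_words(response)

print(f"FRES={fres:.2f}  GFSL={gfsl:.2f}  D-CS={dcs:.2f}")
print(f"words={words}  sentences={sentences}  "
      f"WPS={words / sentences:.2f}  "
      f"difficult words={100 * difficult / words:.2f}%")
```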
🔑 Key Takeaways
- 🤖 AI Responses: 45.50% of ChatGPT’s answers were misidentified as human-generated.
- ✅ Accuracy: 83.60% of responses were deemed accurate by the SLP panel.
- 🛡️ Harmlessness: 63.60% of responses were rated as harmless.
- 📉 Extent of Harm: 38.20% of responses were judged capable of causing only minor to moderate harm.
- 🤝 Professional Consensus: 62% of responses aligned with prevailing views in the SLP community.
- 📖 Readability: An average FRES of 26.52 indicates text readable mainly by college graduates.
- 📏 Response Length: Average of 99.73 words and 6.80 sentences per response.
- 🔗 Correlation: Spearman correlations between SLP evaluations and readability features ranged from r = -0.909 to r = 0.918.
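For this analysis, the study used Spearman's rank correlation between the panel's evaluations and the readability features. A minimal sketch with `scipy` and made-up numbers (the study's per-response data are not reproduced here):

```python
from scipy.stats import spearmanr

# Hypothetical per-response values for the eleven questions; the
# study's actual ratings and scores are not published in this summary.
panel_rating = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 5]   # ordinal SLP ratings
fres_scores  = [31.2, 25.4, 28.0, 18.9, 35.6, 12.3,
                26.7, 33.1, 20.5, 27.8, 30.9]       # FRES per response

# Spearman's rho ranks both variables first, so it accommodates
# ordinal ratings and makes no normality assumption about the scores.
rho, p_value = spearmanr(panel_rating, fres_scores)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```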
📚 Background
Stuttering is a complex speech disorder that affects individuals of all ages. With the rise of artificial intelligence, tools like ChatGPT are being explored for their potential to provide information and support to those seeking answers about stuttering. Understanding the quality and readability of AI-generated responses is crucial for ensuring that individuals receive accurate and helpful information.
🏛️ Study
This exploratory study involved a panel of five certified SLPs who assessed the responses generated by ChatGPT to eleven common questions about stuttering. The SLPs were blinded to the source of the answers and evaluated them based on accuracy, potential harm, and alignment with professional consensus. Additionally, various readability features were analyzed to determine how accessible the information was to the general public.
📊 Results
The results indicated that ChatGPT demonstrated a promising ability to generate responses that were perceived as human-like, with 45.50% of the answers incorrectly attributed to SLPs. Furthermore, 83.60% of the responses were accurate, and 63.60% were rated as harmless. The readability analysis revealed an average Flesch Reading Ease Score of 26.52, indicating text that is difficult to read and best suited to college graduates.
🌍 Impact and Implications
The findings of this study highlight the potential of AI tools like ChatGPT to provide valuable information about stuttering, especially for individuals who may have limited access to professional services. However, it is essential to emphasize that while these tools can be informative, they should not replace the expertise of qualified SLPs in diagnosis and treatment. As AI continues to evolve, its role in healthcare will likely expand, necessitating ongoing evaluation and ethical considerations.
🔮 Conclusion
This study underscores the promising capabilities of ChatGPT in addressing frequently asked questions about stuttering. While the AI-generated responses were largely accurate and perceived as human-like, it is crucial to remember that these tools are intended for educational purposes only. Continued research and development in this area can enhance the accessibility of information while ensuring that professional guidance remains paramount in treatment and diagnosis.
💬 Your comments
What are your thoughts on the use of AI in providing information about stuttering? We would love to hear your insights! 💬 Share your comments below.
Assessing the response quality and readability of ChatGPT in stuttering.
Abstract
OBJECTIVE: This study aimed to examine how frequently asked questions regarding stuttering were comprehended and answered by ChatGPT.
METHODS: In this exploratory study, eleven common questions about stuttering were posed in a single conversation with GPT-4o mini. A panel of five certified speech and language pathologists (SLPs), blinded to the source of the answers (whether AI or SLPs), was asked to judge whether each response was produced by the ChatGPT chatbot or written by an SLP. The panel was also instructed to evaluate the responses on several criteria, including the presence of inaccuracies, the potential for causing harm and the degree of harm that could result, and alignment with the prevailing consensus within the SLP community. All ChatGPT responses were additionally evaluated on various readability features, including the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), Dale-Chall Score (D-CS), number of words, number of sentences, words per sentence (WPS), characters per word (CPW), and percentage of difficult words. Furthermore, Spearman's rank correlation coefficient was employed to examine the relationship between the evaluations conducted by the panel of certified SLPs and the readability features.
RESULTS: A substantial proportion of the AI-generated responses (45.50%) were incorrectly identified by the SLP panel as being written by other SLPs, indicating high perceived human-likeness (origin). Regarding content quality, 83.60% of the responses were found to be accurate (incorrectness), 63.60% were rated as harmless (harm), and 38.20% were considered to cause only minor to moderate impact (extent of harm). In terms of professional alignment, 62% of the responses reflected the prevailing views within the SLP community (consensus). The means ± standard deviations of FRES, GFSL, and D-CS were 26.52 ± 13.94 (readable for college graduates), 18.17 ± 3.39 (readable for graduate students), and 9.90 ± 1.08 (readable for 13th to 15th grade [college]), respectively. Furthermore, each response contained an average of 99.73 words, 6.80 sentences, 17.44 WPS, 5.79 CPW, and 27.96% difficult words. The correlation coefficients ranged from a very large negative value (r = -0.909, p < 0.05) to a very large positive value (r = 0.918, p < 0.05).
CONCLUSION: The results revealed that ChatGPT possesses a promising capability to provide appropriate responses to frequently asked questions in the field of stuttering, as attested by the fact that the panel of certified SLPs perceived about 45% of its responses to be generated by SLPs. However, given the increasing accessibility of AI tools, particularly among individuals with limited access to professional services, it is crucial to emphasize that such tools are intended solely for educational purposes and should not replace diagnosis or treatment by qualified SLPs.
Authors: Saeedi S, Bakhtiar M
Journal: J Fluency Disord
Citation: Saeedi S, Bakhtiar M. Assessing the response quality and readability of ChatGPT in stuttering. J Fluency Disord. 2025;85:106149. doi: 10.1016/j.jfludis.2025.106149