⚡ Quick Summary
This study explored the use of large language models (LLMs) to detect depression from user-generated diary text, demonstrating a promising approach to digital mental health screening. A fine-tuned GPT-3.5 model achieved an accuracy of 0.902 and a specificity of 0.955 in identifying depression.
🔍 Key Details
- 📊 Dataset: 428 diaries from 91 participants
- 🧩 Features used: User-generated diary text data
- ⚙️ Technology: Large language models (ChatGPT with GPT-3.5 and GPT-4)
- 🏆 Performance: GPT-3.5 fine-tuning: Accuracy 0.902, Specificity 0.955
🔑 Key Takeaways
- 📖 User-generated text data from diaries can serve as a valuable source for detecting depression.
- 🤖 Large language models like ChatGPT show potential in mental health screening.
- 🏆 Fine-tuning GPT-3.5 significantly improved performance metrics.
- 📈 Balanced accuracy was highest at 0.844 for GPT-3.5 without fine-tuning.
- 🔍 Recall rate for GPT-3.5 was an impressive 0.929.
- 🌱 Future research should focus on qualitative digital expressions alongside quantitative measures.
- 🧠 Early detection of depression can lead to timely interventions and better outcomes.
- 🌍 Study published in the Journal of Medical Internet Research.
📚 Background
Depressive disorders pose significant global challenges, impacting social dynamics and occupational productivity. Traditional screening tools, such as the Center for Epidemiologic Studies Depression Scale, often lack objectivity and accuracy. As a result, researchers are exploring innovative methods for early detection, including the analysis of user-generated text data from diaries, which can provide valuable insights into mental health.
🗒️ Study
This study aimed to validate the effectiveness of using an emotional diary writing app to detect depression through user-generated text. Participants were assessed for depression and suicide risk before and after a two-week diary writing period. The text data collected was analyzed using leading LLMs, specifically ChatGPT with GPT-3.5 and GPT-4, to evaluate their performance in recognizing depression.
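Per the abstract, the models were queried with chain-of-thought and zero-shot prompting. As a hypothetical illustration only (the study's actual prompt wording and label set are not reproduced here), a zero-shot screening prompt for a single diary entry might be assembled and its answer parsed like this:

```python
def build_zero_shot_prompt(diary_text: str) -> str:
    """Build a zero-shot classification prompt for one diary entry.

    Illustrative wording only; this is NOT the prompt used in the study.
    """
    return (
        "Read the diary entry below and answer with exactly one word, "
        "'depressed' or 'not_depressed', indicating whether the writer "
        "shows signs of clinically significant depression.\n\n"
        f"Diary entry:\n{diary_text}\n\nAnswer:"
    )


def parse_label(model_output: str) -> bool:
    """Map the model's one-word answer to a boolean depression flag."""
    return model_output.strip().lower().startswith("depressed")
```

In a zero-shot setup the model receives only the task description and the diary text, with no labeled examples; a chain-of-thought variant would additionally ask the model to reason step by step before giving its one-word answer.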
📈 Results
The results indicated that GPT-3.5 fine-tuning significantly enhanced the model’s ability to detect depression, achieving an accuracy of 0.902 and a specificity of 0.955. Interestingly, the highest balanced accuracy, 0.844, was observed for GPT-3.5 without fine-tuning or prompting techniques, showing its potential even without optimization. That configuration’s recall of 0.929 further underscores the model’s effectiveness in identifying at-risk individuals.
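For readers unfamiliar with the metric, balanced accuracy is conventionally the mean of recall (sensitivity) and specificity, which makes it more informative than raw accuracy on imbalanced data such as screening samples. Under that standard definition (the paper’s exact formula is not restated here), the reported figures relate as follows:

```python
def balanced_accuracy(recall: float, specificity: float) -> float:
    """Balanced accuracy for binary classification: the mean of
    recall (sensitivity) and specificity."""
    return (recall + specificity) / 2


# Study figures for GPT-3.5 without fine-tuning: recall 0.929,
# balanced accuracy 0.844. If the standard definition applies, the
# implied specificity would be roughly 2 * 0.844 - 0.929 ≈ 0.759.
implied_specificity = 2 * 0.844 - 0.929
```

The implied specificity is an arithmetic inference under the stated assumption, not a number reported in the study.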
🌍 Impact and Implications
The findings from this study highlight the transformative potential of using LLMs in digital mental health screening. By leveraging user-generated text data, healthcare providers can enhance early detection of depression, leading to timely interventions. This approach not only complements traditional quantitative measures but also emphasizes the importance of qualitative digital expressions in understanding mental health.
🔮 Conclusion
This study underscores the significant potential of large language models in revolutionizing mental health screening. By utilizing user-generated diary text data, we can improve the accuracy and objectivity of depression detection. As we move forward, it is essential to continue exploring innovative methodologies that integrate both qualitative and quantitative data to enhance mental health outcomes.
💬 Your comments
What are your thoughts on the use of large language models for mental health screening? We would love to hear your insights! 💬 Share your comments below or connect with us on social media.
Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study.
Abstract
BACKGROUND: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT.
OBJECTIVE: We aimed to detect depression based on user-generated diary text through an emotional diary writing app using a large language model (LLM). We aimed to validate the value of the semistructured diary text data as an EMA data source.
METHODS: Participants were assessed for depression using the Patient Health Questionnaire and suicide risk was evaluated using the Beck Scale for Suicide Ideation before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content.
RESULTS: We used 428 diaries from 91 participants; GPT-3.5 fine-tuning demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the balanced accuracy was the highest (0.844) for GPT-3.5 without fine-tuning and prompt techniques; it displayed a recall of 0.929.
CONCLUSIONS: Both GPT-3.5 and GPT-4.0 demonstrated relatively reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.
Authors: Shin D, Kim H, Lee S, Cho Y, Jung W
Journal: J Med Internet Res
Citation: Shin D, et al. Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study. J Med Internet Res. 2024;26:e54617. doi: 10.2196/54617