Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation

0
Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
  • Bubeck, S. et al. Sparks of artificial general intelligence: Early experiments with GPT-4. Preprint at (2023).

  • Broderick, R. People are using AI for therapy, whether the tech is ready for it or not. Fast Company (2023).

  • Weizenbaum, J. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 36–45 (1966).

    Article 

    Google Scholar 

  • Bantilan, N., Malgaroli, M., Ray, B. & Hull, T. D. Just in time crisis response: Suicide alert system for telemedicine psychotherapy settings. Psychother. Res. 31, 289–299 (2021).

    Article 

    Google Scholar 

  • Peretz, G., Taylor, C. B., Ruzek, J. I., Jefroykin, S. & Sadeh-Sharvit, S. Machine learning model to predict assignment of therapy homework in behavioral treatments: Algorithm development and validation. JMIR Form. Res. 7, e45156 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Tanana, M. J. et al. How do you feel? Using natural language processing to automatically rate emotion in psychotherapy. Behav. Res. Methods 53, 2069–2082 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Sharma, A., Lin, I. W., Miner, A. S., Atkins, D. C. & Althoff, T. Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nat. Mach. Intell. 5, 46–57 (2023).

    Article 

    Google Scholar 

  • Chen, Z., Flemotomos, N., Imel, Z. E., Atkins, D. C. & Narayanan, S. Leveraging open data and task augmentation to automated behavioral coding of psychotherapy conversations in low-resource scenarios. Preprint at (2022).

  • Shah, R. S. et al. Modeling motivational interviewing strategies on an online peer-to-peer counseling platform. Proc. ACM Hum.-Comput. Interact 6, 1–24 (2022).

    Article 

    Google Scholar 

  • Chan, W. W. et al. The challenges in designing a prevention chatbot for eating disorders: Observational study. JMIR Form. Res. 6, e28003 (2022).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Darcy, A. Why generative AI Is not yet ready for mental healthcare. Woebot Health (2023).

  • Abd-Alrazaq, A. A. et al. An overview of the features of chatbots in mental health: A scoping review. Int. J. Med. Inf. 132, 103978 (2019).

    Article 

    Google Scholar 

  • Lim, S. M., Shiau, C. W. C., Cheng, L. J. & Lau, Y. Chatbot-delivered psychotherapy for adults with depressive and anxiety symptoms: A systematic review and meta-regression. Behav. Ther. 53, 334–347 (2022).

    Article 
    PubMed 

    Google Scholar 

  • Baumel, A., Muench, F., Edan, S. & Kane, J. M. Objective user engagement with mental health apps: Systematic search and panel-based usage analysis. J. Med. Internet Res. 21, e14567 (2019).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Torous, J., Nicholas, J., Larsen, M. E., Firth, J. & Christensen, H. Clinical review of user engagement with mental health smartphone apps: Evidence, theory and improvements. Evid. Based Ment. Health 21, 116–119 (2018b).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Das, A. et al. Conversational bots for psychotherapy: A study of generative transformer models using domain-specific dialogues. in Proceedings of the 21st Workshop on Biomedical Language Processing 285–297 (Association for Computational Linguistics, 2022). https://doi.org/10.18653/v1/2022.bionlp-1.27.

  • Liu, H. Towards automated psychotherapy via language modeling. Preprint at (2021).

  • Hamilton, J. Why generative AI (LLM) is ready for mental healthcare. LinkedIn (2023).

  • Shariff, A., Bonnefon, J.-F. & Rahwan, I. Psychological roadblocks to the adoption of self-driving vehicles. Nat. Hum. Behav. 1, 694–696 (2017).

    Article 
    PubMed 

    Google Scholar 

  • Markov, A. A. Essai d’une recherche statistique sur le texte du roman “Eugene Onegin” illustrant la liaison des epreuve en chain (‘Example of a statistical investigation of the text of “Eugene Onegin” illustrating the dependence between samples in chain’). Izvistia Imperatorskoi Akad. Nauk Bull. L’Academie Imp. Sci. St-Petersbourg 7, 153–162 (1913).

    Google Scholar 

  • Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).

    Article 
    MathSciNet 

    Google Scholar 

  • Baker, J. K. Stochastic modeling for automatic speech understanding. in Speech recognition: invited papers presented at the 1974 IEEE symposium (ed. Reddy, D. R.) (Academic Press, 1975).

  • Jelinek, F. Continuous speech recognition by statistical methods. Proc. IEEE 64, 532–556 (1976).

    Article 

    Google Scholar 

  • Jurafsky, D. & Martin, J. H. N-gram language models. in Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (Pearson Prentice Hall, 2009).

  • Vaswani, A. et al. Attention is all you need. 31st Conf. Neural Inf. Process. Syst. (2017).

  • Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at (2022).

  • Gao, L. et al. The Pile: An 800GB dataset of diverse text for language modeling. Preprint at (2020).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint at (2019).

  • Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Preprint at (2023).

  • Fairburn, C. G. & Patel, V. The impact of digital technology on psychological treatments and their dissemination. Behav. Res. Ther. 88, 19–25 (2017).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Fisher, A. J. et al. Open trial of a personalized modular treatment for mood and anxiety. Behav. Res. Ther. 116, 69–79 (2019).

    Article 
    PubMed 

    Google Scholar 

  • Fan, X. et al. Utilization of self-diagnosis health chatbots in real-world settings: Case study. J. Med. Internet Res. 23, e19928 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Coghlan, S. et al. To chat or bot to chat: Ethical issues with using chatbots in mental health. Digit. Health 9, 1–11 (2023).

    Google Scholar 

  • Beatty, C., Malik, T., Meheli, S. & Sinha, C. Evaluating the therapeutic alliance with a free-text CBT conversational agent (Wysa): A mixed-methods study. Front. Digit. Health 4, 847991 (2022).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Lin, B., Bouneffouf, D., Cecchi, G. & Varshney, K. R. Towards healthy AI: Large language models need therapists too. Preprint at (2023).

  • Weidinger, L. et al. Ethical and social risks of harm from language models. Preprint at (2021).

  • Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (ACM, 2021). https://doi.org/10.1145/3442188.3445922.

  • Chamberlain, J. The risk-based approach of the European Union’s proposed artificial intelligence regulation: Some comments from a tort law perspective. Eur. J. Risk Regul. 14, 1–13 (2023).

    Article 

    Google Scholar 

  • Norden, J. G. & Shah, N. R. What AI in health care can learn from the long road to autonomous vehicles. NEJM Catal. Innov. Care Deliv. (2022).

  • Sedlakova, J. & Trachsel, M. Conversational artificial intelligence in psychotherapy: A new therapeutic tool or agent? Am. J. Bioeth. 23, 4–13 (2023).

    Article 
    PubMed 

    Google Scholar 

  • Gearing, R. E. et al. Major ingredients of fidelity: A review and scientific guide to improving quality of intervention research implementation. Clin. Psychol. Rev. 31, 79–88 (2011).

    Article 
    PubMed 

    Google Scholar 

  • Wiltsey Stirman, S. Implementing evidence-based mental-health treatments: Attending to training, fidelity, adaptation, and context. Curr. Dir. Psychol. Sci. 31, 436–442 (2022).

    Article 

    Google Scholar 

  • Waller, G. Evidence-based treatment and therapist drift. Behav. Res. Ther. 47, 119–127 (2009).

    Article 
    PubMed 

    Google Scholar 

  • Flemotomos, N. et al. “Am I a good therapist?” Automated evaluation of psychotherapy skills using speech and language technologies. CoRR, Abs, 2102 (10.3758) (2021).

  • Zhang, X. et al. You never know what you are going to get: Large-scale assessment of therapists’ supportive counseling skill use. Psychotherapy (2022).

  • Goldberg, S. B. et al. Machine learning and natural language processing in psychotherapy research: Alliance as example use case. J. Couns. Psychol. 67, 438–448 (2020).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Wiltsey Stirman, S. et al. A novel approach to the assessment of fidelity to a cognitive behavioral therapy for PTSD using clinical worksheets: A proof of concept with cognitive processing therapy. Behav. Ther. 52, 656–672 (2021).

    Article 

    Google Scholar 

  • Raviola, G., Naslund, J. A., Smith, S. L. & Patel, V. Innovative models in mental health delivery systems: Task sharing care with non-specialist providers to close the mental health treatment gap. Curr. Psychiatry Rep. 21, 44 (2019).

    Article 
    PubMed 

    Google Scholar 

  • American Psychological Association. Guidelines for clinical supervision in health service psychology. Am. Psychol. 70, 33–46 (2015).

    Article 

    Google Scholar 

  • Cook, S. C., Schwartz, A. C. & Kaslow, N. J. Evidence-based psychotherapy: Advantages and challenges. Neurotherapeutics 14, 537–545 (2017).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Leichsenring, F., Steinert, C., Rabung, S. & Ioannidis, J. P. A. The efficacy of psychotherapies and pharmacotherapies for mental disorders in adults: An umbrella review and meta‐analytic evaluation of recent meta‐analyses. World Psych. 21, 133–145 (2022).

    Article 

    Google Scholar 

  • Cuijpers, P., van Straten, A., Andersson, G. & van Oppen, P. Psychotherapy for depression in adults: A meta-analysis of comparative outcome studies. J. Consult. Clin. Psychol. 76, 909–922 (2008).

    Article 
    PubMed 

    Google Scholar 

  • Morris, Z. S., Wooding, S. & Grant, J. The answer is 17 years, what is the question: Understanding time lags in translational research. J. R. Soc. Med. 104, 510–520 (2011).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Chekroud, A. M. et al. The promise of machine learning in predicting treatment outcomes in psychiatry. World Psych. 20, 154–170 (2021).

    Article 

    Google Scholar 

  • Kazdin, A. E. Mediators and mechanisms of change in psychotherapy research. Annu. Rev. Clin. Psychol. 3, 1–27 (2007).

    Article 
    PubMed 

    Google Scholar 

  • Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I. & Atkinson, P. M. Explainable artificial intelligence: An analytical review. WIREs Data Min. Knowl. Discov. 11, (2021).

  • Kelley, T. L. Interpretation of Educational Measurements. (World Book, 1927).

  • van Bronswijk, S. C. et al. Precision medicine for long-term depression outcomes using the Personalized Advantage Index approach: Cognitive therapy or interpersonal psychotherapy? Psychol. Med. 51, 279–289 (2021).

    Article 
    PubMed 

    Google Scholar 

  • Scala, J. J., Ganz, A. B. & Snyder, M. P. Precision medicine approaches to mental health care. Physiology 38, 82–98 (2023).

    Article 
    CAS 

    Google Scholar 

  • Chorpita, B. F., Daleiden, E. L. & Weisz, J. R. Identifying and selecting the common elements of evidence based interventions: A distillation and matching model. Ment. Health Serv. Res. 7, 5–20 (2005).

    Article 
    PubMed 

    Google Scholar 

  • Chambless, D. L. & Hollon, S. D. Defining empirically supported therapies. J. Consult. Clin. Psychol. 66, 7–18 (1998).

    Article 
    CAS 
    PubMed 

    Google Scholar 

  • Tolin, D. F., McKay, D., Forman, E. M., Klonsky, E. D. & Thombs, B. D. Empirically supported treatment: Recommendations for a new model. Clin. Psychol. Sci. Pract. 22, 317–338 (2015).

    Google Scholar 

  • Lilienfeld, S. O. Psychological treatments that cause harm. Perspect. Psychol. Sci. 2, 53–70 (2007).

    Article 
    PubMed 

    Google Scholar 

  • Wasil, A. R., Venturo-Conerly, K. E., Shingleton, R. M. & Weisz, J. R. A review of popular smartphone apps for depression and anxiety: Assessing the inclusion of evidence-based content. Behav. Res. Ther. 123, 103498 (2019).

    Article 
    PubMed 

    Google Scholar 

  • Torous, J. B. et al. A hierarchical framework for evaluation and informed decision making regarding smartphone apps for clinical care. Psychiatr. Serv. 69, 498–500 (2018).

    Article 
    PubMed 

    Google Scholar 

  • Gunasekar, S. et al. Textbooks are all you need. Preprint at (2023).

  • Wilhelm, E. et al. Measuring the burden of infodemics: Summary of the methods and results of the Fifth WHO Infodemic Management Conference. JMIR Infodemiology 3, e44207 (2023).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Creed, T. A. et al. Knowledge and attitudes toward an artificial intelligence-based fidelity measurement in community cognitive behavioral therapy supervision. Adm. Policy Ment. Health Ment. Health Serv. Res. 49, 343–356 (2022).

    Article 

    Google Scholar 

  • Aktan, M. E., Turhan, Z. & Dolu, İ. Attitudes and perspectives towards the preferences for artificial intelligence in psychotherapy. Comput. Hum. Behav. 133, 107273 (2022).

    Article 

    Google Scholar 

  • Prescott, J. & Hanley, T. Therapists’ attitudes towards the use of AI in therapeutic practice: considering the therapeutic alliance. Ment. Health Soc. Incl. 27, 177–185 (2023).

    Article 

    Google Scholar 

  • American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. (2013).

  • Yogatama, D., De Masson d’Autume, C. & Kong, L. Adaptive semiparametric language models. Trans. Assoc. Comput. Linguist 9, 362–373 (2021).

    Article 

    Google Scholar 

  • Stanley, B. & Brown, G. K. Safety planning intervention: A brief intervention to mitigate suicide risk. Cogn. Behav. Pract. 19, 256–264 (2012).

    Article 

    Google Scholar 

  • Behzadan, V., Munir, A. & Yampolskiy, R. V. A psychopathological approach to safety engineering in AI and AGI. Preprint at (2018).

  • Lambert, M. J. & Harmon, K. L. The merits of implementing routine outcome monitoring in clinical practice. Clin. Psychol. Sci. Pract. 25, (2018).

  • Kjell, O. N. E., Kjell, K. & Schwartz, H. A. AI-based large language models are ready to transform psychological health assessment. Preprint at (2023).

  • First, M. B., Williams, J. B. W., Karg, R. S. & Spitzer, R. L. SCID-5-CV: Structured Clinical Interview for DSM-5 Disorders: Clinician Version. (American Psychiatric Association Publishing, 2016).

  • Shah, D. S., Schwartz, H. A. & Hovy, D. Predictive biases in natural language processing models: A conceptual framework and overview. in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 5248–5264 (Association for Computational Linguistics, 2020). https://doi.org/10.18653/v1/2020.acl-main.468.

  • Adams, L. M. & Miller, A. B. Mechanisms of mental-health disparities among minoritized groups: How well are the top journals in clinical psychology representing this work? Clin. Psychol. Sci. 10, 387–416 (2022).

    Google Scholar 

  • Viswanath, H. & Zhang, T. FairPy: A toolkit for evaluation of social biases and their mitigation in large language models. Preprint at (2023).

  • von Zitzewitz, J., Boesch, P. M., Wolf, P. & Riener, R. Quantifying the human likeness of a humanoid robot. Int. J. Soc. Robot. 5, 263–276 (2013).

    Article 

    Google Scholar 

  • White House Office of Science and Technology Policy. Blueprint for an AI bill of rights. (2022).

  • Parry, G., Castonguay, L. G., Borkovec, T. D. & Wolf, A. W. Practice research networks and psychological services research in the UK and USA. in Developing and Delivering Practice-Based Evidence (eds. Barkham, M., Hardy, G. E. & Mellor-Clark, J.) 311–325 (Wiley-Blackwell, 2010). https://doi.org/10.1002/9780470687994.ch12.

  • Craske, M. G., Treanor, M., Conway, C. C., Zbozinek, T. & Vervliet, B. Maximizing exposure therapy: An inhibitory learning approach. Behav. Res. Ther. 58, 10–23 (2014).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • Delgadillo, J. et al. Stratified care vs stepped care for depression: A cluster randomized clinical trial. JAMA Psychiatry 79, 101 (2022).

    Article 
    PubMed 

    Google Scholar 

  • Furukawa, T. A. et al. Dismantling, optimising, and personalising internet cognitive behavioural therapy for depression: A systematic review and component network meta-analysis using individual participant data. Lancet Psychiatry 8, 500–511 (2021).

    Article 
    PubMed 
    PubMed Central 

    Google Scholar 

  • link

    Leave a Reply

    Your email address will not be published. Required fields are marked *