Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
Bubeck, S. et al. Sparks of artificial general intelligence: Early experiments with GPT-4. Preprint at (2023).
Broderick, R. People are using AI for therapy, whether the tech is ready for it or not. Fast Company (2023).
Weizenbaum, J. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 36–45 (1966).
Bantilan, N., Malgaroli, M., Ray, B. & Hull, T. D. Just in time crisis response: Suicide alert system for telemedicine psychotherapy settings. Psychother. Res. 31, 289–299 (2021).
Peretz, G., Taylor, C. B., Ruzek, J. I., Jefroykin, S. & Sadeh-Sharvit, S. Machine learning model to predict assignment of therapy homework in behavioral treatments: Algorithm development and validation. JMIR Form. Res. 7, e45156 (2023).
Tanana, M. J. et al. How do you feel? Using natural language processing to automatically rate emotion in psychotherapy. Behav. Res. Methods 53, 2069–2082 (2021).
Sharma, A., Lin, I. W., Miner, A. S., Atkins, D. C. & Althoff, T. Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nat. Mach. Intell. 5, 46–57 (2023).
Chen, Z., Flemotomos, N., Imel, Z. E., Atkins, D. C. & Narayanan, S. Leveraging open data and task augmentation to automated behavioral coding of psychotherapy conversations in low-resource scenarios. Preprint at (2022).
Shah, R. S. et al. Modeling motivational interviewing strategies on an online peer-to-peer counseling platform. Proc. ACM Hum.-Comput. Interact 6, 1–24 (2022).
Chan, W. W. et al. The challenges in designing a prevention chatbot for eating disorders: Observational study. JMIR Form. Res. 6, e28003 (2022).
Darcy, A. Why generative AI is not yet ready for mental healthcare. Woebot Health (2023).
Abd-Alrazaq, A. A. et al. An overview of the features of chatbots in mental health: A scoping review. Int. J. Med. Inf. 132, 103978 (2019).
Lim, S. M., Shiau, C. W. C., Cheng, L. J. & Lau, Y. Chatbot-delivered psychotherapy for adults with depressive and anxiety symptoms: A systematic review and meta-regression. Behav. Ther. 53, 334–347 (2022).
Baumel, A., Muench, F., Edan, S. & Kane, J. M. Objective user engagement with mental health apps: Systematic search and panel-based usage analysis. J. Med. Internet Res. 21, e14567 (2019).
Torous, J., Nicholas, J., Larsen, M. E., Firth, J. & Christensen, H. Clinical review of user engagement with mental health smartphone apps: Evidence, theory and improvements. Evid. Based Ment. Health 21, 116–119 (2018b).
Das, A. et al. Conversational bots for psychotherapy: A study of generative transformer models using domain-specific dialogues. in Proceedings of the 21st Workshop on Biomedical Language Processing 285–297 (Association for Computational Linguistics, 2022). https://doi.org/10.18653/v1/2022.bionlp-1.27.
Liu, H. Towards automated psychotherapy via language modeling. Preprint at (2021).
Hamilton, J. Why generative AI (LLM) is ready for mental healthcare. LinkedIn (2023).
Shariff, A., Bonnefon, J.-F. & Rahwan, I. Psychological roadblocks to the adoption of self-driving vehicles. Nat. Hum. Behav. 1, 694–696 (2017).
Markov, A. A. Essai d’une recherche statistique sur le texte du roman “Eugene Onegin” illustrant la liaison des epreuve en chain (‘Example of a statistical investigation of the text of “Eugene Onegin” illustrating the dependence between samples in chain’). Izvistia Imperatorskoi Akad. Nauk Bull. L’Academie Imp. Sci. St-Petersbourg 7, 153–162 (1913).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Baker, J. K. Stochastic modeling for automatic speech understanding. in Speech recognition: invited papers presented at the 1974 IEEE symposium (ed. Reddy, D. R.) (Academic Press, 1975).
Jelinek, F. Continuous speech recognition by statistical methods. Proc. IEEE 64, 532–556 (1976).
Jurafsky, D. & Martin, J. H. N-gram language models. in Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (Pearson Prentice Hall, 2009).
Vaswani, A. et al. Attention is all you need. 31st Conf. Neural Inf. Process. Syst. (2017).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at (2022).
Gao, L. et al. The Pile: An 800GB dataset of diverse text for language modeling. Preprint at (2020).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint at (2019).
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Preprint at (2023).
Fairburn, C. G. & Patel, V. The impact of digital technology on psychological treatments and their dissemination. Behav. Res. Ther. 88, 19–25 (2017).
Fisher, A. J. et al. Open trial of a personalized modular treatment for mood and anxiety. Behav. Res. Ther. 116, 69–79 (2019).
Fan, X. et al. Utilization of self-diagnosis health chatbots in real-world settings: Case study. J. Med. Internet Res. 23, e19928 (2021).
Coghlan, S. et al. To chat or bot to chat: Ethical issues with using chatbots in mental health. Digit. Health 9, 1–11 (2023).
Beatty, C., Malik, T., Meheli, S. & Sinha, C. Evaluating the therapeutic alliance with a free-text CBT conversational agent (Wysa): A mixed-methods study. Front. Digit. Health 4, 847991 (2022).
Lin, B., Bouneffouf, D., Cecchi, G. & Varshney, K. R. Towards healthy AI: Large language models need therapists too. Preprint at (2023).
Weidinger, L. et al. Ethical and social risks of harm from language models. Preprint at (2021).
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (ACM, 2021). https://doi.org/10.1145/3442188.3445922.
Chamberlain, J. The risk-based approach of the European Union’s proposed artificial intelligence regulation: Some comments from a tort law perspective. Eur. J. Risk Regul. 14, 1–13 (2023).
Norden, J. G. & Shah, N. R. What AI in health care can learn from the long road to autonomous vehicles. NEJM Catal. Innov. Care Deliv. (2022).
Sedlakova, J. & Trachsel, M. Conversational artificial intelligence in psychotherapy: A new therapeutic tool or agent? Am. J. Bioeth. 23, 4–13 (2023).
Gearing, R. E. et al. Major ingredients of fidelity: A review and scientific guide to improving quality of intervention research implementation. Clin. Psychol. Rev. 31, 79–88 (2011).
Wiltsey Stirman, S. Implementing evidence-based mental-health treatments: Attending to training, fidelity, adaptation, and context. Curr. Dir. Psychol. Sci. 31, 436–442 (2022).
Waller, G. Evidence-based treatment and therapist drift. Behav. Res. Ther. 47, 119–127 (2009).
Flemotomos, N. et al. “Am I a good therapist?” Automated evaluation of psychotherapy skills using speech and language technologies. CoRR, Abs, 2102 (10.3758) (2021).
Zhang, X. et al. You never know what you are going to get: Large-scale assessment of therapists’ supportive counseling skill use. Psychotherapy (2022).
Goldberg, S. B. et al. Machine learning and natural language processing in psychotherapy research: Alliance as example use case. J. Couns. Psychol. 67, 438–448 (2020).
Wiltsey Stirman, S. et al. A novel approach to the assessment of fidelity to a cognitive behavioral therapy for PTSD using clinical worksheets: A proof of concept with cognitive processing therapy. Behav. Ther. 52, 656–672 (2021).
Raviola, G., Naslund, J. A., Smith, S. L. & Patel, V. Innovative models in mental health delivery systems: Task sharing care with non-specialist providers to close the mental health treatment gap. Curr. Psychiatry Rep. 21, 44 (2019).
American Psychological Association. Guidelines for clinical supervision in health service psychology. Am. Psychol. 70, 33–46 (2015).
Cook, S. C., Schwartz, A. C. & Kaslow, N. J. Evidence-based psychotherapy: Advantages and challenges. Neurotherapeutics 14, 537–545 (2017).
Leichsenring, F., Steinert, C., Rabung, S. & Ioannidis, J. P. A. The efficacy of psychotherapies and pharmacotherapies for mental disorders in adults: An umbrella review and meta‐analytic evaluation of recent meta‐analyses. World Psych. 21, 133–145 (2022).
Cuijpers, P., van Straten, A., Andersson, G. & van Oppen, P. Psychotherapy for depression in adults: A meta-analysis of comparative outcome studies. J. Consult. Clin. Psychol. 76, 909–922 (2008).
Morris, Z. S., Wooding, S. & Grant, J. The answer is 17 years, what is the question: Understanding time lags in translational research. J. R. Soc. Med. 104, 510–520 (2011).
Chekroud, A. M. et al. The promise of machine learning in predicting treatment outcomes in psychiatry. World Psych. 20, 154–170 (2021).
Kazdin, A. E. Mediators and mechanisms of change in psychotherapy research. Annu. Rev. Clin. Psychol. 3, 1–27 (2007).
Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I. & Atkinson, P. M. Explainable artificial intelligence: An analytical review. WIREs Data Min. Knowl. Discov. 11, (2021).
Kelley, T. L. Interpretation of Educational Measurements. (World Book, 1927).
van Bronswijk, S. C. et al. Precision medicine for long-term depression outcomes using the Personalized Advantage Index approach: Cognitive therapy or interpersonal psychotherapy? Psychol. Med. 51, 279–289 (2021).
Scala, J. J., Ganz, A. B. & Snyder, M. P. Precision medicine approaches to mental health care. Physiology 38, 82–98 (2023).
Chorpita, B. F., Daleiden, E. L. & Weisz, J. R. Identifying and selecting the common elements of evidence based interventions: A distillation and matching model. Ment. Health Serv. Res. 7, 5–20 (2005).
Chambless, D. L. & Hollon, S. D. Defining empirically supported therapies. J. Consult. Clin. Psychol. 66, 7–18 (1998).
Tolin, D. F., McKay, D., Forman, E. M., Klonsky, E. D. & Thombs, B. D. Empirically supported treatment: Recommendations for a new model. Clin. Psychol. Sci. Pract. 22, 317–338 (2015).
Lilienfeld, S. O. Psychological treatments that cause harm. Perspect. Psychol. Sci. 2, 53–70 (2007).
Wasil, A. R., Venturo-Conerly, K. E., Shingleton, R. M. & Weisz, J. R. A review of popular smartphone apps for depression and anxiety: Assessing the inclusion of evidence-based content. Behav. Res. Ther. 123, 103498 (2019).
Torous, J. B. et al. A hierarchical framework for evaluation and informed decision making regarding smartphone apps for clinical care. Psychiatr. Serv. 69, 498–500 (2018).
Gunasekar, S. et al. Textbooks are all you need. Preprint at (2023).
Wilhelm, E. et al. Measuring the burden of infodemics: Summary of the methods and results of the Fifth WHO Infodemic Management Conference. JMIR Infodemiology 3, e44207 (2023).
Creed, T. A. et al. Knowledge and attitudes toward an artificial intelligence-based fidelity measurement in community cognitive behavioral therapy supervision. Adm. Policy Ment. Health Ment. Health Serv. Res. 49, 343–356 (2022).
Aktan, M. E., Turhan, Z. & Dolu, İ. Attitudes and perspectives towards the preferences for artificial intelligence in psychotherapy. Comput. Hum. Behav. 133, 107273 (2022).
Prescott, J. & Hanley, T. Therapists’ attitudes towards the use of AI in therapeutic practice: considering the therapeutic alliance. Ment. Health Soc. Incl. 27, 177–185 (2023).
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. (2013).
Yogatama, D., De Masson d’Autume, C. & Kong, L. Adaptive semiparametric language models. Trans. Assoc. Comput. Linguist 9, 362–373 (2021).
Stanley, B. & Brown, G. K. Safety planning intervention: A brief intervention to mitigate suicide risk. Cogn. Behav. Pract. 19, 256–264 (2012).
Behzadan, V., Munir, A. & Yampolskiy, R. V. A psychopathological approach to safety engineering in AI and AGI. Preprint at (2018).
Lambert, M. J. & Harmon, K. L. The merits of implementing routine outcome monitoring in clinical practice. Clin. Psychol. Sci. Pract. 25, (2018).
Kjell, O. N. E., Kjell, K. & Schwartz, H. A. AI-based large language models are ready to transform psychological health assessment. Preprint at (2023).
First, M. B., Williams, J. B. W., Karg, R. S. & Spitzer, R. L. SCID-5-CV: Structured Clinical Interview for DSM-5 Disorders: Clinician Version. (American Psychiatric Association Publishing, 2016).
Shah, D. S., Schwartz, H. A. & Hovy, D. Predictive biases in natural language processing models: A conceptual framework and overview. in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 5248–5264 (Association for Computational Linguistics, 2020). https://doi.org/10.18653/v1/2020.acl-main.468.
Adams, L. M. & Miller, A. B. Mechanisms of mental-health disparities among minoritized groups: How well are the top journals in clinical psychology representing this work? Clin. Psychol. Sci. 10, 387–416 (2022).
Viswanath, H. & Zhang, T. FairPy: A toolkit for evaluation of social biases and their mitigation in large language models. Preprint at (2023).
von Zitzewitz, J., Boesch, P. M., Wolf, P. & Riener, R. Quantifying the human likeness of a humanoid robot. Int. J. Soc. Robot. 5, 263–276 (2013).
White House Office of Science and Technology Policy. Blueprint for an AI bill of rights. (2022).
Parry, G., Castonguay, L. G., Borkovec, T. D. & Wolf, A. W. Practice research networks and psychological services research in the UK and USA. in Developing and Delivering Practice-Based Evidence (eds. Barkham, M., Hardy, G. E. & Mellor-Clark, J.) 311–325 (Wiley-Blackwell, 2010). https://doi.org/10.1002/9780470687994.ch12.
Craske, M. G., Treanor, M., Conway, C. C., Zbozinek, T. & Vervliet, B. Maximizing exposure therapy: An inhibitory learning approach. Behav. Res. Ther. 58, 10–23 (2014).
Delgadillo, J. et al. Stratified care vs stepped care for depression: A cluster randomized clinical trial. JAMA Psychiatry 79, 101 (2022).
Furukawa, T. A. et al. Dismantling, optimising, and personalising internet cognitive behavioural therapy for depression: A systematic review and component network meta-analysis using individual participant data. Lancet Psychiatry 8, 500–511 (2021).