Research

Our lab focuses on digital phenotyping and digital interventions with mHealth data, predictors of health outcomes including social determinants with big data, identifying sub-groups of treatment responders using machine learning, clinical trials in mental and behavioral health, and multivariate methodology. 

Digital Phenotyping and Digital Interventions with mHealth Data

Recent advancements in mHealth technology, owing to the ubiquitous use of smartphones and wearable devices, have increased the potential of research studies to collect behavioral data on participants in their natural environment through active sensing (frequent surveys delivered via an app) and passive sensing (data obtained through the sensors of the devices). The massive amount of longitudinal data collected on each individual is multi-modal and complex, and has a high degree of missing data.

We are conducting several projects that seek to develop a statistical framework and produce open-source software to analyze such complex data. Specifically, we have developed a pre-processing algorithm (2SpamH) that identifies under-recorded passive data as “missing data” using a two-step K-nearest neighbor algorithm and imputes them with machine-learning approaches.
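To illustrate the general idea of a two-step nearest-neighbor labeling of under-recorded days, here is a minimal sketch. It is not the 2SpamH implementation: the feature definitions, the thresholds `low` and `high`, the value of `k`, and the function names `seed_labels` and `knn_fill` are all hypothetical choices made for illustration, and the subsequent imputation step is not shown.

```python
from collections import Counter

def seed_labels(usage, low, high):
    """Step 1: label only the clear-cut days; leave ambiguous days as None."""
    labels = []
    for u in usage:
        if u <= low:
            labels.append("missing")   # device barely used: treat the record as missing
        elif u >= high:
            labels.append("valid")     # device clearly in use: trust the record
        else:
            labels.append(None)        # ambiguous: decide in step 2
    return labels

def knn_fill(features, labels, k=3):
    """Step 2: classify each unlabeled day by majority vote of its k nearest labeled days."""
    reference = [(f, lab) for f, lab in zip(features, labels) if lab is not None]
    filled = []
    for f, lab in zip(features, labels):
        if lab is not None:
            filled.append(lab)
            continue
        # Squared Euclidean distance to every labeled reference day.
        nearest = sorted(
            reference,
            key=lambda ref: sum((a - b) ** 2 for a, b in zip(f, ref[0])),
        )[:k]
        filled.append(Counter(l for _, l in nearest).most_common(1)[0][0])
    return filled

# Hypothetical daily features: (fraction of the day the device was in use, steps / 10000)
features = [
    (0.05, 0.00), (0.10, 0.01), (0.08, 0.02),   # barely used
    (0.90, 0.60), (0.95, 0.75), (0.85, 0.50),   # clearly used
    (0.40, 0.05), (0.60, 0.45),                 # ambiguous days
]
usage = [f[0] for f in features]
labels = knn_fill(features, seed_labels(usage, low=0.2, high=0.8), k=3)
```

In this toy data, the two ambiguous days are assigned the label of the labeled days they most resemble; days flagged as "missing" would then be passed to an imputation model.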

Using the pre-processed data, we have developed a functional data analysis framework to visually represent and analyze complex longitudinal patterns using flexible statistical models, and have used this methodology to predict behavioral activation from passively collected activity data (e.g., step counts). This set of analyses presents a digital phenotype of individuals participating in research studies that record their behavior through mHealth apps. The ability to utilize pre-processed passive data has also led to the development of prediction models that use a specialized branch of machine learning (semi-supervised learning) to predict adherence to psychotherapy among older adults with depression.

We have incorporated this algorithm in a just-in-time digital intervention to promote adherence to psychotherapy and are currently proposing to test the intervention in a clinical trial that is part of the renewal proposal for our Weill Cornell ALACRITY Center. Some of this work is ongoing or in peer review.

  • Zhang H, Lee J, Yu H, Wu Y, Carter E, Banerjee S (2023) 2SpamH: A two-stage algorithm for processing passively sensed mHealth data. Biometrics (submitted) 
  • Banerjee S, Wu Y, Zhang H, Soleman M, Choudhury T, Sirey JA, Kiosses DN, Marino P, Alexopoulos GS (2023) Predicting adherence to psychotherapy homework using actively and passively sensed mHealth data. 
  • Lee J, Banerjee S. A functional data analysis framework for analyzing mHealth data. 
  • Solomonov N, Lee J, Zhang H, Benda N, Banerjee S. Passive Sensing Activity Levels is Associated with Behavioral Activation in Psychotherapies for Late Life Depression. 
  • Lee J, Solomonov N, Banerjee S, Alexopoulos GS, Sirey JA. Use of Passive Sensing in Psychotherapy Studies in Late Life: A Pilot Example, Opportunities and Challenges. Front Psychiatry. 2021 Oct 28;12:732773. doi: 10.3389/fpsyt.2021.732773. PMID: 34777042; PMCID: PMC8580874. 

Identifying Sub-Groups of Treatment Responders using Machine Learning

Participants in clinical trials of psychotherapy and pharmacotherapy have heterogeneous responses to treatment (some respond early, some late, and some do not respond). To characterize this heterogeneity, we have identified sub-groups that have distinct trajectories of symptoms using latent growth curve modeling. We have then used machine learning algorithms to predict membership in these trajectory sub-groups from baseline characteristics. These analyses have informed novel targets of interventions (predictors of sub-groups).

  • Alexopoulos GS, Raue PJ, Banerjee S, Mauer E, Marino P, Soliman M, Kanellopoulos D, Solomonov N, Adeagbo A, Sirey JA, Hull TD, Kiosses DN, Areán PA. Modifiable predictors of suicidal ideation during psychotherapy for late-life major depression. A machine learning approach. Transl Psychiatry. 2021 Oct 18;11(1):536. doi: 10.1038/s41398-021-01656-5. PMID: 34663787; PMCID: PMC8523563.
  • Solomonov N, Lee J, Banerjee S, Flückiger C, Kanellopoulos D, Gunning FM, Sirey JA, Liston C, Raue PJ, Hull TD, Areán PA, Alexopoulos GS. Modifiable predictors of nonresponse to psychotherapies for late-life depression with executive dysfunction: a machine learning approach. Mol Psychiatry. 2021 Sep;26(9):5190-5198. doi: 10.1038/s41380-020-0836-z. Epub 2020 Jul 10. PMID: 32651477; PMCID: PMC8120667.
  • Jackson DS, Banerjee S, Sirey JA, Pollari C, Solomonov N, Novitch R, Chalfin A, Wu Y, Alexopoulos GS. Two Interventions for Patients With Major Depression and Severe Chronic Obstructive Pulmonary Disease: Impact on Quality of Life. Am J Geriatr Psychiatry. 2019 May;27(5):502-511. doi: 10.1016/j.jagp.2018.12.004. Epub 2018 Dec 7. PMID: 30630702; PMCID: PMC6443466.
  • Alexopoulos GS, Sirey JA, Banerjee S, Jackson DS, Kiosses DN, Pollari C, Novitch RS, Artis A, Raue PJ. Two Interventions for Patients with Major Depression and Severe Chronic Obstructive Pulmonary Disease: Impact on Dyspnea-Related Disability. Am J Geriatr Psychiatry. 2018 Feb;26(2):162-171. doi: 10.1016/j.jagp.2017.10.002. Epub 2017 Oct 10. PMID: 29117913; PMCID: PMC5817020.

Predictors of Health Outcomes including Social Determinants with Big Data

We are conducting several studies that utilize “Big” real-world data (e.g., health insurance claims, electronic health records and registries) to develop predictive models on various health outcomes (e.g., healthcare utilization, severe outcomes in COVID+ patients).

Specifically, we have utilized the Health Care Cost Institute data on health insurance claims of over 50 million commercially insured individuals to study adverse mental health outcomes and preventable hospitalization among a cohort of depressed middle-aged and older adults. We have also utilized a COVID registry and the New York City-wide electronic health record repository (NYC-CDRN) to predict severe outcomes such as intubation or death among COVID+ patients. The salient feature of these predictive models is a method to harmonize longitudinal predictors (e.g., laboratory values, vital signs, diagnostic history) in a predictive model framework, paying particular attention to missing data and how they can be modeled. We have also utilized the same COVID data to show that the social deprivation index (SDI) plays an important role in who acquires COVID-19 and in its severity; once patients are hospitalized, however, SDI appears less important.

  • Evans L, Wu Y, Kim M-h, Alexopoulos GS, Pathak J, Banerjee S (2023) A population-based risk stratification model for predicting preventable hospitalization: an observational study of commercially insured older adults with depression. BMC Health Services Research (accepted). 
  • Mauer E, Lee J, Choi J, Zhang H, Hoffman K, Easthausen I, Rajan M, Weiner M, Kaushal R, Safford M, Steel P, Banerjee S (2021) A predictive model of clinical deterioration among hospitalized COVID-19 patients by harnessing hospital course trajectories. J Biomed Inform. 2021 Apr 30;118:103794. doi: 10.1016/j.jbi.2021.103794. Epub ahead of print. PMID: 33933654; PMCID: PMC8084618. 
  • Goyal P, Schenck E, Wu Y, Zhang Y, Visaria A, Orlander D, Xi W, Díaz I, Morozyuk D, Weiner M, Kaushal R, Banerjee S (2023). Influence of social deprivation index on in-hospital outcomes of COVID-19. Sci Rep. 2023 Jan 31;13(1):1746. doi: 10.1038/s41598-023-28362-0. PMID: 36720999; PMCID: PMC9887560. 
  • Xi W, Banerjee S, Olfson M, Alexopoulos GS, Xiao Y, Pathak J. Effects of social deprivation on risk factors for suicidal ideation and suicide attempts in commercially insured US youth and adults. Sci Rep. 2023 Mar 13;13(1):4151. doi: 10.1038/s41598-023-31387-0. PMID: 36914764; PMCID: PMC10011396 
  • Xi W, Banerjee S, Penfold RB, Simon GE, Alexopoulos GS, Pathak J. Healthcare utilization among patients with psychiatric hospitalization admitted through the emergency department (ED): A claims-based study. Gen Hosp Psychiatry. 2020 Oct 7;67:92-99. doi: 10.1016/j.genhosppsych.2020.10.001. Epub ahead of print. PMID: 33068850 

Clinical Trials in Mental and Behavioral Health

We designed and analyzed several randomized trials (including cluster randomized trials) which studied various behavioral interventions, psychotherapies, drugs, and home care management interventions on older adults with depression, psychosis, and bipolar disorder.

We used various statistical models that include linear and generalized linear mixed-effects models, multi-level hierarchical models (for cluster randomized trials), and generalized estimating equations to analyze data from these trials. One salient statistical feature of such trials on older adults is a high degree of missing data. We have incorporated state-of-the-art statistical techniques, such as pattern mixture models and shared parameter analysis, to account for such issues.

We have also constructed models to evaluate moderators and mediators of treatment response. In addition, we have applied advanced statistical techniques such as variable selection (e.g., LASSO, ElasticNet), multivariate methodology (e.g., cluster and factor analysis), and sub-group identification using latent class mixed models and latent growth curve models. These techniques increase the information yield from clinical trial data by generating hypotheses of personalized treatment effects.

  • Alexopoulos GS, Raue PJ, Banerjee S, Marino P, Renn BN, Solomonov N, Adeagbo A, Sirey JA, Hull TD, Kiosses DN, Mauer E, Areán PA. Comparing the streamlined psychotherapy "Engage" with problem-solving therapy in late-life major depression. A randomized clinical trial. Mol Psychiatry. 2020 Jul 1; PubMed PMID: 32612251
  • Flint AJ, Meyers BS, Rothschild AJ, Whyte EM, Alexopoulos GS, Rudorfer MV, Marino P, Banerjee S, Pollari CD, Wu Y, Voineskos AN, Mulsant BH. Effect of Continuing Olanzapine vs Placebo on Relapse Among Patients With Psychotic Depression in Remission: The STOP-PD II Randomized Clinical Trial. JAMA. 2019 Aug 20;322(7):622-631. PubMed PMID: 31429896; PubMed Central PMCID: PMC6704758
  • Young RC, Mulsant BH, Sajatovic M, Gildengers AG, Gyulai L, Al Jurdi RK, Beyer J, Evans J, Banerjee S, Greenberg R, Marino P, Kunik ME, Chen P, Barrett M, Schulberg HC, Bruce ML, Reynolds CF, Alexopoulos GS. GERI-BD: A Randomized Double-Blind Controlled Trial of Lithium and Divalproex in the Treatment of Mania in Older Patients With Bipolar Disorder. Focus (Am Psychiatr Publ). 2019 Jul;17(3):314-321. PubMed PMID: 32015723; PubMed Central PMCID: PMC6996060.
  • Sirey JA, Banerjee S, Marino P, Bruce ML, Halkett A, Turnwald M, Chiang C, Liles B, Artis A, Blow F, Kales HC. Adherence to Depression Treatment in Primary Care: A Randomized Clinical Trial. JAMA Psychiatry. 2017 Nov 1;74(11):1129-1135. PubMed PMID: 28973066; PubMed Central PMCID: PMC5710215.

Multivariate Methodology

Research studies in medicine do not always analyze multiple correlated outcomes jointly, primarily because of the difficulty in interpretation and the statistical complexity involved. Our team aims to understand the interplay between multiple correlated outcomes in determining treatment efficacy, mediating treatment effect, and discovering patient sub-groups.

The main step in analyzing multiple correlated outcomes is to model the covariance/correlation between these traits accurately. To do so, we have studied the estimation of the covariance matrix in higher dimensions and proposed an improved estimator which shows robust performance in a wide range of situations.
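To see why covariance estimation in higher dimensions needs care, consider the following minimal sketch. It uses a generic linear shrinkage toward a scaled identity matrix, which is a textbook device and not the estimator proposed in our papers; the shrinkage weight `alpha`, the dimensions, and all function names are assumptions chosen for illustration. When the number of variables p is at least the sample size n, the sample covariance matrix is singular, whereas the shrunk version is positive definite.

```python
import random

def sample_covariance(rows):
    """Unbiased sample covariance of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def shrink_to_identity(S, alpha):
    """Return (1 - alpha) * S + alpha * mu * I, where mu = trace(S) / p."""
    p = len(S)
    mu = sum(S[i][i] for i in range(p)) / p
    return [
        [(1 - alpha) * S[i][j] + (alpha * mu if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]

def is_positive_definite(A, tol=1e-12):
    """Attempt a Cholesky factorization; every pivot must exceed tol."""
    p = len(A)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

random.seed(0)
n, p = 5, 8  # fewer observations than variables (illustrative sizes)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
S = sample_covariance(X)                      # rank at most n - 1, hence singular
S_shrunk = shrink_to_identity(S, alpha=0.2)   # alpha is an arbitrary illustrative weight
```

The shrunk matrix keeps the total variance (its trace equals that of `S`) while pulling the eigenvalues toward their average, which is what makes it invertible.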

We have developed a Bayesian multivariate model in the context of quantitative trait loci (see contribution to statistical genetics) to detect genetic loci jointly affecting multiple correlated outcomes/traits. In the spirit of multivariate statistics, we have also developed methods for performing a multivariate meta-analysis of survival curves and applied them to distributed health network data.

In addition to our research in multivariate methodology, we have applied multivariate clustering and classification techniques (e.g., hierarchical clustering, linear discriminant analysis, factor analysis, etc.) to identify patient sub-groups in various applications (e.g., sub-groups based on their clinical profile) in mental health research.

  • Banerjee S, Monni S. An Orthogonally Equivariant Estimator of the Covariance Matrix in High Dimensions and for Small Sample Sizes. J Stat Plan Inference. 2021 Jul;213:16-32. doi: 10.1016/j.jspi.2020.10.006. Epub 2020 Nov 16. PMID: 33281277; PMCID: PMC7709931.
  • Banerjee S, Monni S, Wells MT. A regularized profile likelihood approach to covariance matrix estimation. Journal of Statistical Planning and Inference. 2016 Jun 28.
  • Banerjee S, Cafri G, Isaacs AJ, Graves S, Paxton E, Marinac-Dabic D, Sedrakyan A. A distributed health data network analysis of survival outcomes: the International Consortium of Orthopaedic Registries perspective. J Bone Joint Surg Am. 2014 Dec 17;96 Suppl 1:7-11. PubMed PMID: 25520413; PubMed Central PMCID: PMC4271424
  • Banerjee S, Yandell BS, Yi N. Bayesian quantitative trait loci mapping for multiple traits. Genetics. 2008 Aug;179(4):2275-89. PubMed PMID: 18689903; PubMed Central PMCID: PMC2516097

Weill Cornell Medicine Samprit Banerjee Lab 402 E 67th Street New York, NY 10065 Phone: 646-962-8014