Assessment of the Risk Factors of MDD Recurrence Based on Deep Learning Approaches

Research Article

J Psychiatry Mental Disord. 2021; 6(3): 1044.

Wu Q1,2, Zhao W2, Yang X3, Tan H2, You L2, Xu H2, Zhou Y2 and Zhou X2,4,5*

1School of Public Health, Xi’an Jiaotong University Health Science Center, China

2Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA

3The First Affiliated Hospital of Xi’an Jiaotong University Health Science Center, Xi’an, China

4Department of Integrative Biology and Pharmacology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA

5School of Dentistry, University of Texas Health Science Center at Houston, Houston, TX, USA

*Corresponding author: Xiaobo Zhou, Center for Computational Systems Medicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA

Received: June 09, 2021; Accepted: July 09, 2021; Published: July 16, 2021

Abstract

Objective: To explore the risk factors related to the recurrence of MDD and to provide a basis for the prevention and control of MDD.

Methods: Patients with MDD were extracted from two large, multi-center clinical datasets covering inpatient and outpatient visits between January 2000 and December 2015. Eligible patients were 18-90 years old and had a diagnosis of MDD, identified from MDD-related ICD-9-CM and ICD-10-CM diagnosis codes. In total, 140,497 patients qualified for further analysis, 69.2% of whom were female. Of these 140,497 patients, 20,078 (14.3%) had no comorbidities. Logistic regression, SVM, and LSTM were used to identify the key risk factors associated with MDD recurrence and to predict recurrence.

Results: MDD patients who were married or had a life partner had a lower rate of MDD recurrence (9.2%) than single patients (11.8%). Patients with primary MDD had a higher recurrence rate (11.7%) than patients with secondary MDD (10.5%). In the logistic regression analysis, primary MDD was associated with MDD recurrence (OR 2.49, 95% CI 1.53-3.96). Insomnia, anxiety, and single marital status were also top-ranked risk factors for MDD recurrence. The prediction accuracies of logistic regression, SVM, and LSTM were 0.736, 0.791, and 0.834, respectively.

Conclusions: Statistical models built by mining existing EHR data can be used to explore the risk factors associated with MDD recurrence. Our results indicated that primary MDD, never having married, anxiety symptoms, and insomnia were risk factors for MDD recurrence. The prediction accuracy of the LSTM model was higher than that of the other two approaches.

Keywords: MDD; Prognosis; EHR; Data mining; LSTM

Abbreviations

CI: Confidence Interval; EHR: Electronic Health Records; ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification; ICD-10-CM: International Classification of Diseases, Tenth Revision, Clinical Modification; LSTM: Long Short-Term Memory; MDD: Major Depressive Disorder; OR: Odds Ratio; RNNs: Recurrent Neural Networks; SVM: Support Vector Machine

Introduction

Major Depressive Disorder (MDD) is one of the most common medical illnesses worldwide, with a lifetime prevalence of up to 16% [1], and is a leading cause of disability. MDD is characterized by a long-lasting depressed mood or marked loss of interest or pleasure in all or nearly all activities [2]. The prevalence of MDD has gradually increased [3]. MDD is strongly associated with poor mental health and poor socio-economic status [4]. MDD can affect mood and behavior as well as various physical functions, thereby reducing patients’ quality of life [5]. The recurrent nature of MDD is also one of the most crippling and devastating aspects of depression [6,7]. Every recurrence also carries a 10-20% risk of becoming unremitting and chronic, along with a heightened risk of suicide, both of which can lead to serious comorbidities and lethal consequences associated with depression. One of the most important challenges in the management of MDD is therefore preventing relapse. Individuals with a first depressive episode have a 40% to 60% chance of experiencing a subsequent episode. Individuals who have had two episodes have a recurrence probability of approximately 60% [7]. Therefore, accurate prediction of MDD prognosis is important for preventing the recurrence that leads to disability.

However, prediction of MDD prognosis has been limited by small sample sizes, budget constraints, reliance on physicians’ experience, and other factors. Meanwhile, Electronic Health Records (EHR) are routinely collected in the clinic but appear not to have been fully utilized in various clinical studies [8]. EHR data containing patients’ longitudinal health information are a valuable source for diagnosing diseases and assisting clinical decision-making. However, mining EHRs efficiently is quite challenging. First, EHR data are heterogeneous and contain various types of features. Second, the data are inherently sparse and biased because of patients’ irregular visits, the absence of certain tests, and missing values. Recent studies have taken advantage of EHRs for predictive modeling tasks such as early prediction of chronic disease [9] and monitoring of disease progression [10]. How to deal with the heterogeneity and sparsity of EHR data and how to reasonably explain the predicted results are the key problems that modeling must solve. Regression models and Support Vector Machines (SVM) have previously been applied to predict the progression of a patient’s health status. However, these models do not provide a comprehensive analysis of long-term diagnostic information, so severe symptoms recorded in the past may be missed. Machine learning models such as SVM and Random Forest (RF) can handle complex interactions between predictive factors but lack the interpretability needed for understanding disease etiology. In addition, since disease progression is a complex and dynamic process, understanding the etiology of a disease requires repeated clinical measurements over time rather than reliance on a baseline profile alone. Therefore, time series models, such as recurrent neural networks, appear to be more suitable for analyzing and understanding such data. Many recent studies [10,11] suggest that deep learning can significantly improve prediction performance.

To deal with the temporality of multivariate sequences, it is necessary to model the sequential data dynamically. The Recurrent Neural Network (RNN) is a class of artificial neural network in which connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior for a time series. Therefore, RNNs, such as the Long Short-Term Memory (LSTM) network, are often used for time series prediction. Taking advantage of the capability of RNNs to memorize historical records, several RNN-based models have been used to derive accurate and robust representations of patient visits [12].
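
To make the idea of "memorizing historical records" concrete, the following is a minimal NumPy sketch of a standard LSTM cell stepping through one patient's visit sequence. It is an illustration only and not the model used in this study: the feature size, hidden size, random weights, and the function names `lstm_step` and `encode_visits` are all assumptions introduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper).

    x_t: (D,) features of the current visit
    h_prev, c_prev: (H,) hidden and cell state carried over from earlier visits
    W: (4H, D), U: (4H, H), b: (4H,) stacked gate parameters
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to store
    f = sigmoid(z[H:2 * H])        # forget gate: how much past memory to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c_t = f * c_prev + i * g       # updated long-term memory
    h_t = o * np.tanh(c_t)         # updated visit-level representation
    return h_t, c_t

def encode_visits(visits, W, U, b):
    """Run the cell over a (T, D) array of visits and return the final hidden state."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in visits:             # visits are processed in chronological order
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h

# Illustrative sizes and random data (assumptions, not values from the study).
rng = np.random.default_rng(0)
D, H, T = 5, 3, 4                  # features per visit, hidden units, number of visits
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
visits = rng.integers(0, 2, size=(T, D)).astype(float)  # multi-hot diagnosis flags
h_final = encode_visits(visits, W, U, b)
```

In a full model, `h_final` would typically be passed to a classification layer that outputs the probability of recurrence, and the weights would be learned from data rather than drawn at random.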

In this study, we determined the prognostic risk factors for patients with MDD and predicted MDD recurrence. In addition, the accuracy of the prediction models was evaluated.

Methods

Study design

We adopted a retrospective study design to analyze the risk factors for MDD prognosis. Cases were patients with recurrent MDD, and patients with a single episode of MDD served as controls (Figure 1).
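
As a rough illustration of this case-control layout, the sketch below labels patients as cases or controls and computes a crude odds ratio with a 95% confidence interval from a 2x2 table (log-odds standard error, often called the Woolf method). It rests on assumptions not taken from the study: recurrence is approximated here by a hypothetical per-patient episode count, the function names are invented for illustration, and the study's reported odds ratios come from logistic regression rather than from a crude 2x2 table.

```python
import math

def label_patients(episode_counts):
    """Map patient id -> 1 (case: recurrent MDD) or 0 (control: single episode).

    episode_counts is a hypothetical dict of patient id -> number of MDD episodes;
    patients with no recorded episode are excluded.
    """
    return {pid: int(n >= 2) for pid, n in episode_counts.items() if n >= 1}

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and confidence interval from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up example data, for illustration only.
labels = label_patients({"p1": 1, "p2": 3, "p3": 0})
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1 suggests an association between the exposure and recurrence, although a crude estimate like this does not adjust for other covariates the way a multivariable logistic regression does.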