Prognostic Modeling of Chronic Kidney Disease Progression: Bridging Mild and Severe Stages through a Machine Learning Approach

Special Article: Case Reports in Hypertension

Austin J Clin Case Rep. 2023; 10(9): 1310.

Karamo Bah¹*; Amadou Wurry Jallow²; Adama Ns Bah³

¹Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taiwan

²Department of Medical Laboratory Science and Biotechnology, Taipei Medical University, Taiwan

³Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taiwan

*Corresponding author: Karamo Bah, Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan. Email: kamasbah@gmail.com

Received: October 31, 2023; Accepted: November 30, 2023; Published: December 07, 2023

Abstract

Background and Aim: Chronic Kidney Disease (CKD) is a condition in which the kidneys gradually lose their ability to function properly over time. It is classified into stages based on the severity of kidney damage and the level of kidney function. The objective of our study is to employ machine learning models to predict CKD progression.

Methods: Our study focuses on predicting CKD progression from mild (I, II, III) to advanced stages (IV, V, VI). We used a lasso-penalized logistic regression model and a random forest model for our predictive analysis, and we assessed feature importance using the Gini index derived from the random forest model. Model performance was evaluated using the area under the receiver operating characteristic curve (AU-ROC), the area under the precision-recall curve (AU-PR), recall, precision, and accuracy.

Results: Both models predicted CKD progression from milder (I, II, III) to severe stages (IV, V, VI) with strong performance. The random forest model achieved an accuracy of 85%, a recall of 86%, a precision of 83%, an AU-ROC of 92%, and an AU-PR of 83%. The logistic regression model achieved an accuracy of 84%, a recall of 84%, a precision of 85%, an AU-ROC of 92%, and an AU-PR of 81%. In terms of variable importance, the random forest's Gini index ranked creatinine as the most important feature, followed by eGFR.

Conclusion: Our findings indicate that machine learning models hold promise for predicting CKD progression, with substantial discriminative ability as evidenced by high AU-ROC values. This suggests their potential utility in real-world clinical settings for identifying patients at risk of transitioning from mild to severe stages of CKD.

Keywords: Chronic Kidney Disease; Machine Learning; Logistic Regression; Random Forest; Classification Model

Introduction

In 2013, Chronic Kidney Disease (CKD) caused approximately one million deaths [1]. This burden falls disproportionately on the developing world, where low- and middle-income countries account for 387.5 million CKD cases, comprising 177.4 million male and 210.1 million female patients [2]. These figures underscore how pervasive CKD is in developing regions, and its prevalence continues to rise. CKD entails the gradual deterioration of kidney function, which reduces the kidneys' capacity to filter waste and excess fluid from the bloodstream, a process vital for urine production [3]. The term "chronic" reflects the slow progression of this damage, which typically unfolds over months to years [3]. CKD is a widespread and severe medical condition [4] and, given its global reach, a pressing concern in healthcare. One distinguishing aspect of CKD is its silent nature: symptoms often remain latent until the disease reaches advanced stages [5].

In recent years, the adoption of Electronic Health Records (EHRs) has grown significantly within healthcare systems [6]. This wealth of electronic health data has created unprecedented opportunities for computational methods, which not only enhance our existing understanding of various medical conditions but also enable the development of predictive models for assessing patient risk. For instance, conditions such as breast cancer [7] and myocardial infarction [8] have already been modelled successfully with machine learning algorithms. Machine learning, a subfield of artificial intelligence, is dedicated to developing algorithms that discern patterns or relationships within a set of variables [9]. These algorithms predict the value of an unknown variable, such as a patient outcome, from information learned from historical data. In healthcare, machine learning models can therefore be used to forecast a patient's susceptibility to a particular disease by analyzing the information stored in their health records. Furthermore, machine learning models need not be black boxes: their outputs can often be inspected manually to determine which variables play pivotal roles in different patient outcomes. Extensive efforts have been dedicated to the early detection of CKD so that treatment can begin in its earliest stages.

The objective of our study is to apply two machine learning algorithms, lasso-penalized logistic regression [10] and random forests [11], to the prediction and risk factor analysis of Chronic Kidney Disease (CKD) progression. Specifically, we focus on forecasting the transition of CKD from its milder stages (I, II, III) to advanced, severe stages (IV, V, VI), with the aim of improving our understanding of how the disease progresses across stages. Achieving this goal could support earlier intervention strategies and better patient care in CKD management. Through this empirical exploration, we anticipate gaining deeper insight into the factors associated with CKD progression, which could help medical practitioners refine risk assessment and deliver more timely interventions and tailored care. The models are also used to identify the variables that most strongly influence the transition, with the broader aim of improving clinical decision-making and, ultimately, patient outcomes.
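The two models named above can be sketched with scikit-learn. The example below is a minimal illustration on synthetic data, not the authors' pipeline: the feature names (`f0` to `f3`), the planted signal, the 70/30 stratified split, the penalty strength `C=0.5`, and `n_estimators=300` are all assumptions. It fits an L1-penalized (lasso) logistic regression and a random forest, computes the five metrics used in this study, and ranks features by the forest's Gini importance (`feature_importances_`, the mean decrease in Gini impurity). AU-PR is estimated here with average precision, one common summary of the precision-recall curve.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 1,000 patients, 4 features.
# Label 1 = progressed to a severe stage, 0 = remained mild.
rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2", "f3"]
X = rng.normal(size=(1000, len(feature_names)))
# Planted signal: f0 and f1 drive the outcome, f2 weakly, f3 not at all.
score = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.2 * X[:, 2]
y = (score + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def evaluate(model, X_eval, y_eval):
    """Accuracy, recall, precision, AU-ROC and AU-PR for a fitted binary classifier."""
    proba = model.predict_proba(X_eval)[:, 1]  # estimated P(severe)
    pred = model.predict(X_eval)               # default 0.5 probability threshold
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "recall": recall_score(y_eval, pred),
        "precision": precision_score(y_eval, pred),
        "au_roc": roc_auc_score(y_eval, proba),
        "au_pr": average_precision_score(y_eval, proba),
    }


# Lasso-penalized logistic regression: standardize first so the L1 penalty
# acts on comparable scales; smaller C means a stronger penalty.
lasso_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
lasso_lr.fit(X_train, y_train)

# Random forest; with criterion="gini" (the default), feature_importances_
# is the mean decrease in Gini impurity across the trees.
rf = RandomForestClassifier(n_estimators=300, criterion="gini", random_state=0)
rf.fit(X_train, y_train)

results = {
    "lasso_logistic_regression": evaluate(lasso_lr, X_test, y_test),
    "random_forest": evaluate(rf, X_test, y_test),
}
gini_ranking = sorted(zip(feature_names, rf.feature_importances_),
                      key=lambda item: item[1], reverse=True)
# Coefficients of the fitted lasso model (last pipeline step); L1 can shrink weak ones to 0.
lasso_coefs = dict(zip(feature_names, lasso_lr[-1].coef_[0]))

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("Gini ranking:", [name for name, _ in gini_ranking])
```

Reading both the Gini ranking and the lasso coefficients is one way to cross-check which variables matter: the forest captures nonlinear effects, while the lasso gives signed, sparse linear weights.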

Related Studies

Chronic Kidney Disease (CKD) is a pervasive and serious global health issue that poses a significant burden on healthcare systems. The condition is characterized by a gradual decline in kidney function over time, with five stages ranging from mild to severe. As CKD advances, it can lead to complications like cardiovascular disease and End-Stage Renal Disease (ESRD), necessitating dialysis or kidney transplantation.
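As an illustration of the staging referred to above, the sketch below maps an estimated glomerular filtration rate (eGFR, in mL/min/1.73 m²) to the KDIGO 2012 GFR categories (G1 at 90 or above, G2 60 to 89, G3a 45 to 59, G3b 30 to 44, G4 15 to 29, G5 below 15). It is illustrative only: this study assigns stages from ICD-9 diagnosis codes rather than from eGFR, clinical staging also requires the abnormality to persist for more than three months (and, for G1 and G2, other markers of kidney damage), and the helper names `kdigo_gfr_category` and `is_severe` are hypothetical.

```python
# Lower eGFR bounds (mL/min/1.73 m^2) for the KDIGO 2012 GFR categories,
# checked from highest to lowest.
KDIGO_BOUNDS = [(90.0, "G1"), (60.0, "G2"), (45.0, "G3a"),
                (30.0, "G3b"), (15.0, "G4")]


def kdigo_gfr_category(egfr: float) -> str:
    """Return the KDIGO GFR category for a single eGFR value (hypothetical helper)."""
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    for lower_bound, category in KDIGO_BOUNDS:
        if egfr >= lower_bound:
            return category
    return "G5"  # eGFR below 15


def is_severe(egfr: float) -> bool:
    """Binary label for illustration: G4 or G5 (eGFR below 30) counts as severe."""
    return kdigo_gfr_category(egfr) in {"G4", "G5"}


for value in (95, 72, 50, 35, 20, 8):
    print(value, kdigo_gfr_category(value), is_severe(value))
```

The `is_severe` split mirrors the mild (I to III) versus severe (IV and above) grouping used in this study.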

Researchers have applied machine learning and data mining methods in a diverse range of studies aimed at extracting valuable insights from Chronic Kidney Disease (CKD) datasets [12]. Machine learning serves two purposes here: it streamlines the analytical process and reduces the time it requires, and it improves prediction accuracy through data-mining classification techniques [13]. Beyond CKD, machine learning has been applied to the diagnosis and treatment of a broad spectrum of medical conditions, as the following studies illustrate.

Bemando et al. explored the relationship between blood-related diseases and their distinctive characteristics using several classifiers, including Gaussian Naive Bayes, Bernoulli Naive Bayes, and Random Forest; in their investigation, Naive Bayes achieved higher accuracy than the other algorithms [14]. Kumar and Polepaka developed an approach to illness prediction that combined Random Forest and Convolutional Neural Networks (CNN) with other machine learning methods. These algorithms classified illness datasets with high precision, recall, and F1 scores, and Random Forest stood out with the best accuracy and statistical performance [15]. Seeking better statistical analysis outcomes on medical illness datasets, Acharya et al. applied a CNN together with several other machine learning algorithms to ECG datasets and achieved a classification accuracy of 94% [16]. For medical illness prediction, Desai et al. used back-propagation Neural Networks (NN) and Logistic Regression (LR) classifiers; their statistical analysis concluded that logistic regression outperformed the other algorithms in accuracy and predictive capability [17]. Patil et al. built a database of ECG arrhythmia cases and applied machine learning approaches including a Support Vector Machine (SVM) and a Cuckoo Search-optimized Neural Network; the SVM achieved an accuracy of 94.44% [18].

Methods

Data Source

In this retrospective study, we conducted a comprehensive analysis using data sourced from the Medical Information Mart for Intensive Care (MIMIC) repositories. These repositories house a vast collection of de-identified health-related information about critically ill patients admitted to the Beth Israel Deaconess Medical Center, a leading tertiary medical institution located in Boston, USA [19].

The dataset at our disposal encompasses a diverse range of variables, including demographic details, vital signs, laboratory results, prescription records, and clinical notes. These data sources offer invaluable insights into the profiles of critically ill patients.

For this investigation, we used MIMIC-III v1.4, the most recent release of the MIMIC-III database. This clinical database spans the period from 2001 to 2012 and incorporates data recorded through two distinct systems: MetaVision (iMDSoft, Wakefield, MA, USA) and CareVue (Philips Healthcare, Cambridge, MA, USA). The Philips CareVue system, which archived data from 2001 to 2008, was subsequently replaced by the more advanced MetaVision data management system, which remains in active use for data management and analysis.

Patient Population

Patients in this study were selected based on their ICD-9 codes, a standardized system for categorizing medical conditions and diagnoses. For Chronic Kidney Disease (CKD), the ICD-9 codes used in this study represent different stages of the disease. In total, 674 patients had diagnoses corresponding to mild CKD (stages I, II, III), and 1,286 patients had diagnoses corresponding to severe CKD (stages IV, V, VI), for a cohort of 1,960 patients. The distribution of patients across these categories is detailed in Table 1.
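Cohort assignment by ICD-9 code can be sketched with pandas on a table shaped like MIMIC-III's `DIAGNOSES_ICD` (columns `SUBJECT_ID` and `ICD9_CODE`, with codes stored without the decimal point). The paper does not list the exact codes used, so this sketch assumes the ICD-9-CM 585.x subcategories, where 585.1 to 585.5 denote CKD stages 1 to 5 and 585.6 denotes end-stage renal disease (presumably the "stage VI" label used here). It also assumes, hypothetically, that a patient with both mild and severe codes is assigned to the severe group and that patients without a staged CKD code are excluded; the toy rows are invented for illustration.

```python
import pandas as pd

# Assumed code sets (ICD-9-CM 585.x, written without the decimal point as in
# MIMIC-III DIAGNOSES_ICD.ICD9_CODE). Not confirmed by the paper.
MILD_CODES = {"5851", "5852", "5853"}    # CKD stages I-III
SEVERE_CODES = {"5854", "5855", "5856"}  # stages IV, V and 585.6 (ESRD)


def label_cohort(diagnoses: pd.DataFrame) -> pd.Series:
    """Return a 0/1 label per SUBJECT_ID (1 = severe) for patients with a staged CKD code.

    Assumption: any severe code outranks mild codes for the same patient.
    """
    staged = diagnoses[diagnoses["ICD9_CODE"].isin(MILD_CODES | SEVERE_CODES)]
    has_severe = staged["ICD9_CODE"].isin(SEVERE_CODES)
    # Max over a patient's rows: True if any of their staged codes is severe.
    return has_severe.groupby(staged["SUBJECT_ID"]).max().astype(int).rename("severe")


# Toy rows (hypothetical); 4019 is an unrelated code and is ignored.
demo = pd.DataFrame({
    "SUBJECT_ID": [1, 1, 2, 3, 4],
    "ICD9_CODE": ["5853", "5856", "5852", "4019", "5854"],
})
labels = label_cohort(demo)
print(labels.to_dict())  # {1: 1, 2: 0, 4: 1}
print(labels.value_counts().to_dict())
```

With the counts reported above, the severe group makes up 1,286 of 1,960 patients (about 66%), so the two classes are moderately imbalanced.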