Screening Targets for Diagnosis and Treatment of Cognitive Dysfunction in Stroke through Transcriptome Combined with Machine Learning

Research Article

J Bacteriol Mycol. 2023; 10(3): 1213.

Zhibin Chen¹*#; Junxiong Li²#; Yangbo Hou¹#; Qiaoyan Zhu¹*; Zhizhen Shi¹*

¹Department of Neurology, Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China

²Department of Acupuncture, Huadong Hospital Affiliated to Fudan University, Shanghai 20040, China

*Corresponding author: Zhibin Chen, Department of Neurology, Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China.

#These authors contributed equally to this work.

Received: October 25, 2023 Accepted: November 27, 2023 Published: December 04, 2023

Abstract

Post-Stroke Cognitive Impairment (PSCI) is one of the major complications after stroke. The evaluation of PSCI usually depends on neuropsychological tests, but the results of these tests are subjective and can be inaccurate, so more objective indicators are needed to identify PSCI. In this study, we used machine learning to find biomarkers of PSCI and established regulatory networks at the transcriptional level. Several genes, such as ORC1, TOMM40L and SHISAL2A, were identified as biomarkers, and several miRNAs, such as hsa-mir-130b-3p and hsa-mir-484, interacted most closely with these biomarker genes. The results of this study may help to better distinguish post-stroke patients without cognitive impairment (PS) from patients with PSCI in clinical practice, and the identified biomarker genes and miRNAs may serve as potential therapeutic targets.

Keywords: PSCI; Machine learning

Introduction

Post-Stroke Cognitive Impairment (PSCI) is a clinical syndrome characterized by varying degrees of cognitive impairment that occurs within 3 months after a stroke [1]. It encompasses different types of cognitive impairment resulting from stroke events, such as multiple infarctions, infarctions in critical areas, subcortical infarctions, and cerebral hemorrhages [2]. PSCI also includes clinical subtypes in which cognitive impairment due to other neurodegenerative diseases worsens after a stroke event. A previous study reported that patients with post-stroke cognitive impairment have an 8% mortality rate within 1.5 years [3], and the mortality rate rises to 50% when the condition progresses to late-stage post-stroke dementia.

With advances in sequencing technology, gene sequencing has become widely used in disease research, and analyzing gene expression profiles in patients' peripheral blood holds great significance for early disease detection [4]. The development of disease classifiers from patient gene expression data using machine learning methods has recently gained substantial attention. Machine learning techniques are already widely applied in the clinical diagnosis of cardiovascular diseases, for example in coronary artery calcification scoring, and integrating key mRNAs with traditional diagnostic methods shows promise for improving the latter's accuracy [5]. In this study, we obtained gene expression data sets from stroke patients and post-stroke cognitive impairment patients in the Gene Expression Omnibus (GEO) database and used the XGBoost machine learning algorithm to identify distinguishing feature genes. The gene expression profiles were then tested in collected clinical samples. The feature genes identified in this study have potential applications in diagnosis and as biomarkers.

Materials and Methods

Data Sources

We used bioinformatics and experimental methods to explore the biological characteristics of PSCI. First, we used the GEOquery package of the R software (version 4.1.0, https://www.r-project.org/) to download the data from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The expression profile data set GSE186798 is derived from Homo sapiens and is based on the GPL23038 and GPL23159 platforms. This data set contains 60 samples, comprising 30 PSCI and 30 PS samples.
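The study downloaded GSE186798 with the R package GEOquery. For readers working in Python, the minimal sketch below reads the same kind of data from a decompressed GEO series-matrix text file, assuming the standard layout (quoted, tab-separated fields, with the expression table between `!series_matrix_table_begin` and `!series_matrix_table_end`). The function name `parse_series_matrix` and the demo accessions and values are hypothetical, not taken from GSE186798. Because this series spans two platforms, GEO typically provides one series-matrix file per platform, so each file would be parsed separately.

```python
import csv
import io


def parse_series_matrix(text):
    """Parse GEO series-matrix text (hypothetical helper).

    Returns (sample_meta, sample_ids, expression):
      sample_meta: "!Sample_*" key -> list of rows of per-sample values
      sample_ids:  GSM accessions from the table header ("ID_REF" row)
      expression:  probe ID -> list of floats (None where a value is missing)
    """
    sample_meta, sample_ids, expression = {}, [], {}
    in_table = False
    # csv splits on tabs and removes the double quotes around each field
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        key = row[0]
        if key == "!series_matrix_table_begin":
            in_table = True
        elif key == "!series_matrix_table_end":
            in_table = False
        elif in_table and key == "ID_REF":
            sample_ids = row[1:]
        elif in_table:
            expression[key] = [float(v) if v not in ("", "null") else None
                               for v in row[1:]]
        elif key.startswith("!Sample_"):
            # Keys such as !Sample_characteristics_ch1 can repeat; keep every row
            sample_meta.setdefault(key, []).append(row[1:])
    return sample_meta, sample_ids, expression


# Hypothetical two-sample example in series-matrix layout
demo = "\n".join([
    '!Sample_title\t"PS_1"\t"PSCI_1"',
    '!Sample_geo_accession\t"GSM1"\t"GSM2"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM1"\t"GSM2"',
    '"probe_a"\t5.1\t6.3',
    '"probe_b"\t7.0\tnull',
    "!series_matrix_table_end",
])
meta, samples, expr = parse_series_matrix(demo)
print(samples)                  # ['GSM1', 'GSM2']
print(expr["probe_b"])          # [7.0, None]
print(meta["!Sample_title"])    # [['PS_1', 'PSCI_1']]
```

The sample-level metadata rows (such as titles or characteristics) are what would be used to assign each GSM accession to the PS or PSCI group.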

Gene Ontology and Functional Enrichment Analysis

We conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg/) enrichment analyses to identify the biological functions of the genes. Terms and pathways with a P-value less than 0.05 were considered significant.
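The text does not state which statistic produced the enrichment P-values. A common choice for GO/KEGG over-representation analysis is the one-sided hypergeometric test, sketched below under that assumption; the function name and the term counts in the example are hypothetical. The filter mirrors the P < 0.05 threshold described above and applies no multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P-value, P(X >= k), X ~ Hypergeometric.

    N: genes in the background universe
    K: background genes annotated to the GO term / KEGG pathway
    n: genes in the query list (e.g. the DEGs)
    k: query genes annotated to the term / pathway
    """
    # Exact integer arithmetic; the final division gives a float
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: (k overlapping DEGs, K genes in the term)
terms = {"term_A": (10, 100), "term_B": (1, 100)}
n_deg, n_background = 200, 20000
significant = [name for name, (k, K) in terms.items()
               if hypergeom_enrichment_p(k, n_deg, K, n_background) < 0.05]
print(significant)  # ['term_A']
```

With 200 DEGs drawn from 20,000 genes, about one gene would be expected to fall in a 100-gene term by chance, so an overlap of 10 is strongly enriched while an overlap of 1 is not.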

Immune Infiltration Analysis

We employed the CIBERSORT algorithm to examine the relationship between the diagnosis-related genes and immune cell infiltration. Specifically, we estimated the relative proportions of 22 immune cell types in the samples of the GSE186798 dataset, retaining samples with a CIBERSORT P-value < 0.05, and used the Spearman correlation coefficient to assess the correlation between each diagnosis-related gene and the abundance of each immune cell type. Additionally, we performed Pearson correlation analysis on the GSE186798 dataset to evaluate the correlation between immune test sites and the diagnosis-related genes.
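CIBERSORT itself (deconvolution into 22 immune cell fractions) is not reproduced here. The sketch below shows only the downstream correlation step: Spearman's rank correlation between one gene's expression and one immune cell fraction across samples, computed as the Pearson correlation of tie-averaged ranks, with the Pearson function also usable for the second analysis. The values are hypothetical, the inputs are assumed not to be constant, and no P-values are computed.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values across five samples
gene = [2.1, 3.4, 1.8, 4.0, 2.9]        # expression of one diagnosis-related gene
cell = [0.05, 0.12, 0.03, 0.20, 0.08]   # fraction of one immune cell type
print(round(spearman(gene, cell), 6))   # 1.0 (same ordering in both)

# Monotonic but non-linear: Spearman stays at 1, Pearson is noticeably lower
x, y = [1, 2, 3, 4], [1, 4, 9, 100]
print(round(spearman(x, y), 6), round(pearson(x, y), 3))
```

Spearman is the natural choice when the relationship between expression and cell abundance is expected to be monotonic rather than linear, because it depends only on the ordering of samples.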

Identification of Transcription Factors and miRNAs

To better understand the major variations at the transcriptional level and gain insight into the key regulatory molecules, we investigated the interaction networks between Differentially Expressed Genes (DEGs) and microRNAs (miRNAs), and between Transcription Factors (TFs) and DEGs. We used the NetworkAnalyst platform to identify TFs from the JASPAR database that showed significant topological relevance and tended to bind the common DEGs. To construct the DEG-miRNA network, we used the TarBase and miRTarBase databases to extract miRNAs associated with the common DEGs, focusing on topological analysis.
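NetworkAnalyst performs the topological analysis internally, and the text does not describe its exact scoring. As a hypothetical illustration of the idea, the sketch below ranks regulators (miRNAs or TFs) by node degree, that is, the number of distinct DEGs each one targets in an edge list such as one exported from TarBase/miRTarBase. Degree is only one possible centrality measure, and the regulator and gene names are placeholders.

```python
from collections import defaultdict


def rank_regulators_by_degree(edges, top=None):
    """Rank regulators (miRNAs or TFs) by the number of distinct DEGs they target.

    edges: iterable of (regulator, gene) pairs; duplicate pairs count once.
    Returns a list of (regulator, degree), highest degree first.
    """
    targets = defaultdict(set)
    for regulator, gene in edges:
        targets[regulator].add(gene)
    # Highest degree first; ties broken alphabetically for a stable order
    ranked = sorted(targets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    result = [(reg, len(genes)) for reg, genes in ranked]
    return result[:top] if top is not None else result


# Hypothetical miRNA-DEG interactions (the last pair is a duplicate)
edges = [
    ("miR-X", "GENE1"), ("miR-X", "GENE2"), ("miR-X", "GENE3"),
    ("miR-Y", "GENE1"), ("miR-Y", "GENE2"),
    ("miR-Z", "GENE3"),
    ("miR-X", "GENE1"),
]
print(rank_regulators_by_degree(edges, top=2))  # [('miR-X', 3), ('miR-Y', 2)]
```

A regulator connected to many DEGs is a hub in the network, which is the sense in which some miRNAs can be said to interact most closely with the biomarker genes.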

Evaluation of Candidate Drugs

In this analysis, we used the common DEGs to predict Protein–Drug Interactions (PDIs) and identify candidate pharmacological molecules. The Enrichr web portal and the Drug Signatures Database (DSigDB) were used to analyze drug molecules based on the DEGs. Enrichr (http://amp.pharm.mssm.edu/Enrichr) contains a large collection of diverse gene set libraries available for analysis and download, which can be used to explore gene set enrichment on a genome-wide scale [39]. DSigDB is a gene set resource for gene set enrichment analysis that relates drugs/compounds to their target genes. The DSigDB database was accessed through Enrichr under the Diseases/Drugs category.
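Enrichr scores each library gene set against the query list statistically; the sketch below stops at the simpler step of counting shared genes. It assumes the DSigDB library has been downloaded from Enrichr as a GMT text file (set name, description, then genes, tab-separated) and ranks drug gene sets by their overlap with the DEG list. Function names, drug names, and genes are hypothetical placeholders.

```python
def parse_gmt(text):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        # Upper-case symbols so matching against the DEG list ignores case
        gene_sets[name] = {g.strip().upper() for g in genes if g.strip()}
    return gene_sets


def rank_by_overlap(query_genes, gene_sets):
    """Rank gene sets by how many query genes they contain."""
    query = {g.upper() for g in query_genes}
    scored = []
    for name, genes in gene_sets.items():
        hits = sorted(query & genes)
        if hits:
            scored.append((name, len(hits), hits))
    # Most shared genes first; ties broken by name for a stable order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Hypothetical drug gene sets (empty description column) and DEG list
gmt = "drug_A\t\tGENE1\tGENE2\tGENE5\ndrug_B\t\tGENE3\ndrug_C\t\tGENE1\n"
degs = ["GENE1", "GENE2", "GENE3"]
print(rank_by_overlap(degs, parse_gmt(gmt)))
# [('drug_A', 2, ['GENE1', 'GENE2']), ('drug_B', 1, ['GENE3']), ('drug_C', 1, ['GENE1'])]
```

Raw overlap favours large drug gene sets, which is why enrichment tools add a statistical test on top of the count.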

Results

PS and PSCI Show No Significant Difference in Gene Expression Pattern

Principal Component Analysis (PCA) is a dimensionality reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set [6,7]. However, PCA by itself does not test whether a separation between groups is statistically significant. Permutational Multivariate Analysis of Variance (PERMANOVA) uses a distance matrix (such as Euclidean or Bray-Curtis distance) to partition the total variance, estimates how much of the variation among samples is explained by different grouping or environmental factors, and uses a permutation test to assess the statistical significance of each factor [1,2]. In this study, PCA and PERMANOVA were used to determine whether gene expression differed between PS and PSCI. As shown in Figure 1A, the PS and PSCI samples were evenly intermixed with no marked separation, and the PERMANOVA P-value of 0.978 indicates no significant difference in gene expression pattern between PS and PSCI.
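The one-way PERMANOVA test described above can be sketched as follows, assuming Euclidean distances and the usual pseudo-F statistic for N samples in a groups, F = [SS_among / (a − 1)] / [SS_within / (N − a)]. Here SS_total is the sum of squared pairwise distances over all samples divided by N, SS_within sums the same quantity within each group divided by that group's size, and SS_among = SS_total − SS_within. The P-value counts how often randomly permuted group labels give a pseudo-F at least as large as the observed one. This is an illustrative sketch with hypothetical toy data, not the study's code; established implementations include `adonis2` in the R package vegan.

```python
import random


def _scaled_ss(dist2, idx):
    """Sum of squared pairwise distances within idx, divided by len(idx)."""
    total = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            total += dist2[idx[a]][idx[b]]
    return total / len(idx)


def permanova(samples, labels, n_perm=999, seed=0):
    """One-way PERMANOVA on Euclidean distances (illustrative sketch).

    samples: equal-length numeric vectors, e.g. one expression profile per sample
    labels:  one group label per sample, e.g. "PS" or "PSCI"
    Returns (pseudo_F, permutation P-value).
    """
    n = len(samples)
    groups = sorted(set(labels))
    a = len(groups)
    # Squared Euclidean distance between every pair of samples
    dist2 = [[sum((x - y) ** 2 for x, y in zip(samples[i], samples[j]))
              for j in range(n)] for i in range(n)]
    ss_total = _scaled_ss(dist2, list(range(n)))

    def pseudo_f(labs):
        ss_within = sum(_scaled_ss(dist2, [i for i in range(n) if labs[i] == g])
                        for g in groups)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any link between labels and profiles
        if pseudo_f(perm) >= f_obs:
            hits += 1
    # The observed labelling counts as one of the permutations
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two well-separated groups of 2-D profiles
ps = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [-0.1, 0.1], [0.0, -0.2], [0.15, -0.05]]
psci = [[x + 5, y + 5] for x, y in ps]
f, p = permanova(ps + psci, ["PS"] * 6 + ["PSCI"] * 6, n_perm=199)
print(f"pseudo-F = {f:.1f}, P = {p:.3f}")
```

With these well-separated toy groups the P-value is small. When the groups overlap, as reported for PS and PSCI, permuted labels often reach the observed pseudo-F and the P-value stays large.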