Predicting Schizophrenia via Cognitive Biomarkers: Development of a Machine Learning Algorithm for Early Diagnosis

Research Article

J Schizophr Res. 2025; 12(1): 1049.

Lyakhov ML*

Great Neck North High School, USA

*Corresponding author: Michael Lyakhov, Great Neck North High School, USA Email: mlyakhov1@student.gn.k12.ny.us

Received: March 31, 2025 Accepted: April 16, 2025 Published: April 21, 2025

Abstract

Schizophrenia (SCZ) is a chronic psychiatric disorder affecting over 20 million people worldwide, with early diagnosis hindered by inconsistent biomarkers and subtle symptom onset. This study presents a machine-learning pipeline to identify cognitive biomarkers from neuroimaging data for early and reliable SCZ diagnosis. Resting-state and Multi-Source Interference Task (MSIT) fMRI datasets were analyzed, highlighting key regions such as the anterior cingulate and dorsolateral prefrontal cortex. A novel approach integrated MSIT-derived features, which are linked to psychosis treatment progression, into resting-state data from prodromal patients. This enabled the prediction of psychosis onset through functional connectivity and machine learning. The pipeline automates preprocessing, feature extraction, and classification, and is available as an open-source Python library, promoting reproducibility and scalability. This research underscores the potential of cognitive biomarkers in early SCZ detection and offers a robust framework for broader psychiatric applications. Future work will explore multi-modal data integration to improve diagnostic precision across

Introduction

Over twenty million individuals worldwide have schizophrenia (SCZ), a severe and long-lasting mental illness. However, only 30% of cases improve because 50% of patients refuse treatment, and another 20% are unresponsive to drugs [1]. Those with SCZ typically experience a plethora of symptoms, especially as the disease progresses, including hallucinations, delusions, and disorganized speech, which can significantly reduce quality of life and impair daily function [1]. SCZ signs are clustered into broad categories of positive and negative symptoms.

Positive Symptoms

The active stage of SCZ, also known as psychosis, is when positive symptoms, those that result in an excess or distortion of behaviors [2], are most prevalent, with patients experiencing hallucinations, delusions, thought disorder, and hyperactivity [3].

Hallucinations and Delusions: Hallucinations are fabricated sensory experiences, in which a perceived sensory input has no basis in reality, while delusions are false beliefs that persist despite contrary evidence [3]. Antipsychotic medications treat these symptoms by blocking the dopamine-2 (D2) receptor, thereby dampening hyperactive dopamine transmission and reducing delusional symptoms [4]. As described by patients, the side effects of antipsychotics range from relatively unpleasant (e.g., constipation, sexual dysfunction) to disfiguring (e.g., weight gain, tardive dyskinesia) to life-threatening (e.g., myocarditis, agranulocytosis) [5]. These effects may deter patients from continued use of medication despite physician recommendations.

Hallucinations and delusions are the most overt symptoms of SCZ, and therefore the ones that prompt most patients to seek treatment; both can be treated with a 75% success rate [4]. Delusions, however, can quickly become imperceptible to patients, because they are often accompanied by one of the most dangerous developments associated with SCZ: anosognosia.

Anosognosia: Anosognosia is defined as a condition in which one is in denial of an apparent disability or deficit, frequently resulting in a patient’s refusal of treatment [3]. In many patients with SCZ, delusions develop into a fervent denial of treatment before professional help can intervene [3].

Negative Symptoms

The negative symptoms of SCZ, which include cognitive impairment, disorganized thought, and social withdrawal, are prevalent throughout the illness’s entire progression [2]. These symptoms are also regarded as the earliest precursors of SCZ.

Cognitive Impairment: The most prominent negative symptom observed is cognitive impairment, which includes problems with executive function, attention, and working memory. Cognitive impairment is typically quantified by measuring a patient’s performance on a series of cognitive tasks. The MATRICS™ Consensus Cognitive Battery (MCCB™) is a collection of 10 diverse examinations that serves as the industry standard for detecting changes in cognitive function and targets the cognitive domains commonly associated with SCZ [6]. Cognitive Remediation Therapy (CRT) is a treatment strategy for enhancing cognitive abilities in individuals with SCZ. However, it is expensive and becomes less effective as SCZ progresses [2]. Moreover, CRT is especially unsuccessful for anosognosic SCZ patients, as they are typically confrontational and skeptical of treatment [7].

Thought Disorder: Thought disorder (TD), more colloquially known as disorganized thinking, can be especially damaging because it impairs the patient’s ability to self-reflect, thus exacerbating anosognosia [8]. Moreover, even when the patient is not experiencing a psychotic episode, TD significantly impairs daily life by leaving patients unable to speak coherently and reason [9]. Further, as shown in Figure 1, neurological anosognosia symptoms are similar to those of neurocognitive disorders like TD.

Social Withdrawal: Many individuals with SCZ remove themselves from social interactions, a pattern that is coupled with an overall lack of motivation. Social withdrawal is a particularly troubling symptom, as it diminishes both the likelihood and quality of treatment participation [10,11]. As individuals retreat further from social interactions, they often lose the opportunity for supportive relationships that might encourage trust in medical interventions. Various psychosocial and cognitive-based interventions like CRT may be effective in addressing certain symptoms of SCZ [12]. However, individuals exhibiting social withdrawal often demonstrate high dropout rates and low treatment engagement [12].

Early Diagnosis

Early diagnosis of SCZ can slow or prevent the development of the disorder by averting the transition into a state of psychosis. If the disorder were diagnosed early, patients would be less likely to develop anosognosia, and the chance of a full recovery would therefore increase dramatically [13].

Chemical Biomarkers

Biomarkers are specific metrics that serve as indications of disease development. Chemical biomarkers, such as neurotransmitter levels, inflammatory markers, and specific proteins found in cerebrospinal fluid or blood, have long been investigated for their potential to aid in the prognosis of SCZ [14]. Such biomarkers aim to offer objective measures that reflect underlying biological processes associated with the disorder, potentially enabling early detection and more targeted interventions. However, they have proven largely ineffective as reliable diagnostic tools because they are either not specific enough, resulting in a mistaken indication of the disorder, or they are too specific and only apply to statistically insignificant fragments in a dataset [15]. Moreover, the etiology of SCZ has not yet been identified, which further limits the specificity of biomarkers, as studies tend to identify clusters of interdependent biomarkers rather than exclusive ones [15]. Chemical biomarkers often display significant variability between individuals with SCZ. Factors like genetic background, environmental influences, lifestyle, and even concurrent medications can affect biomarker levels, leading to inconsistent results across patient populations [16]. This variability complicates efforts to establish standardized biomarker thresholds that reliably indicate SCZ, as opposed to individual variability or external factors. For example, a study by Saha et al. [17] found that neurotransmitter levels can fluctuate based on the stage of illness, current symptoms, or even the time of day. Such instability limits their utility for consistent diagnosis, as levels may not correlate directly with the patient’s current clinical state.

Problem Statements

1. Variability and Limitations of Biomarkers for SCZ Diagnosis.

Biomarkers cannot currently yield accurate prognoses for SCZ because of the multifaceted nature of the disease's etiology.

2. Treatment Resistance and the Impact of Anosognosia.

A major barrier to effective treatment in SCZ is anosognosia. This phenomenon is particularly prevalent in individuals who suffer from the second stage of SCZ, which involves delusions/hallucinations and can be avoided if the illness is treated before psychosis ensues. Compounded by cognitive impairment and social withdrawal, anosognosia prevents patients from seeking or adhering to necessary interventions despite existing treatments, which exacerbates the disease’s progression. As such, a preliminary diagnostic tool is needed to most effectively treat SCZ.

Objectives

1. This study seeks to develop a prognosis algorithm based on a comprehensive assessment of the negative symptoms of SCZ that can identify cognitive biomarkers.

2. The prognosis algorithm will also use fMRI data to predict whether an individual will develop schizophrenia (more specifically, psychosis), thereby increasing early intervention rates in SCZ and preventing treatment refusal.

Methodology

Role of Student vs. Mentor

I conducted all of the work for this project independently, including the development of the machine learning (ML) model and the execution of the data analysis framework. My mentor, Mrs. York, provided minimal edits to my paper, but the conceptualization, implementation, and refinement of the ML model and the adaptive application were entirely self-directed.

Participants and Data Sources

This study used two datasets from Zucker Hillside Hospital: functional magnetic resonance imaging (fMRI) scans of participants completing the MSIT before and after 12 weeks of treatment, and resting-state fMRI scans of patients in the prodromal stage of SCZ. These datasets captured the trajectory of cognitive improvement with treatment in de-identified patients with SCZ, which was later emulated by an ML algorithm to identify biomarkers. Participants in the datasets were diagnosed with SCZ according to DSM-5 criteria. fMRI scans were included only if they demonstrated minimal motion artifacts, defined as less than 3 mm of translational movement and less than 3° of rotational deviation. The datasets comprised a diverse participant pool with wide age and ethnicity ranges to maximize the model's generalizability.
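The motion-based inclusion rule can be illustrated with a minimal Python sketch. It is not the study's code: the function name, the six-parameter layout (three translations in mm followed by three rotations in radians), and the choice to test the peak excursion on each axis are all assumptions made for illustration, since the text does not specify how displacement was measured.

```python
import math

def passes_motion_criteria(realign_params, max_trans_mm=3.0, max_rot_deg=3.0):
    """Return True if every volume stays under both motion limits.

    realign_params: one (tx, ty, tz, rx, ry, rz) tuple per volume, with
    translations in mm and rotations in radians (assumed layout).
    """
    for tx, ty, tz, rx, ry, rz in realign_params:
        # Largest translation along any axis for this volume, in mm.
        worst_trans = max(abs(tx), abs(ty), abs(tz))
        # Largest rotation about any axis, converted to degrees.
        worst_rot = max(abs(math.degrees(r)) for r in (rx, ry, rz))
        if worst_trans >= max_trans_mm or worst_rot >= max_rot_deg:
            return False
    return True

# Hypothetical scan: peak translation 2.5 mm, peak rotation about 1.1 degrees.
scan = [
    (0.1, 0.0, -0.2, 0.001, 0.000, 0.002),
    (2.5, 0.3, -0.4, 0.020, 0.005, 0.001),
]
print(passes_motion_criteria(scan))            # True under the 3 mm / 3 degree rule
print(passes_motion_criteria(scan, 2.0, 2.0))  # False under a tighter limit
```

Keeping the thresholds as parameters means the same check can be reused if a stricter tolerance is applied at a later stage.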

Equipment and Software

The study used a Windows 11 workstation with an Intel Xeon W-2255 processor, 64 GB of RAM, and a 2 TB solid-state drive for computational tasks. The dataset’s network was accessed remotely over a VPN, using SSH with X11 forwarding through PuTTY and Xming. All processing, ML model development, and figure creation were conducted in MATLAB with the SPM12 toolbox and the FMRIB Software Library (FSL).

Data Preprocessing

To ensure data uniformity and analytical accuracy, preprocessing was conducted in sequential steps for both neuroimaging and cognitive datasets.

fMRI Preprocessing

The functional connectivity fMRI data were preprocessed using the Statistical Parametric Mapping (SPM12) toolbox for MATLAB. This included spatial realignment and normalization to Montreal Neurological Institute (MNI) space to standardize spatial orientation across participants.

Gaussian kernel smoothing was then applied with a full width at half maximum (FWHM) of 8 mm to improve the signal-to-noise ratio and highlight regional activity patterns. Suppressing voxel-level noise in this way made the subsequent statistical analysis more accurate and informative. Outlier scans exceeding the predefined motion thresholds were excluded to preserve data quality.
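To make the smoothing parameter concrete, the following Python sketch converts an FWHM into the standard deviation of the corresponding Gaussian and builds a one-dimensional kernel on the voxel grid. It is an illustration only, not the SPM12 implementation: the function names are hypothetical, the 2 mm voxel size is taken from the normalized images described in this section, and truncating the kernel at three standard deviations is an assumption.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian's full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_kernel_1d(fwhm_mm, voxel_mm, radius_sigmas=3.0):
    """Build a normalized 1D Gaussian kernel sampled on the voxel grid."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm          # sigma in voxel units
    radius = int(math.ceil(radius_sigmas * sigma_vox))     # truncation (assumed 3 sigma)
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]                    # weights sum to 1

sigma_mm = fwhm_to_sigma(8.0)
kernel = gaussian_kernel_1d(8.0, voxel_mm=2.0)
print(round(sigma_mm, 3))  # 3.397 (mm)
print(len(kernel))         # 13 taps
```

An 8 mm FWHM therefore corresponds to a standard deviation of about 3.4 mm, or roughly 1.7 voxels at 2 mm resolution. A 3D smoothing kernel is separable, so in principle the same 1D kernel can be applied along each axis in turn.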

Figure 2 illustrates the detailed fMRI preprocessing workflow, designed to standardize the data and minimize artifacts for accurate neuroimaging analysis. The process began with data stored in NIfTI format, acquired with a repetition time (TR) of 0.75 seconds, which ensured compatibility with analysis tools and provided high temporal resolution for capturing neural dynamics. Motion correction realigned fMRI volumes to the mean image, with a tolerance of ±2 mm for translational movements and ±2° for rotational deviations used to exclude scans with excessive motion and preserve voxel-level alignment; this threshold was stricter than the dataset’s inclusion criterion to ensure scan quality and reduce overfitting. Slice timing correction was applied to account for interleaved slice acquisition, using the middle slice as a reference point to synchronize signal timing across the brain.

Co-registration aligned structural T1-weighted images with the functional data, enabling precise anatomical localization of neural activity and setting the stage for accurate spatial normalization. Functional images were then normalized to the MNI152 template with a voxel size of 2 mm isotropic and a bounding box of [-90, -126, -72; 90, 90, 108] to ensure uniform alignment with a standard brain space.

Denoising was performed by regressing out motion parameters and physiological confounds, with global intensity scaled to a mean of 100 for consistency across scans. Temporal filtering with a bandpass range of 0.01–0.1 Hz was applied to retain relevant neural signals while removing low-frequency drifts and high-frequency noise. Finally, smoothing was conducted using a Gaussian kernel with an 8 mm FWHM to enhance the signal-to-noise ratio and improve the detection of regionally significant activation patterns. This preprocessing pipeline yielded high-quality data with minimal artifacts, suitable for downstream statistical and machine-learning analyses.
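The effect of the 0.01–0.1 Hz temporal filter at a TR of 0.75 seconds can be sketched in plain Python. This is a hedged illustration rather than the filter used in the study: it assumes an ideal frequency-domain filter that zeroes every Fourier component outside the pass band, and the function name, run length, and synthetic signal are hypothetical.

```python
import cmath
import math

def bandpass_dft(signal, tr, low_hz=0.01, high_hz=0.1):
    """Ideal band-pass filter: zero every DFT bin outside [low_hz, high_hz]."""
    n = len(signal)
    # Forward discrete Fourier transform (O(n^2), fine for a short sketch).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins k and n - k share the same absolute frequency in Hz.
        freq = min(k, n - k) / (n * tr)
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0j
    # Inverse transform; the result is real because the mask is symmetric.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]

TR = 0.75          # seconds, as reported for the acquisition
N_VOLUMES = 400    # hypothetical run length (300 s)
times = [i * TR for i in range(N_VOLUMES)]
slow = [math.sin(2 * math.pi * 0.05 * t) for t in times]  # inside the pass band
fast = [math.sin(2 * math.pi * 0.30 * t) for t in times]  # above the pass band
raw = [100.0 + s + f for s, f in zip(slow, fast)]         # offset plus both waves
filtered = bandpass_dft(raw, TR)
max_error = max(abs(a - b) for a, b in zip(filtered, slow))
print(max_error < 1e-6)  # True: only the 0.05 Hz component survives
```

With a TR of 0.75 seconds the sampling rate is about 1.33 Hz, so the Nyquist frequency (about 0.67 Hz) lies well above the 0.1 Hz cutoff. In this synthetic example the constant offset and the 0.30 Hz oscillation are removed while the 0.05 Hz component passes unchanged.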