Identifying Gene Signatures Associated with Cancer Stem Cells and Drug Resistance from Triple Negative Breast Cancer Cells after Gene Targeting Treatment

Research Article

Austin J Biomed Eng. 2014;1(4): 1018.

Paul Jang, Tim Holleran, Michael Burns, Kenan Gebizlioglu, Alex Governale, Yi Lisa Lyu and Li Cai*

Department of Biomedical Engineering, Rutgers University, USA

*Corresponding author: Li Cai, Department of Biomedical Engineering, Rutgers University, 599 Taylor Road, Piscataway, NJ 08854, USA.

Received: July 04, 2014; Accepted: Aug 05, 2014; Published: Aug 07, 2014


Over 90% of breast cancer deaths are due to metastatic disease; however, the metastatic behavior of aggressive breast cancer is still not well understood. Accumulating evidence supports the idea that the initiation, maintenance and metastasis of tumors are driven by cancer stem cells (or tumor-initiating cells). In this study, we focused on MDA-MB-231 cells, a representative line of the highly aggressive triple negative breast cancer subtype. We obtained and re-analyzed three previously published datasets on MDA-MB-231 cells after treatment targeting the expression of GATA3, Pin1, or LSD1. Two distinct computational algorithms (dChip and GEO2R) were employed to cross-compare the resulting gene expression profiles. We identified a common gene signature consisting of eight genes among the three datasets. Three of the eight genes, i.e., ABCC3, AGR2, and PTGES, were highly correlated with the properties of breast cancer stem cells and drug resistance. Thus, they are predicted to be potential gene markers of breast cancer stem cells and may serve as novel therapeutic targets to combat breast cancers with poor prognosis.

Keywords: Genome-wide gene expression; Triple negative breast cancer; MDA-MB-231; Cancer stem cells; Gene signature; Computational analysis


One in eight women in the United States is diagnosed with breast cancer, and more than 90% of breast cancer deaths are attributed to metastatic disease [1,2]. Breast cancer comprises a variety of subtypes that are distinguished by molecular markers such as human epidermal growth factor receptor 2 (HER2), estrogen receptors (ER), and progesterone receptors (PR) [3,4]. Triple-negative breast cancers (TNBC) pose the greatest threat because they are negative for all three of the above markers, making them extremely difficult to target for treatment. In addition, these triple-negative breast cancers are highly aggressive and associated with poor prognosis [5].

Recent studies strongly support the idea that metastasis is due to the initiation and maintenance of cancer stem cells in tumors [6,7]. Like normal stem cells, cancer stem cells have the ability to self-renew and differentiate [6-8]. Thus, identifying the underlying gene signatures of a representative TNBC cell line such as MDA-MB-231 could reveal markers of those cancer stem cells. These gene signatures may in turn serve as therapeutic targets.

Many public databases provide accessible datasets from gene expression studies of various cancers. For instance, the NCBI Gene Expression Omnibus (GEO) has 49,161 datasets from breast cancer studies (as of August 5, 2014) and is gaining new data at a rate of 300% per year. The availability of genome-wide gene expression datasets offers cost-effective opportunities for secondary analysis, i.e., for investigating research questions that were not part of the original purpose of the data. However, most of the existing approaches for large-scale analyses are heuristic or lack clearly defined assumptions. Furthermore, although many of these studies have addressed the problematic metastatic behavior of breast cancer, they mainly focused on the gene expression changes after targeting a single gene [9-11]. Since breast cancer is known to contain heterogeneous cell populations, single gene targeting may not be effective [12,13].

In this study, we applied a systematic approach to conduct a secondary analysis of public gene expression data and to integrate breast cancer genomic studies. We attempted to identify common gene signatures that persist after various single gene targeting treatments of the representative TNBC cell line MDA-MB-231. We integrated the results of treatments that effectively inhibited aggressive cancer behavior and compared the gene signatures among them. A common gene signature that persists across these various treatments could indicate resistance to treatment, which is a characteristic of cancer stem cells. Thus, our study facilitates the reuse of the vast amount of public data to answer additional questions, reduces the need to generate new data, and improves our understanding of cellular functions and networks in breast cancer cells under a variety of perturbations.


Triple-negative breast cancer cells: MDA-MB-231

Three datasets generated from a single cell line (MDA-MB-231 cells) were selected based on the following considerations: 1) the MDA-MB-231 cell line is a representative triple-negative cell line and is commonly used in studies of metastatic breast cancer and breast cancer stem cells; 2) MDA-MB-231 cells have also been shown to express many crucial biological and molecular features of basal triple negative breast cancer [14]; 3) the common use of MDA-MB-231 cells in research allowed for a more varied pool of treatments for analysis; and 4) the design of the study is to identify common genes after different gene targeting treatments. Since different cancer cells possess their own cellular and molecular properties, there is no evidence that a common gene signature exists across different cell types after the same gene targeting. In addition, using a single cell line increases confidence in the interpretation of results and avoids complications arising from a variety of different cells.

Gene chip platform

To ensure the most comprehensive results when comparing across studies, it was important to choose studies that used the same or closely related platforms for genome-wide gene expression analysis. Two platforms were selected because their relatively common use allowed the greatest variety of treatments to choose from while providing reliable genome-wide expression data. The two platforms chosen were Affymetrix Human Genome U133 Plus 2.0 (GPL570) and U133A 2.0 (GPL571). Though it would have been ideal to choose only one of these platforms, the advantage of expanding the pool of treatments outweighed the possible loss of data, given the considerable similarities between the two platforms as described by the Affymetrix HG-U133A 2.0/HGU133 2.0 Plus Technical Note.
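One way to compare arrays from the two platforms is to restrict the analysis to probe sets present on both. The following Python sketch illustrates that idea only; it is not taken from the study. It assumes that probe set identifiers are directly comparable between GPL570 and GPL571. The function name `restrict_to_shared_probes`, the probe IDs, and the expression values are hypothetical placeholders.

```python
def restrict_to_shared_probes(expr_plus2, expr_a2):
    """Keep only the probe set IDs present in both expression tables.

    Each argument maps a probe set ID to a list of expression values.
    Assumption: IDs are directly comparable across the two platforms.
    """
    shared = sorted(set(expr_plus2) & set(expr_a2))
    return (
        {probe: expr_plus2[probe] for probe in shared},
        {probe: expr_a2[probe] for probe in shared},
    )


# Placeholder probe IDs and values, for illustration only.
gpl570_table = {
    "probe_1": [120.0, 130.0],
    "probe_2": [55.0, 60.0],
    "probe_3": [900.0, 870.0],
}
gpl571_table = {
    "probe_1": [115.0, 125.0],
    "probe_3": [880.0, 910.0],
}

plus2_shared, a2_shared = restrict_to_shared_probes(gpl570_table, gpl571_table)
```

Probe sets found on only one platform (here `probe_2`) are dropped, which corresponds to the possible loss of data mentioned above.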


The selection of datasets was based on the efficacy/reliability of the study and the use of at least three biological replicates.

Dataset 1: GATA3 overexpression study (GSE24249)

GATA3 is one of the most frequently mutated genes in breast cancer [15]; it plays a critical role in luminal cell differentiation during mammary gland development [16,17]. The study by Chu et al. overexpressed the GATA3 transcription factor in MDA-MB-231 cells via lentiviral transduction [9]. Overexpressing GATA3 in MDA-MB-231 cells suppressed the expression of various metastasis-related genes such as colony-stimulating factor-1 (CSF-1) via repression of lysyl oxidase (LOX) expression [9]. LOX is a matrix protein that promotes metastasis by effecting changes in cell proliferation, cross-linking of extracellular collagen types, and formation of a metastatic niche [9].

Dataset 2: PIN1 suppression study (GSE26262)

Pin1 is a key regulator downstream of miR-200c that promotes breast cancer stem cells and breast tumorigenicity [18,19]. The study by Girardini et al. showed the influence of Pin1 on the mutant p53-dependent promotion of cancer aggressiveness [10]. A study by Soussi and Wiman showed the relation between human cancer and p53 mutation [20]. In addition, several other studies have suggested that mutant p53 promotes cell migration and metastasis [21-24]. Studies have shown that Pin1, a prolyl isomerase, promotes both Her2/Neu/Ras-dependent and Notch1-dependent changes in breast cells [25,26]. Pin1 inhibits the antimetastatic factor p63 via a mutant p53-dependent mechanism and stimulates a mutant p53 transcriptional program to increase aggressiveness [10].

Dataset 3: LSD1 suppression study (GSE30775)

LSD1 is a component of the Mi-2/nucleosome remodeling and deacetylase (NuRD) complex [27]; it is critically involved in the de-methylation of lysine 4 and lysine 9 of histone H3 [28,29]. Studies showed that pharmacological LSD1 inhibition led to growth inhibition of breast cancer cells, and that siRNA knockdown of LSD1 promoted expression of proliferation-associated genes such as p21, ERBB2 and CCNA2 [28]. In aggressive cancer cell lines, the presence of LSD1 was associated with the suppression of proinflammatory cytokine expression, such as IL1α, IL1β, IL6, and IL8, as well as with the regulation of tumorigenesis [30].

Software for data analysis

Two computational programs with distinct algorithms were used in this study, i.e., DNA-Chip Analyzer (dChip) [31] and GEO2R [32,33]. dChip is a model-based approach that allows probe-level analysis on multiple arrays [31]. By pooling information across multiple arrays, it is possible to assess standard errors for the expression indexes. This approach also allows automatic probe selection in the analysis stage to reduce errors due to cross-hybridizing probes and image contamination. High-level analysis in dChip includes comparative analysis and hierarchical clustering [31]. GEO2R uses linear models and empirical Bayes methods for assessing differential expression in microarray experiments [33]. A significantly differentially expressed gene is defined by the following criteria: 1) gene expression fold change ≥2.0 (for up-regulated genes) or ≤0.5 (for down-regulated genes); 2) absolute intensity difference value ≥100; and 3) p-value ≤0.05.
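The three criteria above can be expressed as a simple filter. The following Python sketch is illustrative only and is not the dChip or GEO2R implementation. It assumes linear-scale (not log-transformed) mean intensities and a p-value already computed by the analysis software. The function name `classify_gene`, its parameter names, and the example values are hypothetical.

```python
def classify_gene(control_mean, treated_mean, p_value,
                  fc_up=2.0, fc_down=0.5, min_abs_diff=100.0, max_p=0.05):
    """Return 'up', 'down', or None according to the three criteria.

    Assumptions: intensities are on a linear scale and p_value was
    computed elsewhere (e.g., by the analysis software).
    """
    if control_mean <= 0:
        return None  # fold change is undefined for non-positive baseline
    fold_change = treated_mean / control_mean
    abs_diff = abs(treated_mean - control_mean)

    # Criteria 2 and 3: absolute intensity difference and p-value.
    if abs_diff < min_abs_diff or p_value > max_p:
        return None
    # Criterion 1: fold change threshold in either direction.
    if fold_change >= fc_up:
        return "up"
    if fold_change <= fc_down:
        return "down"
    return None


# Hypothetical example values, for illustration only.
example_up = classify_gene(200.0, 520.0, 0.01)      # fold change 2.6
example_down = classify_gene(400.0, 150.0, 0.03)    # fold change 0.375
example_small = classify_gene(40.0, 100.0, 0.01)    # difference of 60 only
example_not_sig = classify_gene(200.0, 520.0, 0.20)
```

The absolute difference threshold excludes genes such as `example_small`, where a large fold change arises from low intensities close to background.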


Three datasets from genome-wide gene expression studies on MDA-MB-231 cells were selected and downloaded from the NCBI website (Table 1). The dataset GSE24249 contains 3 control samples and 3 experimental samples with GATA3 overexpression [9]. Six samples from dataset GSE26262 were included in this analysis: 3 control samples (siCtl) and 3 experimental samples with Pin1 knockdown (siPin1) [10]. The dataset GSE30775 contains 3 control samples and 3 LSD1 knockdown samples (siRNA-LSD1) [28].
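Cross-comparing the three datasets amounts to finding the genes called differentially expressed in every one of them. The following Python sketch shows that step as a set intersection; it is an illustration under stated assumptions rather than the procedure used in the study. The function name `common_signature` and the gene names `GENE_A` to `GENE_E` are hypothetical placeholders, not results.

```python
def common_signature(*gene_lists):
    """Return the genes that appear in every supplied gene list."""
    if not gene_lists:
        return set()
    return set.intersection(*(set(genes) for genes in gene_lists))


# Placeholder gene lists keyed by GEO accession, for illustration only.
de_genes = {
    "GSE24249": ["GENE_A", "GENE_B", "GENE_C"],
    "GSE26262": ["GENE_B", "GENE_C", "GENE_D"],
    "GSE30775": ["GENE_C", "GENE_B", "GENE_E"],
}

signature = common_signature(*de_genes.values())
```

In this toy input only `GENE_B` and `GENE_C` are shared by all three lists, so they form the common signature.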