In-silico Approach to Map Transcription Factor Binding Motifs onto Drosophila Cardiac Genes

Research Article

Austin J Biotechnol Bioeng. 2014;1(1): 8.

Jain Prerna1 and Hasija Yasha1*

1Department of Biotechnology, Delhi Technological University, India

*Corresponding author: Mahendra Rai K, Department of Biotechnology, Sant Gadge Baba Amravati University, Amravati- 444602 Maharashtra, India.

Received: June 30, 2014; Accepted: July 25, 2014; Published: July 28, 2014

Abstract

The development of multicellular organisms requires a consortium of different cell types that interact to form a functional organism. Each cell has a unique genetic profile that is regulated in a specific manner according to the developmental stage of the organism. This control is orchestrated by enhancers, gene regulatory sequences that dictate the spatio-temporal patterns of gene expression by controlling transcriptional activity. A common feature of enhancers is the presence of multiple Transcription Factor Binding Sites (TFBS), which are bound by multiple transcription factors. A molecular understanding of enhancers and of the transcription factors that bind to them is necessary for deciphering complex biological networks. The binding sites within enhancers are conserved, so identifying them can help uncover various interaction mechanisms. In the present study, we address this issue with a computational approach to predict TFBS within a set of enhancers active in the Drosophila melanogaster heart. We collected all known enhancers that are active in the cardiac mesoderm. Motifs were identified and functionally characterized by comparison with a database of known motifs. Putative motifs were then mapped onto our dataset of enhancer sequences. We believe that these mapped enhancer sequences can be used to predict de novo enhancers across the entire Drosophila melanogaster genome using machine learning techniques. These findings may thus help to uncover currently unknown mechanisms that may be important in gene regulation.

Keywords: Transcription factor binding sites, Enhancers, Drosophila melanogaster, Cis regulatory module

Abbreviations

18w: 18 wheeler; Act 57b: Actin 57b; Atet: ABC Transporter Expressed in Trachea; Bib: Big Brain; CRM: Cis Regulatory Module; Hh: Hedgehog; Kb: kilobase; Lea: Leak; MAST: Motif Alignment and Search Tool; MEME: Multiple EM for Motif Elicitation; Mef2: Myocyte Enhancer Factor 2; Nkd: Naked cuticle; RC: Reporter Construct; Slp: Sloppy paired 1; Sur: Schmalspur; Tin: Tinman; Tl: Toll; TF: Transcription Factor; TFBS: Transcription Factor Binding Site

Introduction

Gene expression and control

Embryonic development is a tightly controlled process that ultimately gives rise to a multicellular organism comprising complex tissues and organs. The development of different organs relies largely on differentiation, cell specification, cellular identity, and responses to environmental cues [1]. Precise spatio-temporal control of gene expression is the main driving force behind the proper restriction of cell fates and ensures the accuracy of cellular differentiation. It results in time-dependent and tissue-specific regulatory outputs, which are critical for regulating the different stages of embryonic development. Whether a gene is transcriptionally active at the right stage and time depends on several factors, including the position of the gene in the genome, its chromatin structure, and the transcriptional regulatory elements associated with it. These transcriptional regulatory elements play a major role in regulating gene expression and thereby determine the fate of the various cells during development [1,2].

Transcription involves various transcription factors (TFs) that bind to DNA in a sequence-specific manner. TFs bind to sequences called transcription factor binding sites (TFBS) within regulatory regions of the gene called enhancers, which are organized into modules called Cis-Regulatory Modules (CRMs). CRMs are regulatory sequences located a few kilobases away from the gene of interest; they bind specific TFs at specific developmental stages to bring about a particular cell specification [3]. Overall, gene expression is regulated by the combination of all CRMs acting on genes throughout the organism's life. Previous studies have shown that the gene encoding the transcription factor tinman has four CRMs controlling its expression, a consequence of genetic pleiotropy. Thus, there may be as many as 10-fold more CRMs than genes [4].

Interactions between TFs and CRMs form a developmental transcriptional regulatory network that encodes the specification and differentiation programmes of the various cell types expressed at particular stages of development, ultimately leading to a fully grown organism.

The prediction of these regulatory motifs (TFBS) forms an essential link in comparative genomics. These sequences are evolutionarily conserved, so orthologs of the corresponding genes can eventually be found in higher, more complex organisms, which helps in understanding molecular mechanisms. Prediction faces several hurdles, however. First, these modules are often located far away from the genes they regulate. Second, the presence of multiple binding sites for various TFs leads to combinatorial control of gene regulation, which makes it difficult to associate a module with a single gene [5,6]. The traditional approach works at the whole-genome level, using techniques such as chromatin immunoprecipitation (ChIP) and ChIP-Seq and testing many sequence fragments for regulatory activity in reporter gene assays. These assays are highly resource-intensive and cannot cover all tissues under all conditions [7]. Computational tools have therefore been used to predict modules and binding sites more efficiently. One strategy, for example, is to scan the whole genome for sequence-based signatures, which can be TFBS or specific histone modification signatures. The data and the signatures are curated from the published literature. Because such predictions can be uncertain, further experimental validation is always necessary [5-7].
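As an illustration of the scanning strategy mentioned above, the following minimal Python sketch scores every window of a DNA sequence, on both strands, against a position weight matrix (PWM) and reports windows whose log-odds score exceeds a threshold. It is a sketch under stated assumptions and not the pipeline used in this study: the count matrix `COUNTS`, the pseudocount, the uniform background frequency of 0.25, the threshold, and the function names `build_pwm` and `scan` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical count matrix for a 6-bp motif (counts of each base per position).
# These numbers are illustrative only and are NOT taken from the study.
COUNTS = {
    "A": [8, 0, 0, 9, 1, 0],
    "C": [1, 0, 9, 0, 0, 1],
    "G": [0, 10, 0, 0, 8, 0],
    "T": [1, 0, 1, 1, 1, 9],
}


def build_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert base counts into log2-odds scores against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        }
        pwm.append(column)
    return pwm


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(sequence, pwm, threshold):
    """Return (position, strand, site, score) for windows scoring >= threshold.

    Positions are 0-based and always given in forward-strand coordinates.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            # Skip windows containing ambiguous bases such as N.
            if any(base not in "ACGT" for base in window):
                continue
            score = sum(pwm[i][base] for i, base in enumerate(window))
            if score >= threshold:
                pos = start if strand == "+" else len(seq) - width - start
                hits.append((pos, strand, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    for hit in scan("TTTAGCAGTTT", pwm, threshold=8.0):
        print(hit)
```

In practice the threshold controls the trade-off between missed sites and false positives, which is one reason why computational predictions of this kind still require experimental validation.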

Overview of Drosophila melanogaster Heart Development

Drosophila melanogaster as a model organism

Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for investigating many developmental and cellular processes common to higher eukaryotes, including humans, which makes it an ideal system for developing and evaluating comparative genomics methodologies [8]. The annotated genome sequence of D. melanogaster, together with its associated biology, helps in unravelling various cellular and metabolic mechanisms. The heart is the first organ to form during embryogenesis and is necessary to circulate blood systemically and to support the progression of organogenesis. The early events in Drosophila heart development have been studied in detail. Previous studies have provided information about various factors involved in the early determination and differentiation of the cardiac mesoderm. Some of these factors, such as Tin, are evolutionarily conserved, and their homologues have been identified [9].

Studies have shown that a large proportion of the diversity of living organisms results from differential regulation of gene transcription. Transcriptional regulation differs between species because of changes in the interaction of TFs with enhancer sequences. These changes are an important determinant of which gene is expressed and at which stage. The mechanisms by which protein-DNA interactions evolve are therefore an important question in evolutionary biology [10,13]. The present work involves identifying and analyzing the genes that are specifically expressed in Drosophila heart cells.

Cardiac specific genes and transcription factors

Cardiogenesis proceeds via the activation of a complex regulatory network of cardiac structural genes. Previous studies have made significant progress in defining the genes that contribute to heart development [11]. Figure 1 depicts the participation of various genes in the development of cardiac, visceral, and skeletal muscles from the dorsal mesoderm.