Computational Structure Analysis and Function Prediction of an Uncharacterized Protein (I6U7D0) of Pyrococcus furiosus COM1

Research Article

Austin J Comput Biol Bioinform. 2014;1(2): 5.

Oany AR*, Ahmad SAI, Siddikey MAA, Hossain MU and Ferdoushi A

Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Bangladesh

*Corresponding author: Oany AR, Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh-1902, Tangail, Bangladesh

Received: October 24, 2014; Accepted: December 18, 2014; Published: December 22, 2014

Abstract

Pyrococcus furiosus is a hyperthermophilic archaeon. An uncharacterized protein of this archaeon, I6U7D0 (UniProt accession), containing 349 residues, was selected for in silico analysis. Various bioinformatic tools were used to predict the structure and function of this protein. Sequence similarity was searched against the UniProt and non-redundant databases using the BLASTp program of NCBI, and homology was found with methyltransferases. Multiple sequence alignment was used to locate the conserved residues. The secondary and three-dimensional structures were predicted. The three-dimensional structure was validated with the PROCHECK, Verify3D and ERRAT programs. The CASTp server was used to predict the active site of the protein. Molecular docking with the ligand ACY (acetic acid) was performed using Molegro Virtual Docker to visualize the interactions between the ligand and amino acid residues in the protein. Taken together, the results suggest that the target protein functions as a methyltransferase.

Keywords: Sequence alignments; Molecular docking; Protein-Ligand interactions; Active site

Introduction

Pyrococcus furiosus is a hyperthermophilic archaeon. It is considered a model organism for studying hyperthermophilic extremophiles, mostly because of its rapid growth at 100 °C and its already sequenced genome [1]. Studying P. furiosus, as well as other extremophiles, holds considerable potential because of their unique genomic and physiological features. Its thermostable enzymes and other unique proteins might be used in various applications.

With advances in sequencing technologies, it is now considerably easier to obtain the whole genome sequence of such single-celled organisms. Still, there are protein sequences whose functions are yet to be discovered or experimentally confirmed. We regard these uncharacterized proteins as a vast unexplored field with numerous opportunities, both as medical and as industrial tools. In silico analysis might assist in determining the biological functions of such uncharacterized proteins. This can partly be facilitated by predicting the three-dimensional (3D) structure of the target protein. When an experimentally determined structure is unavailable, comparative or homology modelling can sometimes provide a useful 3D model for a protein of interest that is related to at least one known protein structure. Comparative or homology modelling predicts the 3D structure of a given protein sequence based primarily on its alignment to one or more proteins of known structure. Over the past few decades, the number of sequences in the comprehensive public sequence databases, such as Swiss-Prot/TrEMBL [2] and GenPept [3], has increased far more than the number of experimentally determined structures deposited in the Protein Data Bank (PDB) [4]. As a result, a gap has formed between the number of known sequences and confirmed functions. In silico prediction methods for the 3D structure and biological function of proteins might assist in reducing this gap [5].

Prediction methods are based on fold assignment, target-template alignment, model building and model evaluation [6]. However, in silico predicted 3D structures can be confirmed only by experimental methods such as X-ray crystallography and NMR spectroscopy [7]. Homology-based gene annotation has been the standard method for assigning a function to a novel uncharacterized protein over the last decades. With the development of new algorithms and bioinformatic tools, various other methods can nowadays complement the classical homology search. These methods, known as 'genomic context' approaches, are designed to detect presumed functional constraints on genome evolution [8]. Some recent studies have followed a similar analysis to propose the function of proteins whose existence is supported at the protein level [9]. In this study, an attempt has been made to predict the structure and biological function of an uncharacterized protein (I6U7D0) using various bioinformatic tools.

Materials and Methods

Sequence retrieval

We first searched UniProtKB (www.uniprot.org/) [10] and, by targeted selection, chose UniProt entry I6U7D0 of the archaeon Pyrococcus furiosus, which consists of 349 amino acid residues. The sequence was then stored in FASTA format.
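As an illustration of how a sequence stored in FASTA format can be read back for the later analysis steps, a minimal Python sketch is given here. The function name `read_fasta` and the toy record are assumptions made for this example; the residues shown are placeholders and not the actual I6U7D0 sequence.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Toy record (hypothetical): placeholder residues, NOT the real I6U7D0 sequence.
example = """>toy|EXAMPLE|placeholder entry
MKVLAT
GGHW
"""

records = read_fasta(example)
for header, seq in records.items():
    print(header, len(seq))
```

For the real entry, the same function could be applied to the FASTA text downloaded from UniProt, and the reported length of 349 residues could be used as a sanity check.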

Physico-chemical properties analysis

The ProtParam tool of ExPASy (https://web.expasy.org/protparam/) [11] was used to analyze the physicochemical properties deduced from the protein sequence. The properties analyzed included the aliphatic index, GRAVY (grand average of hydropathy), extinction coefficients, isoelectric point (pI) and molecular weight.
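Two of these properties can be illustrated with a short calculation. The sketch below is not the ProtParam implementation; it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly quoted aliphatic index formula, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are chosen for this example only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean hydropathy value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide (hypothetical), not the I6U7D0 sequence.
toy = "MKVLATGGHW"
print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A positive GRAVY value points to an overall hydrophobic sequence and a negative one to a hydrophilic sequence, while a higher aliphatic index is usually read as a sign of greater thermostability.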

Homology identification

To obtain a preliminary prediction of the function of the target protein, a similarity search was performed with the BLASTp program [13] at NCBI (https://www.ncbi.nlm.nih.gov/) against the non-redundant and SwissProt [12] databases to find proteins that might be structurally similar to the uncharacterized protein.

Structure prediction

The retrieved sequence was used to predict the secondary structure of the protein with the SABLE server (https://sable.cchmc.org/) [14], and the tertiary structure was predicted with the (PS)2v2 server (https://ps2v2.life.nctu.edu.tw/) of the Molecular Bioinformatics Center, National Chiao Tung University [15]. The three-dimensional structure was predicted on the basis of the best-scoring template for higher accuracy.

Model quality assessment

Finally, the quality of the predicted three-dimensional structure was assessed with PROCHECK [16], Verify3D (https://nihserver.mbi.ucla.edu/Verify_3D/) [17] and the ERRAT structure evaluation server [18].

Multiple sequence alignment and phylogeny analysis

Multiple sequence alignment was carried out between the uncharacterized protein and the proteins that had structural similarity with it, using the BioEdit biological sequence alignment editor [19]. The phylogenetic analysis was done with CLC Sequence Viewer v7.0.2 (https://www.clcbio.com).
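To illustrate how conserved residues can be read from a finished alignment, the following Python sketch reports the alignment columns in which every sequence carries the same residue. It is independent of BioEdit; the function name `conserved_columns` and the toy alignment are assumptions for this example, and real alignments would be exported from the alignment tool.

```python
def conserved_columns(aligned):
    """Return 1-based positions of columns that are identical in all sequences.

    `aligned` is a list of equal-length strings in which '-' marks a gap.
    Columns containing a gap are never reported as conserved.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and "-" not in column:
            conserved.append(i + 1)
    return conserved


# Toy alignment (hypothetical), not taken from the study.
toy_alignment = [
    "MKV-LG",
    "MRVALG",
    "MKV-LG",
]
print(conserved_columns(toy_alignment))
```

Strict identity is the simplest criterion; a scoring matrix could be used instead if conservative substitutions should also count.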

Protein-protein interaction analysis

Proteins interact with each other to carry out their functions accurately. Here we used STRING (https://string-db.org/), a database of known and predicted protein interactions that covers both physical and functional associations. These associations are derived from genomic context, high-throughput experiments, (conserved) co-expression and previous knowledge. The database quantitatively integrates interaction data from these sources [20].

Active site detection

The active site of the protein was determined with the Computed Atlas of Surface Topography of Proteins (CASTp) (https://sts.bioengr.uic.edu/castp/) [21], which provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins.

Comparative docking analysis

Docking studies were then carried out using Molegro Virtual Docker (MVD) [22], an integrated environment for studying and predicting how ligands interact with macromolecules. MVD usually works within a specific grid defined by a position (x, y, z); for the hypothetical protein, the grid position incorporating the ligand was X = 56.69, Y = 16.86 and Z = 36.17. Before the docking study was performed, we fetched the ligand molecule from PDB entry 2QM3, a methyltransferase protein of P. furiosus. Docking was then performed with both proteins to validate the result.
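The role of the grid position can be illustrated with a simple geometric check. The sketch below is not part of MVD: it assumes the reported coordinates act as the center of a spherical search region and uses a hypothetical radius of 15 Å, which is not stated in this study. The function name and the atom coordinates are likewise invented for illustration.

```python
import math

# Grid position reported in the text, treated here as the center (assumption).
CENTER = (56.69, 16.86, 36.17)
# Hypothetical search radius in angstroms; NOT a value from the study.
RADIUS = 15.0


def inside_search_space(atom, center=CENTER, radius=RADIUS):
    """Return True if an atom (x, y, z) lies within `radius` of `center`."""
    return math.dist(atom, center) <= radius


# Made-up ligand atom coordinates for illustration only.
ligand_atoms = [
    (57.10, 17.20, 36.00),
    (58.30, 16.10, 35.40),
    (90.00, 16.86, 36.17),  # deliberately far from the center
]
print([inside_search_space(a) for a in ligand_atoms])
```

A check of this kind can help confirm that a reference ligand pose lies inside the region in which the docking search is performed.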

Results and Discussion

The physicochemical properties of the hypothetical protein are described in Table 1. The BLASTp results against the non-redundant and SwissProt databases are shown in Tables 2 and 3. BLASTp analysis of the FASTA sequence of the target protein against these databases revealed an average of 80% homology with other methyltransferase proteins. The SABLE server predicted the secondary structure of the protein with good confidence (Figure 1), and the (PS)2 server predicted the three-dimensional structure of the protein with 96.49% identity to the highest-scoring template (PDB ID: 2QM3A), as depicted in Figure 2. The predicted three-dimensional model was validated with PROCHECK through a Ramachandran plot, which shows that the distribution of the φ and ψ angles in the model is within the limits (Figure 3 and Table 4). Residues in the most favored regions accounted for 92.7%, which indicates a valid model. Finally, the 3D model of the target sequence was verified with the structure validation servers Verify3D and ERRAT. The highest score of 0.72 in the Verify3D graph indicates that the environmental profile of the model is good, and the overall quality factor of 87.574 predicted by the ERRAT server also indicates a good model.
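The φ and ψ angles plotted in a Ramachandran plot are backbone dihedral angles, each defined by four consecutive atoms (C, N, Cα, C for φ and N, Cα, C, N for ψ). A minimal Python sketch of the dihedral calculation is given below. It is not the PROCHECK code; the function names are assumptions for this example, and the test points are simple geometric cases rather than atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Simple geometric test cases (not atoms from the predicted model).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, quarter)
```

Applied to the backbone atoms of each residue in a model, this gives the (φ, ψ) pairs whose distribution is summarized in a Ramachandran plot.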