Oany AR; Ahmad SAI; Siddikey MAA; Hossain MU; Ferdoushi A

Research Article

Austin J Comput Biol Bioinform. 2014;1(2): 5.

Computational Structure Analysis and Function Prediction of an Uncharacterized Protein (I6U7D0) of Pyrococcus furiosus COM1

Oany AR*, Ahmad SAI, Siddikey MAA, Hossain MU and Ferdoushi A

Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Bangladesh

*Corresponding author: Oany AR, Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh-1902, Tangail, Bangladesh

Received: October 24, 2014; Accepted: December 18, 2014; Published: December 22, 2014

Abstract

Pyrococcus furiosus is a hyperthermophilic archaeon. An uncharacterized protein of this archaeon, I6U7D0 (UniProt accession), containing 349 residues, was selected for in silico analysis. Various bioinformatic tools were used to predict the structure and function of this protein. Sequence similarity was searched against the UniProt and non-redundant databases using the BLASTp program of NCBI, and homology was found with methyltransferases. Multiple sequence alignment was used to locate the conserved residues. The secondary and three-dimensional structures were predicted. The three-dimensional structure was validated with the PROCHECK, Verify3D and ERRAT programs. The CASTp server was used to predict the active site of the protein. Molecular docking with the ligand ACY (acetic acid) was performed using Molegro Virtual Docker to visualize the interactions between the ligand and the amino acid residues of the protein. Taken together, the results suggest that the biological function of the target protein is that of a methyltransferase.

Keywords: Sequence alignments; Molecular docking; Protein-Ligand interactions; Active site

Introduction

Pyrococcus furiosus is a hyperthermophilic archaeon. It is considered a model organism for studying hyperthermophilic extremophiles, mostly because of its rapid growth at 100 °C and its already sequenced genome [1]. Studying P. furiosus, like other extremophiles, holds considerable potential because of its unique genomic and physiological features. Its thermostable enzymes and other unique proteins might be used in various applications.

With advances in sequencing technologies, it is now considerably easier to obtain the whole genome sequence of such single-cell organisms. Still, there are protein sequences whose functions are yet to be discovered or experimentally confirmed. We regard these uncharacterized proteins as a vast unexplored field with numerous opportunities, both as medical and as industrial tools. In silico analysis might assist in determining the biological functions of such uncharacterized proteins. This can partly be facilitated by predicting the three-dimensional (3D) structure of the target protein. When an experimentally determined structure is unavailable, comparative or homology modelling can sometimes provide a useful 3D model for a protein of interest that is related to at least one known protein structure. Comparative or homology modelling predicts the 3D structure of a given protein sequence based primarily on its alignment to one or more proteins of known structure. Over the past few decades, the number of sequences in the comprehensive public sequence databases, such as Swiss-Prot/TrEMBL [2] and GenPept [3], has increased far more than the number of experimentally determined structures deposited in the Protein Data Bank (PDB) [4]. As a result, a gap has formed between the number of known sequences and the number of confirmed functions. In silico prediction methods for the 3D structure and biological function of proteins might assist in reducing this gap [5].

Prediction methods are based on fold assignment, target-template alignment, model building and model evaluation [6]. However, in silico predicted 3D structures can be confirmed only by experimental methods such as X-ray crystallography and NMR spectroscopy [7]. Homology-based gene annotation has been the standard method for assigning a function to a novel uncharacterized protein over the last decades. With the development of new algorithms and bioinformatic tools, various other methods can nowadays complement the classical homology search. These methods, known as 'genomic context' approaches, are designed to detect presumed functional constraints on genome evolution [8]. A recent study has also followed this kind of analysis to propose the function of a protein whose existence is supported at the protein level [9]. In this study, an attempt has been made to predict the structure and biological function of an uncharacterized protein (I6U7D0) using various bioinformatic tools.

Materials and Methods

Sequence retrieval

We first searched UniProtKB (www.uniprot.org/) [10] and selected, by targeted selection, the entry I6U7D0 of the archaeon Pyrococcus furiosus, which consists of 349 amino acid residues. The sequence was then stored in FASTA format.
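The retrieval step can be sketched in Python as follows. This is only an illustration, not the procedure used in the study: the URL pattern is an assumption based on the current UniProt REST service and is not given in the paper, and the function names parse_fasta and fetch_fasta are hypothetical.

```python
from urllib.request import urlopen

# Assumed URL pattern of the UniProt REST service (not stated in the paper).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_fasta(accession, timeout=30):
    """Download one UniProtKB entry as FASTA text (needs network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With network access, parse_fasta(fetch_fasta("I6U7D0")) would be expected to return the header line and the sequence of the target protein, whose length can then be checked against the 349 residues reported above.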

Physico-chemical properties analysis

The ProtParam tool (https://web.expasy.org/protparam/) [11] of ExPASy was used to analyze the physico-chemical properties deduced from the protein sequence. The properties analyzed included the aliphatic index, GRAVY (grand average of hydropathy), extinction coefficient, isoelectric point (pI) and molecular weight.
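Two of these properties are simple functions of the amino acid composition and can be reproduced locally. The sketch below is not ProtParam itself: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent. The function names are hypothetical, and the sequence in the usage note is a toy example, not the I6U7D0 sequence.

```python
# Kyte-Doolittle hydropathy values, the scale conventionally used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean hydropathy value per residue."""
    seq = seq.upper()
    # Raises KeyError for letters outside the 20 standard amino acids.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq):
    """Return (Asp + Glu, Arg + Lys) counts."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive
```

For the toy sequence "AVILDEKR", gravy gives -0.1375, aliphatic_index gives 146.25 and charged_residue_counts gives (2, 2).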

Homology identification

To obtain a preliminary prediction of the function of the target protein, a similarity search was performed with the BLASTp program [13] at NCBI (https://www.ncbi.nlm.nih.gov/) against the non-redundant and SwissProt [12] databases, in order to identify proteins that might be structurally similar to the uncharacterized protein.

Structure prediction

The retrieved sequence was used to predict the secondary structure of the protein with the SABLE server (https://sable.cchmc.org/) [14], and the tertiary structure was predicted with the (PS)²-v2 server (https://ps2v2.life.nctu.edu.tw/) of the Molecular Bioinformatics Center, National Chiao Tung University [15]. The three-dimensional structure was predicted on the basis of the best-scoring template for higher accuracy.

Model quality assessment

Finally, the quality of the predicted three-dimensional structure was assessed with PROCHECK [16], Verify3D (https://nihserver.mbi.ucla.edu/Verify_3D/) [17] and the ERRAT structure evaluation server [18].

Multiple sequence alignment and phylogeny analysis

Multiple sequence alignment was carried out between the uncharacterized protein and the proteins that showed structural similarity to it, using the BioEdit biological sequence alignment editor [19]. Phylogenetic analysis was performed with CLC Sequence Viewer v7.0.2 (https://www.clcbio.com).
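Once an alignment is available, locating fully conserved positions amounts to a column-wise comparison. The following minimal sketch assumes the aligned sequences have already been exported as equal-length strings with "-" as the gap character. It is not part of BioEdit, the function name is hypothetical, and the sequences in the usage note are toy data.

```python
def conserved_columns(aligned, gap="-"):
    """Return 0-based indices of columns where all sequences share one residue."""
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    conserved = []
    for index, column in enumerate(zip(*aligned)):
        residues = set(column)
        # A column counts as conserved only if it holds a single residue type
        # and that residue is not a gap.
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved
```

For the toy alignment ["MKV-LG", "MKI-LG", "MRVALG"], the function returns [0, 4, 5].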

Protein-protein interaction analysis

Proteins interact with one another to carry out their functions. Here we used STRING (https://string-db.org/), a database of known and predicted protein interactions that covers both physical and functional associations. These associations are derived from genomic context, high-throughput experiments, (conserved) co-expression and previous knowledge. The database quantitatively integrates interaction data from these sources [20].

Active site detection

The active site of the protein was determined with the Computed Atlas of Surface Topography of proteins (CASTp) (https://sts.bioengr.uic.edu/castp/) [21], which provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins.

Comparative docking analysis

Docking studies were then carried out with Molegro Virtual Docker (MVD) [22]. MVD is an integrated environment for studying and predicting how ligands interact with macromolecules. It usually works within a specific grid defined by a position (x, y, z); for the hypothetical protein, the grid incorporating the ligand was positioned at X = 56.69, Y = 16.86 and Z = 36.17. Before docking, we fetched the ligand molecule from PDB entry 2QM3, a methyltransferase protein of P. furiosus. Docking was then performed with both proteins to validate the result.
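The role of the grid position can be illustrated with a simple distance filter that keeps only the atoms lying within a sphere around the stated coordinates. This is a sketch and not MVD functionality. The 15 Å radius is an assumption (the paper does not report a search radius), and the function name and atom labels are hypothetical.

```python
import math

# Grid position reported in the text (X, Y, Z).
GRID_POSITION = (56.69, 16.86, 36.17)


def atoms_in_sphere(atoms, center=GRID_POSITION, radius=15.0):
    """Return labels of atoms within `radius` of `center`.

    `atoms` is an iterable of (label, x, y, z) tuples. The default radius
    is an assumed value, not one taken from the paper.
    """
    selected = []
    for label, x, y, z in atoms:
        if math.dist((x, y, z), center) <= radius:
            selected.append(label)
    return selected


# Toy coordinates for illustration only (not taken from the model).
toy_atoms = [
    ("atom_1", 56.69, 16.86, 36.17),   # at the grid position
    ("atom_2", 60.69, 16.86, 36.17),   # 4 units away along x
    ("atom_3", 100.0, 0.0, 0.0),       # far outside the sphere
]
print(atoms_in_sphere(toy_atoms))
```

In practice the atom list would come from the coordinate records of the modelled structure, and the radius would be chosen to enclose the predicted active-site cavity.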

Results and Discussion

The physico-chemical properties of the hypothetical protein are listed in Table 1. The BLASTp results against the non-redundant and SwissProt databases are shown in Tables 2 and 3. BLASTp analysis of the FASTA sequence of the target protein against these databases revealed an average of 80% sequence identity with other methyltransferase proteins. The SABLE server predicted the secondary structure of the protein with good confidence (Figure 1), and the (PS)² server predicted the three-dimensional structure with 96.49% identity to the highest-scoring template (PDB ID: 2QM3A), as depicted in Figure 2. The predicted three-dimensional model was validated with PROCHECK through a Ramachandran plot, which shows that the distribution of φ and ψ angles in the model lies within the allowed limits (Figure 3 and Table 4). Residues in the most favored regions accounted for 92.7%, which is consistent with a valid model. Finally, the 3D model of the target sequence was verified with the structure validation servers Verify3D and ERRAT. The highest score of 0.72 in the Verify3D graph indicates that the environmental profile of the model is good, and the overall quality factor of 87.574 predicted by the ERRAT server also indicates a good model.

Figure 1: Secondary structure analysis by using SABLE.

Figure 2: Predicted three dimensional structure of the hypothetical protein.

Figure 3: Ramachandran plot of modelled structure validated by PROCHECK program.

Figure 4: Multiple sequence alignment of different homologous protein.

Table 1: Physico-chemical properties analysis of the hypothetical protein.

Property                                    Value
No. of amino acids                          349
Molecular weight, MW (Da)                   40326.8
pI                                          4.68
Asp + Glu (negatively charged residues)     67
Arg + Lys (positively charged residues)     43
Extinction coefficient (M-1 cm-1)           49975
Aliphatic index (AI)                        96.07
Instability index (II)                      39.36
Grand average of hydropathicity (GRAVY)     -0.323

Table 2: Similar protein obtained from UniProt database.

Entry name      Organism                   Protein name                                 Identity   Score   e-value
I3RDM7_9EURY    Pyrococcus sp. ST04        Putative methyltransferase                   93%        1,720   0.0
Q9UZ33_PYRAB    Pyrococcus abyssi          Predicted methyltransferase, DUF43 family    91%        1,711   0.0
F0LHU2_THEBM    Thermococcus barophilus    Predicted methyltransferase                  84%        1,576   0.0
H3ZM66_THELI    Thermococcus litoralis     Methyltransferase                            81%        1,524   0.0
C6A0P9_THESM    Thermococcus sibiricus     Predicted methyltransferase                  78%        1,478   0.0

Table 3: Similar protein obtained from Non-redundant UniProt KB/SwissProt sequences.

Protein ID            Organism                            Protein name        Identity   Score   e-value
ref|WP_014733863.1    Pyrococcus sp.                      Methyltransferase   93%        673     0.0
ref|WP_013467640.1    Thermococcus barophilus             Methyltransferase   84%        615     0.0
ref|WP_012766155.1    Thermococcus sibiricus              Methyltransferase   78%        577     0.0
ref|WP_014806245.1    Anaerobaculum mobile                Methyltransferase   49%        328     7e-107
ref|WP_011026055.1    Thermoanaerobacter tengcongensis    Methyltransferase   45%        316     3e-102
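The identity values listed in Tables 2 and 3 can be summarized with a short calculation. The sketch below simply averages the percentages shown in the two tables. Which hits were included in the average quoted in the text is not stated, so pooling both tables is an assumption made here for illustration.

```python
from statistics import mean

# Percent identities copied from Table 2 and Table 3.
TABLE2_IDENTITY = [93, 91, 84, 81, 78]
TABLE3_IDENTITY = [93, 84, 78, 49, 45]


def mean_identity(values):
    """Arithmetic mean of percent identities."""
    return mean(values)


print(mean_identity(TABLE2_IDENTITY))                    # 85.4
print(mean_identity(TABLE3_IDENTITY))                    # 69.8
print(mean_identity(TABLE2_IDENTITY + TABLE3_IDENTITY))  # 77.6
```

The pooled mean of the ten listed hits is 77.6%, which is close to the figure of about 80% quoted in the text.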

Table 4: Ramachandran plot statistics of the hypothetical protein.

Ramachandran plot statistics                                     No. of residues   (%)
Residues in the most favored regions [A, B, L]                   290               92.7%
Residues in the additional allowed regions [a, b, l, p]          20                6.4%
Residues in the generously allowed regions [~a, ~b, ~l, ~p]      3                 1%
Residues in the disallowed regions                               0                 0.0%
Number of non-glycine and non-proline residues                   313               100.0%
Number of end-residues (excl. Gly and Pro)                       2
Number of glycine residues (shown in triangles)                  18
Number of proline residues                                       16
Total number of residues                                         342
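The percentages in Table 4 follow directly from the residue counts, taking the non-glycine, non-proline residues as the denominator. A minimal sketch of that arithmetic is given below. The dictionary keys and the function name are hypothetical labels, and the counts are those listed in the table.

```python
# Residue counts per Ramachandran region, copied from Table 4.
REGION_COUNTS = {
    "most favored": 290,
    "additional allowed": 20,
    "generously allowed": 3,
    "disallowed": 0,
}


def region_percentages(counts):
    """Express each region count as a percentage of the summed counts."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


print(sum(REGION_COUNTS.values()))
print(region_percentages(REGION_COUNTS))
```

The four counts sum to 313, the number of non-glycine and non-proline residues, and the rounded percentages are 92.7, 6.4, 1.0 and 0.0.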

The multiple sequence alignment (Figure 4) comprised the FASTA sequences of the uncharacterized protein (I6U7D0) and the homologous annotated proteins. To confirm the homology assessment between the proteins, down to the complex and subunit level, phylogenetic analysis was additionally performed. The phylogenetic tree constructed from the alignment agrees with the BLAST results and is shown in Figure 5. The distances between branches are also included.

Figure 5: Phylogenic trees with true distance of different methyltransferase proteins.

The STRING protein-protein interaction network revealed that the hypothetical protein interacts strongly with the reverse gyrase (rgy) protein; this interaction suggests that the protein could act as a DNA/RNA methyltransferase (Figure 6). The predicted active site of the protein and its amino acid residues are depicted in Figure 7. Finally, the comparative molecular docking study with the ligand ACY bound in the active site supports the conclusion that the hypothetical protein is a methyltransferase: the binding energies were -59.3778 and -53.9562 kcal/mol for 2QM3 and I6U7D0, respectively (Figure 8 and Table 5).