Graphics Card Processing: Acceleration of Multiple Sequence Alignment

Research Article

Austin J Comput Biol Bioinform. 2014;1(2): 6.

Graphics Card Processing: Acceleration of Multiple Sequence Alignment

Hanif MK* and Zimmermann KH

Institute of Computer Technology, Hamburg University of Technology, Germany

*Corresponding author: Muhammad Kashif Hanif, Institute of Computer Technology, Hamburg University of Technology, 21071 Hamburg, Germany

Received: September 16, 2014; Accepted: December 01, 2014; Published: December 05, 2014

Abstract

ClustalW is the most widely used heuristic method for multiple sequence alignment. It consists of three stages: distance matrix calculation, guide tree compilation, and greedy-fashion alignment. The high computational complexity demands methods to accelerate the algorithm. In this work, the efficient mapping of the progressive alignment stage onto graphics processing unit by using a combination of wavefront and matrix-matrix product techniques will be studied. The experimental results exhibit one order of magnitude speed-up over the serial version.

Keywords: Alignment; Progressive alignment; Graphics processor card; ClustalW; Performance

Introduction

Sequence alignment is the fundamental technique in molecular biology to compare sequences and to identify regions of similarity that are eventually consequences of structural, functional, or evolutionary relationships [1- 4]. Sequence alignment is performed for all kinds of organic molecules, like DNA, RNA, or protein sequences. Multiple sequence alignment is the technique to align three or more sequences simultaneously. The aligned sequences are obtained by inserting gaps and have equal length. However, multiple sequence alignment is very time-consuming. For instance, optimal dynamic programming methods require O(2knk) steps to simultaneously align k sequences of length O(n) [4]. A variety of heuristic methods have been developed to cope with multiple sequence alignment problems. The most widely accepted heuristic method for aligning multiple sequences is progressive alignment [5,6]. This method aligns more closely related sequences first and then gradually adds more divergent sequences [7]. The alignment accuracy can be improved by assessing the sequences according to their relatedness. A progressive alignment algorithm can handle a larger number of sequences in practical time scales. The most widely used progressive alignment programs are ClustalW [5, 8, 9], T-Coffee [6,10], MAFFT [11,12], and MUSCLE [13,14].

ClustalW is a typical progressive alignment algorithm making use of the policy "once a gap, always a gap", i.e., gaps introduced earlier in the alignment remain valid as new sequences are added [9,15]. It works in three stages (Figure 1). In the first stage, the distances between all pairs of sequences are calculated by pairwise sequence alignment. Pairwise sequence alignment can be calculated by the dynamic programming based method of Needleman-Wunsch [16] or one of its varieties like Smith-Waterman [17] or a fast heuristic method [9,18-20]. The scores of attained pairwise alignments are converted into distances which are input for the subsequent stage [9].