Comparison of Data Partitioning Schema of Parallel Pairwise Alignment on Shared Memory System

Auriza Rahmad Akbar, Heru Sukoco, Wisnu Ananta Kusuma

Abstract


The pairwise alignment (PA) algorithm is widely used in bioinformatics to analyze biological sequence. With the advance of sequencer technology, a massive amount of DNA fragments are sequenced much quicker and cheaper. The alignment algorithm needs to be parallelized to be able to align them in a shorter time. Many previous researches have parallelize PA algorithm using various data partitioning schema, but it is unclear which one is the best. The data partitioning schema is important for parallel PA performance, because this algorithm use dynamic programming technique that needs intense inter-thread communication. In this paper, we compared four partitioning schemas to find the best performing one on shared memory system. Those schemas are: blocked columnwise, rowwise, antidiagonal, and blocked columnwise with manual scheduling and loop unrolling. The last schema gave the best performance of 89% efficiency on 4 threads. This result provided fine-grain parallelism that can be used further to develop parallel multiple sequence alignment (MSA).

Keywords


data partition; pairwise alignment; parallel processing; shared memory

Full Text:

PDF

References


Cohen J. Bioinformatics: an introduction for computer scientists. ACM Computing Surveys (CSUR). 2004; 36: 122–158.

Waterman MS, Smith TF, Beyer WA. Some biological sequence metrics. Advances in Mathematics. 1976; 20: 367–387.

Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 1994; 22: 4673–4680.

Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology. 2000; 302: 205–217.

Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research. 2002; 30: 3059–3066.

Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004; 32: 1792–1797.