Performance Evaluation of Leading Protein Multiple Sequence Alignment Methods
Arunima Mishra1, B. K. Tripathi2, S. S. Soam3
1Arunima Mishra*, Computer Science & Engineering, Dr APJ Abdul Kalam Technical University, Lucknow, India.
2Bipin Kumar Tripathi, Computer Science & Engineering, College, Bijnor, India.
3Sudhir Singh Soam, Computer Science & Engineering, Institute of Engineering and Technology, Lucknow, India.
Manuscript received on September 21, 2019. | Revised Manuscript received on October 15, 2019. | Manuscript published on October 30, 2019. | PP: 771-776 | Volume-9 Issue-1, October 2019 | Retrieval Number: A1369109119/2019©BEIESP | DOI: 10.35940/ijeat.A1369.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Protein multiple sequence alignment (MSA) is the process of aligning more than two protein sequences in order to establish the evolutionary relationships among them. In protein MSA, the sequences are arranged so that the similarities between them are maximized. As sequencing technologies become more sophisticated, the volume of biological data is growing at an enormous rate. This growth challenges existing MSA methods: as data volume increases, computational complexity rises and processing speed falls. Accuracy is equally critical, because many bioinformatics inferences depend on the output of MSA. This paper reviews the existing state-of-the-art methods for protein MSA and compares four leading methods, MAFFT, Clustal Omega, MUSCLE and ProbCons, on speed and accuracy. BAliBASE version 3.0, a repository of manually refined multiple sequence alignments, is used as the benchmark database, and alignment accuracy is measured with two widely used criteria: the sum-of-pairs score (SP score) and the total column score (TC score). The execution time of each method is also recorded in order to compare execution speed.
Keywords: Multiple Sequence Alignment, Execution speed, Sum of pair score, Total column score.
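The two accuracy criteria named in the abstract can be illustrated with a short sketch. It assumes the common definitions: the SP score is the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment, and the TC score is the fraction of reference columns reproduced exactly in the test alignment. This is a simplified illustration, not the BAliBASE scoring program: it assumes '-' is the only gap character, scores all columns rather than only annotated core blocks, counts only reference columns holding at least two residues for TC, and needs at least two sequences. The helper names are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment):
    """Turn an alignment (list of equal-length gapped strings) into a list of
    columns, each a frozenset of (sequence_index, residue_index) pairs.
    Assumption: '-' is the only gap symbol; residues are numbered by their
    position in the ungapped sequence."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        cells = set()
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                cells.add((seq, counters[seq]))
                counters[seq] += 1
        columns.append(frozenset(cells))
    return columns


def check_same_sequences(test, reference):
    """Both alignments must contain the same ungapped sequences in the same order."""
    strip = lambda rows: [row.replace("-", "") for row in rows]
    if strip(test) != strip(reference):
        raise ValueError("test and reference must align the same sequences")


def aligned_pairs(columns):
    """Every pair of residues that share a column."""
    pairs = set()
    for cells in columns:
        pairs.update(combinations(sorted(cells), 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment also aligns."""
    check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Fraction of reference columns (with >= 2 residues) found intact in the test."""
    check_same_sequences(test, reference)
    ref_cols = [c for c in residue_columns(reference) if len(c) >= 2]
    test_cols = set(residue_columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]  # misplaces the G of sequence 0
    print(sp_score(test, reference), tc_score(test, reference))  # 0.75 0.75
```

In this toy case the test alignment pairs G with A instead of G with G, so one of the four reference pairs and one of the four reference columns is lost, giving 0.75 for both scores. With more than two sequences the two criteria diverge, since a single misplaced residue breaks a whole column for TC while costing only a few pairs for SP.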