q-frame hash comparison based exact string matching algorithms for DNA sequences


Karcıoğlu A. A., Bulut H.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol. 34, no. 9, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 34 Issue: 9
  • Publication Date: 2022
  • DOI: 10.1002/cpe.6505
  • Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Keywords: DNA sequences, hash function, hash-based string matching, pattern matching, sequence analysis, string matching algorithms
  • Atatürk University Affiliation: Yes

Abstract

String matching is important because of its applications in many fields, such as medicine and bioinformatics. Various string matching algorithms have been developed to speed up the search, and hash-based exact string matching algorithms in particular are among the most time-efficient. The efficiency of hash-based approaches depends on the hash function; hence, perfect hashing plays an essential role in hash-based string matching. In this study, two q-frame hash comparison-based exact string matching algorithms, Hq-QF and HqBM-QF, are proposed. Both use a collision-free perfect hash function for DNA sequences. In the first approach, once the hash values of the last q characters match, the character comparisons of the Hash-q algorithm are replaced with q-frame hash comparisons. In the second approach, we improve the first by using the shift size stored at the $(m-1)$th entry of the good suffix shift table. Since the number of character comparisons is minimized, the worst-case time complexity of the proposed algorithms is $O(n(m - \lfloor m/q \rfloor q))$. In both approaches, q-frame hash comparisons replace most character comparisons as a trade-off. The results show that the proposed approaches outperform the Hash-q algorithm in both runtime and the number of character comparisons.
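To make the first approach more concrete, here is a minimal, hypothetical Python sketch of a Hash-q-style search in which window verification uses q-frame hash comparisons instead of character-by-character checks. It is not the authors' implementation. Several details are assumptions: the perfect hash is taken to be a 2-bit-per-base encoding over the alphabet {A, C, G, T}, which is collision-free for q-grams. Frames are assumed to be laid out right to left from the end of the window, and the m mod q leftover characters at the start of the window are compared directly. The names `frame_hash` and `hq_qf_search` are illustrative only.

```python
# Hypothetical sketch of Hash-q search with q-frame hash verification (Hq-QF-style).
# Assumptions (not taken from the paper): a 2-bit-per-base perfect hash over
# {A, C, G, T}, frames laid out right to left from the window end, and the
# m mod q leftover characters compared one by one.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def frame_hash(s: str, start: int, q: int) -> int:
    """Collision-free hash of s[start:start+q]: a base-4 number, 2 bits per base."""
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h


def hq_qf_search(x: str, y: str, q: int) -> list[int]:
    """Return the start positions of every occurrence of pattern x in text y."""
    m, n = len(x), len(y)
    if q < 1 or m < q or n < m:
        return []

    # Preprocessing as in Hash-q: shift table indexed by q-gram hash.
    default_shift = m - q + 1
    shift: dict[int, int] = {}
    for i in range(q - 1, m - 1):  # q-grams ending at pattern positions q-1 .. m-2
        shift[frame_hash(x, i - q + 1, q)] = m - 1 - i  # rightmost occurrence wins
    last = frame_hash(x, m - q, q)
    sh1 = shift.get(last, default_shift)  # shift applied after a verified window
    shift[last] = 0

    # Hashes of the remaining full pattern frames, right to left.
    n_frames = m // q
    pattern_frames = [frame_hash(x, m - (k + 1) * q, q) for k in range(1, n_frames)]
    rest = m - n_frames * q  # leftover characters at the start of the window

    matches = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        s = shift.get(frame_hash(y, j - q + 1, q), default_shift)
        if s != 0:
            j += s
            continue
        start = j - m + 1
        # The hash is perfect, so equal frame hashes imply identical frames.
        ok = all(
            frame_hash(y, j - (k + 1) * q + 1, q) == pattern_frames[k - 1]
            for k in range(1, n_frames)
        )
        # Only the m mod q leftover characters are compared directly.
        if ok and y[start:start + rest] == x[:rest]:
            matches.append(start)
        j += sh1
    return matches
```

For example, `hq_qf_search("ACGTAC", "TTACGTACGTACGA", 2)` returns `[2, 6]`. In this sketch each verified window performs only m mod q direct character comparisons, consistent with the term $m - \lfloor m/q \rfloor q$ in the stated complexity. The shift `sh1` after a verified window follows Hash-q; the second approach (HqBM-QF) would instead take the shift from the $(m-1)$th entry of the good suffix shift table, which is not implemented here.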