Comparison of complex-valued and real-valued neural networks for protein sequence classification


Yakupoğlu A., BİLGİN Ö. C.

Neural Computing and Applications, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2024
  • DOI Number: 10.1007/s00521-024-10368-y
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Keywords: Complex deep learning, Complex sequence encoding, Protein classification
  • Atatürk University Affiliated: Yes

Abstract

In recent years, tremendous progress has been made in the field of real-valued deep learning. Despite successful applications using amplitude and phase features, complex-valued deep learning methods remain an actively researched area with significant potential. This study investigates the potential of complex-valued networks in biological sequence analysis. In this context, protein sequences are encoded as complex numbers using a novel encoding approach, classified by complex-valued networks, and compared with a real-valued method available in the literature. This comparative study is carried out separately for three different forms of protein sequences: DNA, codon and amino acid. Both the real-valued and complex-valued networks achieved very high test accuracies of 90% and above. In statistical analyses using tenfold cross-validation, the complex-valued method yielded average accuracies of 88% (± 6), 84% (± 8) and 87% (± 8) for DNA, codon and amino acid sequences, respectively. The real-valued method gave mean accuracies of 91% (± 8), 88% (± 6) and 88% (± 7), respectively. According to the comparative t-test, there was no statistically significant difference between the two methods at the p = 0.05 level. Even so, the findings highlight the potential of complex-valued networks to achieve high success in biological sequence analysis despite their current limitations.
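
As a rough plausibility check on the reported comparison, the sketch below recomputes a two-sample t statistic from the summary figures quoted in the abstract. It rests on assumptions that the abstract does not confirm: that the "±" values are standard deviations over the 10 folds, and that an independent two-sample test with equal group sizes is an acceptable stand-in for the authors' "comparative t-test". If the original test was paired, it cannot be reproduced from these summaries alone. The names `REPORTED`, `two_sample_t` and `compare_all` are hypothetical and used only for illustration.

```python
import math

# Mean accuracy (%) and spread (%) as quoted in the abstract.
# Assumption: the "±" values are standard deviations over the 10 folds.
REPORTED = {
    "DNA": {"complex": (88.0, 6.0), "real": (91.0, 8.0)},
    "codon": {"complex": (84.0, 8.0), "real": (88.0, 6.0)},
    "amino acid": {"complex": (87.0, 8.0), "real": (88.0, 7.0)},
}
N_FOLDS = 10

# Two-sided critical value of Student's t at alpha = 0.05
# with 2 * 10 - 2 = 18 degrees of freedom.
T_CRIT_DF18 = 2.101


def two_sample_t(mean_a, sd_a, mean_b, sd_b, n):
    """Independent two-sample t statistic for two groups of equal size n."""
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / n)
    return (mean_a - mean_b) / standard_error


def compare_all(reported=REPORTED, n=N_FOLDS, t_crit=T_CRIT_DF18):
    """Return {sequence form: (t statistic, exceeds critical value?)}."""
    results = {}
    for form, stats in reported.items():
        t = two_sample_t(*stats["complex"], *stats["real"], n)
        results[form] = (t, abs(t) > t_crit)
    return results


if __name__ == "__main__":
    for form, (t, significant) in compare_all().items():
        print(f"{form}: t = {t:.3f}, significant at 0.05: {significant}")
```

Under these assumptions, none of the three t statistics exceeds the critical value in magnitude, which is consistent with the abstract's statement that the two methods do not differ significantly at the 0.05 level.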