MMHFNet: Multi-modal and multi-layer hybrid fusion network for voice pathology detection

Mohammed, Hussein; ÖMEROĞLU, Aslı; ORAL, Emin

doi:10.1016/j.eswa.2023.119790

Mohammed H. M., ÖMEROĞLU A. N., ORAL E. A.

Expert Systems with Applications, vol. 223, 2023 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 223
Publication Date: 2023
DOI: 10.1016/j.eswa.2023.119790
Journal Name: Expert Systems with Applications
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
Keywords: Voice pathology detection, Multi-modal data fusion, Multi-layer fusion, Deep learning, CNN, LSTM
Affiliated with Atatürk University: Yes

Abstract

Automatic voice pathology detection using non-invasive techniques that utilize patients’ speech and electroglottograph (EGG) signals plays a vital role in diagnosis and early medical intervention. In this paper, a novel deep Multi-Modal and Multi-Layer Hybrid Fusion Network (MMHFNet) is proposed to improve the performance of non-invasive voice pathology detection systems. MMHFNet simultaneously incorporates complementary information from different modalities (speech and EGG signals). For multi-layer fusion, it also vertically combines low-level features, extracted from shallow layers, with high-level features, extracted from deep layers, to take full advantage of the spatio-spectral information in different layers. The features extracted by MMHFNet are then fed into an LSTM classification network to diagnose the voice pathology. Comprehensive experiments are conducted on the publicly available Saarbruecken Voice Database (SVD) to evaluate the performance of the proposed MMHFNet. The dataset is used in two ways: once with all of its samples, and once with selected samples that form the largest balanced SVD dataset. Experimental results demonstrate that the proposed MMHFNet achieves accuracy rates of 91% and 96.05% on the datasets with all samples and with balanced samples, respectively.