Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models

ÖZKAN, EMRE; Ozbek, İbrahim; Demirekler, Muebeccel

doi:10.1109/tasl.2009.2022198

ÖZKAN E., Ozbek I. Y., Demirekler M.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, vol.17, no.8, pp.1518-1532, 2009 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 17 Issue: 8
Publication Date: 2009
DOI: 10.1109/tasl.2009.2022198
Journal Name: IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.1518-1532
Keywords: Dirichlet process, formant tracking, particle filter, spectral representation, spectrum estimation, vocal tract resonance (VTR), FORMANT TRACKING, PREDICTION, FILTER, TRANSFORMATION
Affiliated with Atatürk Üniversitesi: Yes

Abstract

In this paper, we propose a new approach to dynamic speech spectrum representation and to tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture (DPM) model is used. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during a speech utterance. The analysis is based on a new state-space representation of the concatenated tube model. We show that the number of formants that appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of formants from the spectrum is explained by "uncontrollable modes" of the state-space model. Under the assumption that a varying number of formants exists in the spectrum, we propose a DPM-based multi-target tracking algorithm for tracking an unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model for the unknown formant states, and inference is performed with a Rao-Blackwellized particle filter.
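
To illustrate how a Dirichlet process prior leaves the number of mixture components open, the following is a minimal sketch of the Chinese restaurant process, the standard sequential view of a (static) Dirichlet process. It is not the time-varying DPM or the tracking algorithm of the paper. The function name `crp_assignments`, the concentration value `alpha=1.0`, and the sample size are assumptions chosen only for illustration.

```python
import random


def crp_assignments(n_points, alpha, rng):
    """Draw component labels for n_points items from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha is the concentration parameter.
    """
    labels = [0]   # the first item always opens the first component
    counts = [1]   # number of items assigned to each component so far
    for _ in range(1, n_points):
        # An existing component k is chosen with weight counts[k];
        # a brand-new component is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


rng = random.Random(0)  # fixed seed so the run is repeatable
labels, counts = crp_assignments(200, alpha=1.0, rng=rng)
print("number of components:", len(counts))
```

The number of components is not passed in; it emerges from the draws and tends to grow slowly with the number of items. Larger values of `alpha` make new components more likely.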