On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation

ÖZBEK İ. Y., Hasegawa-Johnson M., Demirekler M.

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, vol.20, no.1, pp.67-81, 2012 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 20 Issue: 1
  • Publication Date: 2012
  • Doi Number: 10.1109/tasl.2011.2157496
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.67-81
  • Keywords: Acoustic-to-articulatory inversion, interactive multiple model (IMM) smoothing, jump Markov linear system (JMLS), maximum-likelihood (ML) and maximum a posteriori (MAP) learning, VOCAL-TRACT, PROBABILISTIC FUNCTIONS, SPEECH, MODEL, RECOGNITION, REGRESSION, INFORMATION, HYPOTHESIS, ACOUSTICS, MOVEMENTS
  • Ataturk University Affiliated: Yes


This paper presents a complete framework for articulatory inversion based on jump Markov linear systems (JMLS). In the model, the acoustic measurements are treated as the observable measurements and the position of each articulator as the continuous-valued hidden state of the system, while the discrete regimes of the system are represented by a discrete-valued hidden modal state. Articulatory inversion based on JMLS involves learning the model parameter set of the system and inferring the state (the position of each articulator) from acoustic measurements. Iterative learning algorithms based on maximum-likelihood (ML) and maximum a posteriori (MAP) criteria are proposed to learn the model parameter set of the JMLS. It is shown that the learning procedure of the JMLS is a generalized version of hidden Markov model (HMM) training when both acoustic and articulatory data are given. It is also shown that the MAP-based learning algorithm improves the modeling performance of the system and gives significantly better results than ML. The inference stage of the proposed algorithm is based on an interacting multiple models (IMM) approach and can be performed online (filtering) and/or offline (smoothing). Formulas are provided for IMM-based JMLS smoothing. It is shown that smoothing significantly improves the performance of articulatory inversion compared to filtering. Several experiments are conducted with the MOCHA database to show the performance of the proposed method. Comparison with methods in the literature shows that the proposed method improves the performance of state space approaches, making them comparable to the best published results.
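
To make the filtering stage described in the abstract more concrete, the following is a minimal sketch of one step of a textbook IMM filter for a scalar toy JMLS. It is not the authors' implementation. The model form (x_t = a[s]·x_{t-1} + w, y_t = c[s]·x_t + v), the function name imm_filter_step, and all parameter names and values are assumptions chosen for illustration; the paper's state and measurement are vector-valued, and its smoothing formulas are not reproduced here.

```python
import math


def imm_filter_step(y, means, variances, mode_probs, trans, a, c, q, r):
    """One IMM filtering step for a scalar jump Markov linear system (toy sketch).

    Assumed model, per mode s:  x_t = a[s] * x_{t-1} + w,  w ~ N(0, q[s])
                                y_t = c[s] * x_t     + v,  v ~ N(0, r[s])
    trans[i][j] is the assumed probability of switching from mode i to mode j.
    """
    n = len(mode_probs)

    # 1) Mixing: predicted mode probabilities, then a mixed prior for each mode.
    pred = [sum(trans[i][j] * mode_probs[i] for i in range(n)) for j in range(n)]

    new_means, new_vars, liks = [], [], []
    for j in range(n):
        w = [trans[i][j] * mode_probs[i] / pred[j] for i in range(n)]
        m0 = sum(w[i] * means[i] for i in range(n))
        v0 = sum(w[i] * (variances[i] + (means[i] - m0) ** 2) for i in range(n))

        # 2) Mode-matched Kalman filter: predict, then update with measurement y.
        mp = a[j] * m0
        vp = a[j] ** 2 * v0 + q[j]
        innov = y - c[j] * mp
        s = c[j] ** 2 * vp + r[j]          # innovation variance
        k = vp * c[j] / s                  # Kalman gain
        new_means.append(mp + k * innov)
        new_vars.append((1.0 - k * c[j]) * vp)
        # Gaussian likelihood of the innovation under mode j.
        liks.append(math.exp(-0.5 * innov ** 2 / s) / math.sqrt(2.0 * math.pi * s))

    # 3) Mode probability update (Bayes rule over the discrete modal state).
    joint = [liks[j] * pred[j] for j in range(n)]
    total = sum(joint)
    post = [p / total for p in joint]

    # 4) Combined estimate: probability-weighted mean of the mode-conditioned means.
    x_hat = sum(post[j] * new_means[j] for j in range(n))
    return x_hat, new_means, new_vars, post


# Usage with made-up two-mode parameters and made-up measurements.
trans = [[0.95, 0.05], [0.10, 0.90]]
a, c = [0.9, 0.5], [1.0, 2.0]
q, r = [0.1, 0.2], [0.5, 0.5]
means, variances, probs = [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]
for y in [0.4, 0.7, 1.1]:
    x_hat, means, variances, probs = imm_filter_step(
        y, means, variances, probs, trans, a, c, q, r
    )
```

In this sketch the continuous state stands in for an articulator position and y for an acoustic measurement. With a single mode the step reduces to an ordinary Kalman filter update, which is a convenient sanity check.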