Automatic Speech Recognition approaches based on classic acoustic models do not seem to exploit all the information contained in a speech signal; furthermore, decoding procedures are subject to real-time constraints that prevent the system from achieving an optimal alignment between acoustic models and signal. In this paper, we present an approach to speech recognition in which Factorial Hidden Markov Models (FHMMs) are used as syllabic acoustic models, and an alignment algorithm is used for unit decoding. As the application domain we choose numbers (range 0-999,999) uttered in Italian. Syllabic accuracy in our model is 84.81%, and correctness on numbers is 77.74%. The aim of the experiment is to show that the performance of FHMMs stems from their ability to capture two different temporal dynamics in a speech segment: the former with quasi-segmental timing, the latter with a quasi-syllabic trend. Moreover, we evaluate a unit decoding process based on a dynamic programming algorithm in order to make the best use of the acoustic models' performance.

Speech Recognition with Factorial-HMM Syllabic Acoustic Models / Coro, Gianpaolo; Cutugno, Francesco; Caropreso, F. - Electronic. - (2007), pp. 870-873. (Paper presented at the Interspeech 2007 conference, held in Antwerp (BE), August 27-31, 2007).

Speech Recognition with Factorial-HMM Syllabic Acoustic Models

CORO, GIANPAOLO; CUTUGNO, FRANCESCO
2007

Files in this item:
File: 2.pdf (open access)
Type: Post-print document
License: Public domain
Size: 204.94 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/205879