LEXICON AND LANGUAGE MODELING OF EGYPTIAN DIALECT AUTOMATIC SPEECH RECOGNITION
Taha Merghani¹, Tuka Al Hanai², James Glass²
¹Jackson State University, Jackson, MS; ²Massachusetts Institute of Technology, Cambridge, MA
Developing automatic speech recognition (ASR) systems for Arabic is challenging because of its myriad dialects, complicated morphology (the way words are put together), and ambiguous orthography (the way the language is written). We explored lexical, language, and acoustic models for resource-poor languages with limited data, focusing on the Aljazeera corpus, which contains 12 hours of Egyptian Arabic speech. Using Kaldi, a speech recognition toolkit, we trained several acoustic models, including Gaussian mixture model based hidden Markov models (GMM-HMM), deep neural networks (DNN), and long short-term memory recurrent neural networks (LSTM-RNN). We also built lexicons and language models to compare grapheme-based and diacritized pronunciations in the lexicon. The diacritized lexicons and language models were generated using the MADAMIRA text processing toolkit for Egyptian Arabic, and we evaluated ASR performance using the word error rate (WER) metric.
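As an illustration of the WER metric mentioned above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; it is not the scoring code used in this work, which would normally come from the recognition toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both transcripts are tokenized by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.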