Engineering Analysis and Recognition of Nigerian English: An Insight into Low Resource Languages
DOI:
https://doi.org/10.14738/tmlai.24.334
Keywords:
Nigerian English, Limited Resource Language, Automatic Speech Recognition (ASR)
Abstract
This article presents a comparative analysis of Nigerian English (NE) and American English (AE). The study highlights differences in speech parameters and how they influence speech processing and automatic speech recognition (ASR). The UILSpeech corpus of Nigerian-accented English, comprising isolated word recordings, read speech utterances, and video recordings, is used as the reference for Nigerian English. The corpus captures the linguistic diversity of Nigeria with data collected from native speakers of the Hausa, Igbo, and Yoruba languages. It is intended to provide a unique opportunity for applying and extending speech processing techniques to a limited-resource language dialect. The acoustic-phonetic differences between AE and NE are studied in terms of pronunciation variations, vowel locations in the formant space, mean fundamental frequency, and phone model distances in the acoustic space, as well as through visual speech analysis of the speakers’ articulators. The AE–NE acoustic mismatch is observed to have a strong impact on ASR. Combining model adaptation with an extension of the AE lexicon to cover newly established NE pronunciation variants is shown to substantially improve the performance of the AE-trained ASR system on the new NE task. This study is part of the pioneering efforts towards incorporating speech technology in Nigerian English and is intended to provide a development basis for other low-resource dialects and languages.
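The two remedies named in the abstract, lexicon extension and model adaptation, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which adaptation scheme was used, so a MAP-style update of a single Gaussian mean is assumed here as one common choice. The function names, the prior weight `tau`, and the phone strings in the usage lines are hypothetical placeholders, not values from the study.

```python
def extend_lexicon(base, variants):
    """Return a new lexicon in which each word keeps its original
    pronunciations and gains any variants not already listed.

    Both arguments map a word to a list of pronunciations, where a
    pronunciation is a tuple of phone labels (an assumed format).
    """
    merged = {word: list(prons) for word, prons in base.items()}
    for word, prons in variants.items():
        existing = merged.setdefault(word, [])
        for pron in prons:
            if pron not in existing:
                existing.append(pron)
    return merged


def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP-style re-estimate of one Gaussian mean vector.

    prior_mean : mean from the source (e.g. AE-trained) model
    frames     : adaptation feature vectors (lists of floats)
    posteriors : occupancy weight of this Gaussian for each frame
    tau        : prior weight; larger values stay closer to prior_mean
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    # Interpolate between the prior mean and the adaptation data.
    return [
        (tau * prior_mean[d] + weighted[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Hypothetical usage: one extra variant for one word, one adapted mean.
ae_lexicon = {"think": [("TH", "IH", "NG", "K")]}
ne_variants = {"think": [("T", "IH", "NG", "K")]}
lexicon = extend_lexicon(ae_lexicon, ne_variants)
adapted = map_adapt_mean([0.0], [[1.0], [1.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the occupancy is small and the adapted mean stays near the source model, which is why this kind of update is often considered for limited-resource settings.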