Robust Features in Deep Neural Networks for Transcoded Speech Recognition (DSR and AMR-NB)

EasyChair Preprint 13167, 6 pages • Date: May 2, 2024

Abstract

Automatic speech recognition (ASR) performance in mobile communications degrades significantly when the environment contains many sources of variability, for example when the test environment differs from the training environment or when the acoustic signal is affected by noise, channel distortion, speaker differences, and mobile codecs. In this work, we use two mobile-network speech recognition architectures: the first is Distributed Speech Recognition, based on the DSR codec, and the second is based on the Adaptive Multi-Rate Narrow-Band (AMR-NB) codec. We propose a novel robust feature extraction (front-end) technique to improve speech recognition performance in noisy mobile communications. This technique relies on features such as Gabor features, Power Normalized Spectrum Gabor filter features (PNS-Gabor), and Power Normalized Cepstral Coefficients (PNCC). These features account for psychoacoustic effects such as temporal masking and use different filter-bank distributions and filter shapes to better model human auditory perception. In the back end, we investigate speech classification systems based on Continuous Hidden Markov Models (CHMM) and Deep Neural Networks (DNN). The results obtained in noisy mobile communications show that the proposed PNS-Gabor and PNCC features yield significant improvements over conventional acoustic features such as Mel-frequency cepstral coefficients (MFCC).

Keyphrases: AMR-NB, ASR, DNN, DSR, HMM, MFCC, PN-Gabor, PNCC
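
To make the contrast between the conventional and the proposed compression concrete, the sketch below computes cepstral coefficients from mel filter-bank energies using either the logarithmic compression of MFCC or a PNCC-style power-law nonlinearity (exponent 1/15). This is only an illustration of the idea, not the front-end used in the paper: the full PNCC pipeline uses a gammatone filter bank and medium-time power bias subtraction (related to the temporal masking effect mentioned above), both omitted here, and all function names and parameter values (8 kHz sampling, 26 filters, 13 coefficients) are assumptions.

```python
# Minimal sketch contrasting MFCC-style log compression with a PNCC-style
# power-law nonlinearity on mel filter-bank energies. Parameter values are
# hypothetical; the paper's actual PNCC front-end additionally uses a
# gammatone filter bank and medium-time power bias subtraction, omitted here.
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filter bank (rows: filters, cols: FFT bins)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def cepstral_features(frames, sr=8000, n_fft=256, n_filters=26, n_ceps=13,
                      compression="log"):
    """frames: (n_frames, frame_len) array of windowed speech frames."""
    spectrum = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # power spectrum
    energies = spectrum @ mel_filterbank(n_filters, n_fft, sr).T
    if compression == "log":
        compressed = np.log(energies + 1e-10)                   # MFCC-style
    else:
        compressed = energies ** (1.0 / 15.0)                   # PNCC-style power law
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :n_ceps]

# Usage: 25 ms Hamming-windowed frames at 8 kHz (narrow-band, as in AMR-NB)
frames = np.random.randn(10, 200) * np.hamming(200)
mfcc_like = cepstral_features(frames, compression="log")
pncc_like = cepstral_features(frames, compression="power")
```

The power-law nonlinearity compresses strong filter-bank outputs while remaining better behaved than the logarithm for very low energies, which is one reason power-normalized features are reported to be more robust to noise and codec distortion than MFCC.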