Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network

Year: 2019 Volume: 19 Issue: 2 Page Range: 91 - 100 Text Language: English DOI: 10.26650/electrica.2019.18042 Index Date: 02-01-2020

Abstract:
This paper proposes a voice activity detection (VAD) method that combines time-domain and spectral-domain features with a multi-layer feed-forward neural network (MLF-NN) for use under various noise conditions. Two time-domain features (short-time energy and zero-crossing rate) and four spectral features (entropy, centroid, roll-off, and flux) are extracted from the speech signal. The MLF-NN is trained on clean speech and tested on noisy speech. The method is evaluated with six noise types (white, car, babble, airport, street, and train) at four signal-to-noise ratio (SNR) levels. It was tested on the core TIMIT database, and its performance was compared with the Sohn, G.729B, and Long-Term Spectral Flatness Measure (LSFM) VAD methods in terms of correct speech rate, false alarm rate, and overall accuracy rate. Extensive simulation results show that the proposed method achieves the best average correct speech rate, false alarm rate, and overall accuracy rate in most low- and high-SNR conditions across the different noise environments.
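
As a rough illustration of the six per-frame features named in the abstract, the sketch below computes short-time energy, zero-crossing rate, and spectral entropy, centroid, roll-off, and flux with NumPy. The abstract does not give the exact feature definitions or framing parameters, so everything here is an assumption made for illustration: the function name `extract_features`, the 25 ms frame and 10 ms hop at 16 kHz, the Hann window, the 85% roll-off threshold, and the textbook formulas used for each feature.

```python
import numpy as np

def extract_features(signal, sr=16000, frame_len=400, hop=160, rolloff=0.85):
    """Return an (n_frames, 6) array with columns:
    energy, zero-crossing rate, spectral entropy, centroid, roll-off, flux.
    All framing parameters are illustrative assumptions, not the paper's."""
    signal = np.asarray(signal, dtype=float)
    eps = 1e-12  # guards against division by zero and log(0)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)  # bin centre frequencies in Hz
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    feats = np.zeros((n_frames, 6))
    prev = None  # normalized spectrum of the previous frame, used for flux
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]

        # Time-domain features
        energy = np.mean(frame ** 2)                # mean-square energy
        signs = np.signbit(frame)
        zcr = np.mean(signs[1:] != signs[:-1])      # sign changes per sample

        # Power spectrum normalized to sum to (almost) one
        power = np.abs(np.fft.rfft(frame * window)) ** 2
        p = power / (power.sum() + eps)

        entropy = -np.sum(p * np.log2(p + eps))     # spectral entropy in bits
        centroid = np.sum(freqs * p)                # power-weighted mean frequency
        # Roll-off: lowest frequency below which `rolloff` of the power lies
        idx = np.searchsorted(np.cumsum(p), rolloff * p.sum())
        roll = freqs[min(idx, len(freqs) - 1)]
        # Flux: squared change of the normalized spectrum between frames
        flux = 0.0 if prev is None else np.sum((p - prev) ** 2)
        prev = p

        feats[i] = (energy, zcr, entropy, centroid, roll, flux)
    return feats
```

Each row of the returned array could serve as one input vector to a frame-level classifier such as a feed-forward network; the network architecture and training setup are not described in the abstract and are not sketched here.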

Subjects: Engineering, Electrical and Electronic
Document Type: Article Article Type: Research Article Access Type: Open Access
  • D. Freeman, G. Cosier, "The voice activity detector for the Pan-European digital cellular mobile telephone service", Acoustics, Speech, and Signal Processing, Glasgow, UK, 1989, pp. 369-372.
  • L. Zhang, Y. C. Gao, Z. Z. Bian, C. Lu, "Voice activity detection algorithm improvement in adaptive multi-rate speech coding of 3GPP", International Conference on Wireless Communications, Networking and Mobile Computing, Wuhan, China, 2005, pp. 1257-1260.
  • L. Karray, A. Martin, "Towards improving speech detection robustness for speech recognition in adverse conditions", Speech Communication, vol. 40, no. 3, pp. 261-276, May, 2003. [CrossRef]
  • A. Sangwan, R. Sah, R. V. Prasad, V. Gaurav, "VAD Techniques", Time, pp. 46-50, 2002.
  • K. Woo, T. Yang, K. Park, C. Lee, "Robust voice activity detection algorithm for estimating noise spectrum", Electronics Letters, vol. 36, no. 2, pp. 180-181, Jan, 2000. [CrossRef]
  • Y. Zhang, Z. Tang, Y. Li, Y. Luo, "A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement", Scientific World Journal, vol. 2014, no. 2014, May, 2014. [CrossRef]
  • B. V. Harsha, "A noise robust speech activity detection algorithm", International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, China, 2004, pp. 322-325. [CrossRef]
  • C. Babu, P. Vanathi, "Performance analysis of voice activity detection algorithms for robust speech recognition", International Journal of Computing Science and Communication Technologies, vol. 2, no. 1, pp. 288-293, 2009.
  • R. G. Bachu, S. Kopparthi, B. Adapa, B. D. Barkana, "Separation of Voiced and Unvoiced using Zero crossing rate and Energy of the Speech Signal", Advanced Techniques in Computing Sciences and Software Engineering, pp. 279-282, 2010. [CrossRef]
  • P. Khoa, "Noise robust voice activity detection", MSc Thesis, pp. 77, 2012.
  • J. A. Haigh, J. S. Mason, "Robust voice activity detection using cepstral features", Proceedings of TENCON '93. IEEE Region 10 International Conference on Computers, Communications and Automation, Beijing, China, 1993.
  • K. Chung, S. Y. Oh, "Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments", Wireless Personal Communications, vol. 89, no. 3, pp. 1-13, 2015. [CrossRef]
  • S. H. Chen, H. Te Wu, Y. Chang, T. K. Truong, "Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator", Pattern Recognition Letters, vol. 28, no. 11, pp. 1327-1332, 2007. [CrossRef]
  • S. Chen, R. C. Guido, T. Truong, Y. Chang, "Improved voice activity detection algorithm using wavelet and support vector machine", Computer Speech & Language, vol. 24, no. 3, pp. 531-543, 2010. [CrossRef]
  • C. Z. Chong Feng, "Voice activity detection based on ensemble empirical mode decomposition and teager kurtosis", 12th International Conference on Signal Processing (ICSP), Hangzhou, China, 2014, pp. 455-460.
  • Y. Kanai, S. Morita, M. Unoki, "Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis", Proceedings of the Annual Conference of the International Speech Communication Association, Lyon, France, 2013, pp. 742-746.
  • M. Sahidullah, G. Saha, "Comparison of Speech Activity Detection Techniques for Speaker Recognition", arXiv, no. arXiv:1210.0297, pp. 1-7, 2012.
  • G. Ferroni, R. Bonfigli, E. Principi, S. Squartini, F. Piazza, "A Deep Neural Network approach for Voice Activity Detection in multiroom domestic scenarios", International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 2015. [CrossRef]
  • J. Sohn, "A statistical model-based voice activity detection", IEEE Signal Processing Letters, vol. 6, no. 1, pp. 1-3, 1999. [CrossRef]
  • A. Benyassine, E. Shlomot, H. Y. Su, D. Massaloux, C. Lamblin, J. P. Petit, "ITU-T recommendation G.729 annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications", IEEE Communications Magazine, vol. 35, no. 9, pp. 64-73, 1997. [CrossRef]
  • J. Ramírez, J. C. Segura, C. Benítez, Á. De la Torre, A. Rubio, "Efficient voice activity detection algorithms using long-term speech information", Speech Communication, vol. 42, no. 3-4, pp. 271-287, 2004. [CrossRef]
  • Y. Ma, A. Nishihara, "Efficient voice activity detection algorithm using long-term spectral flatness measure", EURASIP Journal on Audio, Speech, and Music Processing, vol. 21, pp. 1-18, 2013. [CrossRef]
  • D. Ghosh, R. Muralishankar, S. Gurugopinath, "Robust voice activity detection using frequency domain long-term differential entropy", Interspeech, pp. 1220-1224, Sept, 2018. [CrossRef]
  • S. Graf, T. Herbig, M. Buck, G. Schmidt, "Features for voice activity detection: a comparative analysis", EURASIP Journal on Advances in Signal Processing, vol. 2015, no. 1, pp. 91, 2015. [CrossRef]
  • A. Pasad, K. Sabu, P. Rao, "Voice activity detection for children's read speech recognition in noisy conditions", 2017 23rd National Conference on Communications (NCC), Chennai, India, 2017. [CrossRef]
  • M. Farsinejad, M. Mohammadi, B. Nasersharif, A. Akbari, A. Framework, "A Model-based Voice Activity Detection Algorithm using probabilistic neural networks", Computer Engineering, vol. 326, pp. 8-11, 2008.
  • R. Johny Elton, P. Vasuki, J. Mohanalin, "Voice Activity Detection Using Fuzzy Entropy and Support Vector Machine", Entropy, vol. 18, no. 8, pp. 298, 2016. [CrossRef]
  • F. Bie, Z. Zhang, D. Wang, T. F. Zheng, "DNN-based Voice Activity Detection for Speaker Recognition", CLST Technical Report, pp. 1-11, 2015.
  • L. Wang, K. Phapatanaburi, Z. Oo, S. Nakagawa, M. Iwahashi, J. Dang, "Phase Aware Deep Neural Network for Noise Robust Voice Activity Detection", Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10-14 July, 2017, pp. 1087-1092. [CrossRef]
  • H. Mukherjee, S. M. Obaidullah, K. C. Santosh, S. Phadikar, K. Roy, "Line spectral frequency-based features and extreme learning machine for voice activity detection from audio signal", International Journal of Speech Technology, vol. 21, no. 4, pp. 1-8, 2018. [CrossRef]
  • M. R. Bouguelia, S. Nowaczyk, K. C. Santosh, A. Verikas, "Agreeing to disagree: active learning with noisy labels without crowdsourcing", International Journal of Machine Learning and Cybernetics, vol. 9, no. 8, pp. 1307-1319, 2018. [CrossRef]
  • Z. Ali, M. Talha, "Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments", IEEE Access, vol. 6, pp. 15494-15504, 2018. [CrossRef]
  • L. K. Hamaidi, M. Muma, A. M. Zoubir, "Robust distributed multi-speaker voice activity detection using stability selection for sparse non-negative feature extraction", 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 2017, pp. 161-165. [CrossRef]
  • Y. Q. Shi, R. W. Li, S. Zhang, S. Wang, X. Q. Yi, "A speech endpoint detection algorithm based on BP neural network and multiple features", Applied Mechanics, Mechatronics and Intelligent Systems, pp. 393-402, 2016.
  • M. Jalil, F. A. Butt, A. Malik, "Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals", 2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), Konya, Turkey, 2013, pp. 208-212. [CrossRef]
  • M. H. Moattar, M. M. Homayounpour, "A weighted feature voting approach for robust and real-time voice activity detection", ETRI Journal, vol. 33, no. 1, pp. 99-109, 2011. [CrossRef]
  • P. Renevey, A. Drygajlo, "Entropy based voice activity detection in very noisy conditions", Seventh European Conference on Speech Communication and Technology, 2001.
  • M. N. Stolar, M. Lech, S. J. Stolar, N. B. Allen, "Detection of Adolescent Depression from Speech Using Optimised Spectral Roll-Off Parameters", Biomed J Sci & Tech Res, vol. 5, no. 1, pp. 1-10, 2018. [CrossRef]
  • M. Hill, "Notes on Multilayer, Feedforward Neural Networks Fall 2007", pp. 1-7, 2007.
  • D. Svozil, V. Kvasnička, J. Pospíchal, "Introduction to multi-layer feed-forward neural networks", Chemometrics and Intelligent Laboratory Systems, vol. 39, no. 1, pp. 43-62, 1997. [CrossRef]
  • V. Zue, S. Seneff, J. Glass, "Speech database development at MIT: Timit and beyond", Speech Communication, vol. 9, no. 4, pp. 351-356, 1990. [CrossRef]
  • P. C. Loizou, "Speech enhancement based on perceptually motivated bayesian estimators of the magnitude spectrum", IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 857-869, 2005. [CrossRef]
  • T. Drugman, Y. Stylianou, Y. Kida, M. Akamine, "Voice Activity Detection: Merging Source and Filter-based Information", IEEE Signal Processing Letters, vol. 23, no. 2, pp. 252-256, 2016. [CrossRef]
  • N. Dhananjaya, B. Yegnanarayana, "Voiced/nonvoiced detection based on robustness of voiced epochs", IEEE Signal Processing Letters, vol. 17, no. 3, pp. 273-276, 2010. [CrossRef]
  • N. Lezzoum, G. Gagnon, J. Voix, "Voice activity detection system for smart earphones", IEEE Transactions on Consumer Electronics, vol. 60, no. 4, pp. 737-744, 2014. [CrossRef]
  • D. Ying, Y. Yan, J. Dang, F. K. Soong, "Voice activity detection based on an unsupervised learning framework", IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 8, pp. 2624-2632, 2011. [CrossRef]
  • A. Varga, H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems", Speech Communication, vol. 12, no. 3, pp. 247-251, 1993. [CrossRef]
  • M. H. Moattar, M. M. Homayounpour, N. K. Kalantari, "A new approach for robust realtime Voice Activity Detection using spectral pattern", Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4478-4481, 2010. [CrossRef]
  • S. Dwijayanti, K. Yamamori, M. Miyoshi, "Enhancement of speech dynamics for voice activity detection using DNN", Eurasip Journal on Audio, Speech, and Music Processing, vol. 2018, no. 1, 2018. [CrossRef]
APA Arslan Ö, Engin E (2019). Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network. Electrica, 19(2), 91 - 100. 10.26650/electrica.2019.18042
Chicago Arslan Özkan,Engin Erkan Zeki Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network. Electrica 19, no.2 (2019): 91 - 100. 10.26650/electrica.2019.18042
MLA Arslan Özkan,Engin Erkan Zeki Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network. Electrica, vol.19, no.2, 2019, pp.91 - 100. 10.26650/electrica.2019.18042
AMA Arslan Ö,Engin E Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network. Electrica. 2019; 19(2): 91 - 100. 10.26650/electrica.2019.18042
Vancouver Arslan Ö,Engin E Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network. Electrica. 2019; 19(2): 91 - 100. 10.26650/electrica.2019.18042
IEEE Arslan Ö,Engin E "Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network." Electrica, 19, pp.91 - 100, 2019. 10.26650/electrica.2019.18042
ISNAD Arslan, Özkan - Engin, Erkan Zeki. "Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network". Electrica 19/2 (2019), 91-100. https://doi.org/10.26650/electrica.2019.18042