Recent studies have shown that acoustic beamforming with a microphone array plays an important role in building high-performance automatic speech recognition (ASR) systems, especially under noisy and overlapping speech conditions. In parallel with the success of multichannel beamforming for ASR, in the speech separation field the time-domain audio separation network (TasNet), which accepts a time-domain mixture as input and directly estimates the time-domain waveform of each source, achieves remarkable separation performance. In light of these two trends, a natural question arises: can TasNet benefit from beamforming to achieve high ASR performance under overlapping speech conditions? Motivated by this question, this paper proposes a novel speech separation scheme, Beam-TasNet, which combines TasNet with a frequency-domain minimum variance distortionless response (MVDR) beamformer through spatial covariance computation to achieve better ASR performance. Experiments on the spatialized WSJ0-2mix corpus show that the proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, achieves a word error rate comparable to that of an oracle mask-based MVDR beamformer.
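To make the coupling concrete: once speech and noise spatial covariance matrices have been estimated per frequency bin (e.g., from the STFTs of the separated signals), MVDR beamforming weights can be computed from them. The following NumPy sketch uses the common trace-normalized, reference-microphone formulation of MVDR; the function name and array shapes are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mvdr_weights(R_s, R_n, ref_mic=0):
    """Sketch of MVDR weight computation from spatial covariance matrices.

    R_s, R_n : complex arrays of shape (F, M, M)
        Speech and noise spatial covariance matrices for F frequency
        bins and M microphones (assumed shapes for illustration).
    Returns complex weights of shape (F, M); the enhanced STFT is
    obtained as y[f, t] = conj(w[f]) @ X[f, :, t].
    """
    F, M, _ = R_s.shape
    w = np.zeros((F, M), dtype=complex)
    u = np.zeros(M)
    u[ref_mic] = 1.0  # one-hot vector selecting the reference channel
    for f in range(F):
        # R_n^{-1} R_s via a linear solve (more stable than an explicit inverse)
        num = np.linalg.solve(R_n[f], R_s[f])
        # Trace normalization yields the distortionless MVDR solution
        w[f] = (num / np.trace(num)) @ u
    return w
```

For a single point source with steering vector d and spatially white noise, these weights pass the source through with exactly the gain it has at the reference microphone, which is the distortionless property that makes MVDR attractive as an ASR front end.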