Registration for ICASSP is free of charge, but registration is required to view the videos. If you have not yet registered, please visit: https://cmsworkshops.com/ICASSP2020/Registration.asp.Access the full virtual conference by visiting: https://2020.ieeeicassp-virtual.org/attendee/login. Your username is your email address and your password is your confirmation number/registration ID.

You need an account to view media

Sign in to view media

Don't have an account? Please contact us to request an account.

Speech Processing
SPE-L12.1
Lecture
Speech Separation and Extraction II: Multi-channel

BEAM-TASNET: TIME-DOMAIN AUDIO SEPARATION NETWORK MEETS FREQUENCY-DOMAIN BEAMFORMER

Tsubasa Ochiai

Date & Time

Thu, May 7, 2020

12:30 pm – 2:30 pm

Location

On-Demand

Abstract

Recent studies have shown that acoustic beamforming using a microphone array plays an important role in the construction of high-performance automatic speech recognition (ASR) systems, especially for noisy and overlapping speech conditions. In parallel with the success of multichannel beamforming for ASR, in the speech separation field, the time-domain audio separation network (TasNet), which accepts a time-domain mixture as input and directly estimates the time-domain waveforms for each source, achieves remarkable speech separation performance. In light of these two recent trends, the question of whether TasNet can benefit from beamforming to achieve high ASR performance in overlapping speech conditions naturally arises. Motivated by this question, this paper proposes a novel speech separation scheme, i.e., Beam-TasNet, which combines TasNet with the frequency-domain beamformer, i.e., a minimum variance distortionless response (MVDR) beamformer, through spatial covariance computation to achieve better ASR performance. Experiments on the spatialized WSJ0-2mix corpus show that our proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, successfully achieves a word error rate comparable to an oracle mask-based MVDR beamformer.


Presenter

Tsubasa Ochiai

NTT Corporation
Sign in to join the conversationDon't have an account? Please contact us to request an account.
Sign in to view documentsDon't have an account? Please contact us to request an account.

Session Chair