Registration for ICASSP is free of charge, but registration is required to view the videos.

Speech Processing
Speech Synthesis and Voice Conversion II


Dongsuk Yook

Date & Time

Wed, May 6, 2020

12:30 pm – 2:30 pm




Voice conversion (VC) refers to transforming the speaker characteristics of an utterance without altering its linguistic content. Many voice conversion methods require parallel training data, which is expensive to acquire. Recently, the cycle-consistent adversarial network (CycleGAN), which does not require parallel training data, has been applied to voice conversion and has shown state-of-the-art performance. CycleGAN-based voice conversion, however, works only for a single pair of speakers, i.e., one-to-one voice conversion between two speakers. In this paper, we extend the CycleGAN by conditioning the network on speakers. As a result, the proposed method can perform many-to-many voice conversion among multiple speakers using a single generative adversarial network (GAN). Compared to building a separate CycleGAN for each pair of speakers, the proposed method significantly reduces the computational and storage cost without compromising the sound quality of the converted voice. Experimental results on the VCC2018 corpus confirm the efficiency of the proposed method.
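To make the cost argument concrete, the following is a minimal illustrative sketch, not the authors' implementation. It counts how many separate CycleGANs a one-per-pair approach would need, and shows one plausible way a single network could be conditioned on speakers: appending one-hot speaker codes to an acoustic feature frame. The function names, the use of one-hot codes for both source and target speakers, and the speaker count of 4 are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations


def pairwise_cyclegan_count(num_speakers: int) -> int:
    """Count the models needed if one CycleGAN is trained per unordered speaker pair."""
    return len(list(combinations(range(num_speakers), 2)))


def one_hot(index: int, size: int) -> list:
    """Return a one-hot list of length `size` with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def conditioned_input(frame, source_id: int, target_id: int, num_speakers: int) -> list:
    """Append source and target speaker codes to one feature frame.

    Assumption: conditioning is done by concatenating one-hot speaker codes.
    This is a common choice, used here only as a hypothetical example.
    """
    return list(frame) + one_hot(source_id, num_speakers) + one_hot(target_id, num_speakers)


if __name__ == "__main__":
    num_speakers = 4  # hypothetical speaker count, chosen for illustration
    print("pairwise CycleGANs needed:", pairwise_cyclegan_count(num_speakers))
    print("conditional networks needed: 1")
    print(conditioned_input([0.5, 0.2], source_id=0, target_id=2, num_speakers=num_speakers))
```

The number of speaker pairs grows quadratically with the number of speakers, while the conditional approach keeps a single network and only widens its input by the size of the speaker codes.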


Dongsuk Yook

Korea University
Session Chair

Masami Akamine

Tohoku University