
The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018

EasyChair Preprint no. 65

8 pages. Date: April 15, 2018


This paper presents the NU non-parallel voice conversion (VC) system developed at Nagoya University for the SPOKE task of the Voice Conversion Challenge 2018 (VCC2018). The goal of the SPOKE task is to develop VC systems that do not require parallel training data. The key idea of our system is to use a text-to-speech (TTS) voice as a reference voice, which makes it possible to create two parallel training datasets: one between the source and TTS voices, and one between the TTS and target voices. Using these datasets, we develop a cascade VC system that converts the source voice into the target voice via the TTS voice as the reference. Furthermore, we propose a system selection framework to avoid generating collapsed speech waveforms, which are often observed when less accurately converted speech features are fed to the WaveNet vocoder. The VCC2018 results demonstrate that our system ranked second in similarity (a similarity score of around 70%) and above average in naturalness (a mean opinion score of around 3.0) among all submitted systems.
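
The cascade and the selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `cascade`, `select_waveform`, and `is_collapsed` are hypothetical, the converters are toy functions standing in for trained models, and the collapse check is a placeholder, since the abstract does not specify how collapsed waveforms are detected or which systems are selected among.

```python
from typing import Callable, List, Sequence

Features = List[float]
Converter = Callable[[Features], Features]


def cascade(source_to_ref: Converter, ref_to_target: Converter) -> Converter:
    """Chain two converters: source -> reference (TTS) voice -> target."""
    def convert(features: Features) -> Features:
        return ref_to_target(source_to_ref(features))
    return convert


def select_waveform(
    candidates: Sequence[Features],
    is_collapsed: Callable[[Features], bool],
) -> Features:
    """Return the first candidate that is not flagged as collapsed.

    Candidates are assumed to be ordered by preference; if every
    candidate is flagged, the last one is returned as a fallback.
    """
    for waveform in candidates:
        if not is_collapsed(waveform):
            return waveform
    return candidates[-1]


# Toy stand-ins for the two trained conversion models (assumptions).
source_to_tts: Converter = lambda f: [x + 1.0 for x in f]
tts_to_target: Converter = lambda f: [x * 2.0 for x in f]

convert = cascade(source_to_tts, tts_to_target)
converted = convert([1.0, 2.0])

# Placeholder collapse check: treat an all-zero waveform as collapsed.
all_zero = lambda w: all(x == 0.0 for x in w)
chosen = select_waveform([[0.0, 0.0], [0.1, -0.2]], all_zero)
```

In this toy run, `converted` is `[4.0, 6.0]` and `chosen` is the second candidate, because the first is flagged by the placeholder check.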

Keyphrases: deep neural network, non-parallel voice conversion, reference speaker, system selection, Voice Conversion Challenge 2018, WaveNet vocoder

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:65,
  author = {Yichiao Wu and Patrick Lumban Tobing and Tomoki Hayashi and Kazuhiro Kobayashi and Tomoki Toda},
  title = {The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018},
  howpublished = {EasyChair Preprint no. 65},
  doi = {10.29007/bw9p},
  year = {EasyChair, 2018}}