Technical Name Zero-Shot Unseen Speaker Voice Synthesis
Project Operator National Central University
Project Host 王家慶
Summary
We use a pre-trained speaker encoder to obtain semantic information in speech,train a WavLM-based speaker encoder. We obtain domain-independent speaker information through Robust MAML for domain generalization training. So, the domain-independent speaker information can be applied to any untrained speaker. The effect of this speaker feature is transferred to the speech synthesis model, thereby achieving zero-resource speech synthesis results.
Scientific Breakthrough
In order to be able to fit the naturalnesssimilarity of the synthesized speech of zero-resource unregistered speakers in the speech conversion model, We use the Robust MAML training method to obtain the speaker features of domain generalization, so that it can overcome the problem of the domain gap between the unregistered datathe registered data, effectively retain the domain-independent speaker features of the target speakers, use this feature to the training of the speech synthesis model,improve the naturalnesssimilarity of unregistered speakers in the original model.
Industrial Applicability
The zero-resource speech synthesis system has a wide range of application scenarios,can be used in various situations. For example, through this technology, the voice actor only needs to dub a small number of sentences when dubbing,the rest of the sentences can be generated by the machine itself. The virtual characters can also be created. In addition, depending on the choice of semantic encoder, the technology can also be applied to data augmentationspeech conversion.
Keyword speaker encoding domain generalization meta learning speech synthesis voice conversion voice translation data augmentation transfer learning forward-looking voice technology deep learning
Notes
  • Contact
  • Jia-Ching Wang
other people also saw