Name: Zero-Shot Unseen Speaker Voice Synthesis
Brand: Future Tech Pavilion, FUTEX
SKU: P0032000007073

Technical Name	Zero-Shot Unseen Speaker Voice Synthesis
Project Operator	National Central University
Project Host	Jia-Ching Wang (王家慶)
Summary	We use a pre-trained speaker encoder to obtain semantic information in speech and train a WavLM-based speaker encoder. Through Robust MAML domain-generalization training, we obtain domain-independent speaker information that can be applied to any untrained speaker. This speaker feature is then transferred to the speech synthesis model, achieving zero-resource speech synthesis.
Scientific Breakthrough	To improve the naturalness and similarity of synthesized speech for zero-resource unregistered speakers in the speech conversion model, we use the Robust MAML training method to obtain domain-generalized speaker features. This overcomes the domain gap between the unregistered data and the registered data and effectively retains the domain-independent speaker features of the target speakers. We then use these features to train the speech synthesis model, improving the naturalness and similarity of unregistered speakers in the original model.
Industrial Applicability	The zero-resource speech synthesis system has a wide range of application scenarios. For example, with this technology a voice actor only needs to dub a small number of sentences, and the rest of the sentences can be generated by the machine. Virtual characters can also be created. In addition, depending on the choice of semantic encoder, the technology can also be applied to data augmentation and speech conversion.
Keyword	speaker encoding, domain generalization, meta learning, speech synthesis, voice conversion, voice translation, data augmentation, transfer learning, forward-looking voice technology, deep learning
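
The summary above relies on MAML-style meta-learning for domain generalization. As a rough illustration of the general idea only, the sketch below runs a first-order MAML-style loop on a toy linear regressor: adapt on one training domain, then update the shared parameters using the loss on a different, held-out domain. This is not the project's Robust MAML, and the linear model merely stands in for the WavLM-based speaker encoder; every name here (`make_domain`, `fomaml_train`, the learning rates, the toy data) is a hypothetical choice made for the example.

```python
import random

def predict(w, x):
    """Linear model output for one input vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse_and_grad(w, data):
    """Mean squared error of the linear model and its gradient w.r.t. w."""
    n = len(data)
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        loss += err * err / n
        for k, xk in enumerate(x):
            grad[k] += 2.0 * err * xk / n
    return loss, grad

def sgd_step(w, grad, lr):
    """One plain gradient-descent step."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def make_domain(rng, w_true, shift, n=64):
    """Toy 'domain' (assumption): shared mapping w_true, inputs shifted per domain."""
    data = []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0) for _ in w_true]
        y = predict(w_true, x) + rng.gauss(0.0, 0.01)
        data.append((x, y))
    return data

def fomaml_train(domains, dim, steps=500, inner_lr=0.05, outer_lr=0.05, seed=0):
    """First-order MAML-style episodic training across domains."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        # Each episode simulates a domain gap: adapt on one domain,
        # evaluate on another that the adaptation step never saw.
        train_dom, heldout_dom = rng.sample(domains, 2)
        _, g_train = mse_and_grad(w, train_dom)
        w_adapted = sgd_step(w, g_train, inner_lr)        # inner (adaptation) step
        _, g_heldout = mse_and_grad(w_adapted, heldout_dom)
        w = sgd_step(w, g_heldout, outer_lr)              # first-order outer step
    return w

rng = random.Random(42)
w_true = [1.5, -2.0, 0.5]
train_domains = [make_domain(rng, w_true, s) for s in (-1.0, 0.0, 1.0)]
unseen_domain = make_domain(rng, w_true, 2.0)  # never used during training

w = fomaml_train(train_domains, dim=3)
loss_before, _ = mse_and_grad([0.0, 0.0, 0.0], unseen_domain)
loss_after, _ = mse_and_grad(w, unseen_domain)
print(f"unseen-domain loss: {loss_before:.4f} -> {loss_after:.4f}")
```

The first-order variant drops the second-derivative term of full MAML to keep the example short. In a real system the inner and outer steps would operate on the speaker encoder's parameters, and the domains would be sets of speakers or recording conditions rather than shifted Gaussians.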

Contact
Jia-Ching Wang

Email
jiacwang@gmail.com
