• Personalized emotion sensing for spoken dialog interface

  • 2019-10-08
In recent years, emotion AI technology has attracted much attention. Gartner has also pointed out that emotion AI is a 20-billion-US-dollar industry spanning a wide range of application fields. However, emotional expression differs from speaker to speaker due to a variety of idiosyncratic human factors, and most current emotion recognition techniques fail to account for these individual differences. Our recent technology integrates multimodal speech and language data to personalize emotion recognition and make it applicable to large-scale, real-life scenarios.
This speech-based emotion recognition solution for spoken dialogs is composed of three major sub-modules. First, a deep multimodal emotion recognition algorithm learns discriminative representations for both speech and text. Second, a cross-corpus transfer learning method reduces the discrepancy between corpora, making it possible to leverage abundant unlabeled spoken dialog data. Third, we integrate personalized embeddings into the attention mechanism of the emotion recognition framework, which brings significant improvement.
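The post does not specify how the personalized embedding enters the attention mechanism, but one common pattern is to concatenate a per-speaker vector onto each frame-level feature before scoring attention weights. The sketch below illustrates that idea only; the function and parameter names (`personalized_attention_pool`, `W`) are illustrative assumptions, not the system's actual API.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def personalized_attention_pool(frames, speaker_emb, W):
    """Attention pooling conditioned on a speaker embedding (illustrative sketch).

    frames:      (T, d) frame-level acoustic/text features for one utterance
    speaker_emb: (p,)   personalized embedding for the current speaker
    W:           (d+p,) learned scoring vector (here just a given array)
    returns:     (d,)   utterance-level representation
    """
    T = frames.shape[0]
    # Condition every frame on the same speaker embedding.
    cond = np.concatenate([frames, np.tile(speaker_emb, (T, 1))], axis=1)  # (T, d+p)
    scores = cond @ W          # one relevance score per frame
    alpha = softmax(scores)    # attention weights, sum to 1
    return alpha @ frames      # weighted sum of the original frames
```

Because the attention scores depend on the speaker embedding, two speakers with identical acoustic frames can still receive different frame weightings, which is one way a model can adapt to idiosyncratic emotional expression.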
Our prototype system is an integrative solution that combines automatic speech recognition, semantic processing, multimodal fusion, and personalized models built on voice and text modeling. It can be easily integrated into existing speech solutions or used as a standalone voice interface, in applications such as voice assistants, the automotive industry, and IoT.