Science of Media Information

Exhibition Program 17

Converting English speech to native-like pronunciation

Speech conversion using vocal tract model and deep generative models

Abstract

We are interested in developing a pronunciation conversion system that can convert non-native speech into intelligible, native-like speech. We take a signal processing-based approach using our recently developed model, called the composite Line Spectral Pair (LSP) representation, and a deep learning-based approach using a generative adversarial network (GAN). The former approach makes it possible to convert the vowel quality of speech within the physical constraints of the voice production mechanism, whereas the latter approach makes it possible to convert synthetic speech so that it becomes as indistinguishable as possible from real speech. We aim to further develop this into a real-time system so that it can be used to overcome various barriers in our daily communication.
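
The following is a minimal sketch, in Python with PyTorch, of the general GAN idea mentioned above: a generator converts speech feature frames, and a discriminator is trained so that converted features become hard to distinguish from real native-speech features. It is not the presenters' implementation; the network shapes, feature dimension, loss, and optimizer settings are all illustrative assumptions.

# Minimal sketch (assumptions: PyTorch, toy 1-D convolutional networks,
# 40-dimensional spectral feature frames). Illustrates adversarial training
# for speech feature conversion in general, not the presenters' system.

import torch
import torch.nn as nn

FEAT_DIM = 40  # hypothetical feature dimension (e.g., mel-cepstral coefficients)

class Generator(nn.Module):
    """Maps non-native feature frames to native-like feature frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(FEAT_DIM, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, FEAT_DIM, kernel_size=5, padding=2),
        )
    def forward(self, x):  # x: (batch, FEAT_DIM, frames)
        return self.net(x)

class Discriminator(nn.Module):
    """Scores whether a feature sequence looks like real native speech."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(FEAT_DIM, 128, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 1, kernel_size=5, padding=2),
        )
    def forward(self, x):
        return self.net(x).mean(dim=(1, 2))  # one logit per utterance

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(nonnative, native):
    """One adversarial update on a batch of feature tensors."""
    # Discriminator: push real native features toward 1, converted features toward 0.
    converted = G(nonnative).detach()
    d_loss = bce(D(native), torch.ones(native.size(0))) + \
             bce(D(converted), torch.zeros(converted.size(0)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator so converted features look native.
    g_loss = bce(D(G(nonnative)), torch.ones(nonnative.size(0)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random tensors standing in for feature sequences.
nonnative = torch.randn(8, FEAT_DIM, 100)
native = torch.randn(8, FEAT_DIM, 100)
print(train_step(nonnative, native))

In a real system the generator input would be features extracted from non-native utterances (for example, LSP or mel-cepstral parameters), and additional reconstruction or cycle-consistency losses are typically combined with the adversarial loss to preserve linguistic content.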


Poster



Presenters

Hirokazu Kameoka
Media Information Laboratory
Takuhiro Kaneko
Media Information Laboratory
Chihiro Watanabe
Media Information Laboratory
Shigemi Aoyagi
Media Information Laboratory
Ko Tanaka
Media Information Laboratory
Kaoru Hiramatsu
Media Information Laboratory