2023
Journal papers
- Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Cernocky & Dong Yu (2023). Neural Target Speech Extraction: An Overview. IEEE Signal Processing Magazine, 40 (3), 8-29.
- Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2023). Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 31, 835-848.
- Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2023). Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection. IEEE Access, 11, 13906-13917.
International conference proceedings
- Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura & Shinji Watanabe (2023). Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
- Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara & Marc Delcroix (2023). Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
- Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix & Ryo Masumura (2023). Leveraging Large Text Corpora for End-to-End Speech Summarization. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
- Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix & Reinhold Haeb-Umbach (2023). On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
- Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki & Tomohiro Nakatani (2023). Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
- Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget & Shoko Araki (2023). Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. Proc. Interspeech. Dublin, Ireland.
- Naoyuki Kamo, Marc Delcroix & Tomohiro Nakatani (2023). Target Speaker Extraction with Conditional Diffusion Model. Proc. Interspeech. Dublin, Ireland.
- Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani & Toshio Irino (2023). Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. Proc. Interspeech. Dublin, Ireland.
- Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka & Nobukatsu Hojo (2023). Downstream Task Agnostic Speech Enhancement Conditioned on Self-Supervised Representation Loss. Proc. Interspeech. Dublin, Ireland.
- Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa & Taichi Asami (2023). Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. Proc. Interspeech. Dublin, Ireland.
- Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix & Yukinori Honma (2023). SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? Proc. Interspeech. Dublin, Ireland.
- Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa & Marc Delcroix (2023). Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. Proc. Interspeech. Dublin, Ireland.
- Hikaru Yanagida, Yusuke Ijima & Naohiro Tawara (2023). Influence of Personal Traits on Impressions of One's Own Voice. Proc. Interspeech. Dublin, Ireland.
- Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura & Taichi Asami (2023). What are differences? Comparing DNN and human by their performance and characteristics in speaker age estimation. Proc. Interspeech. Dublin, Ireland.
- Yuki Kitagishi, Hosana Kamiyama, Naohiro Tawara, Atsunori Ogawa, Noboru Miyazaki & Taichi Asami (2023). Coarse-age loss: A new training method using coarse-age labeled data for speaker age estimation. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
- Koharu Horii, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2023). Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
- Keigo Hojo, Daiki Mori, Yukoh Wakabayashi, Kengo Ohta, Atsunori Ogawa & Norihide Kitaoka (2023). Combining multiple end-to-end speech recognition models based on density ratio approach. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
- Tatsunari Takagi, Atsunori Ogawa, Norihide Kitaoka & Yukoh Wakabayashi (2023). Streaming end-to-end speech recognition using a CTC decoder with substituted linguistic information. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.