Signal Processing Research Group


Human-friendly technologies such as voice interfaces are becoming increasingly popular in our daily lives, which means that speech will be used more and more to communicate with computers. Our group aims to achieve comfortable conversation with computers anytime, anywhere, and with anyone. Building on powerful, state-of-the-art digital signal processing techniques, we conduct research on speech recognition, acoustic signal processing, and a variety of other signal processing technologies.

Group Leader: Shoko Araki

Research Index

Publications

2024

Journal Papers

  1. Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki & Shoji Makino (2024). Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 32, 1157-1172.
  2. Rintaro Ikeshita & Tomohiro Nakatani (2024). Geometrically-Regularized Fast Independent Vector Extraction by Pure Majorization-Minimization. IEEE Transactions on Signal Processing, 72, 1560-1575.

Peer-reviewed Conference Papers

  1. Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix & Naohiro Tawara (2024). Discriminative Training of VBx Diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  2. Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki & Jan Cernocky (2024). Target Speech Extraction with pre-trained self-supervised learning models. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  3. William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2024). Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  4. Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada & Shoji Makino (2024). Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  5. Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki & Shigeru Katagiri (2024). How does end-to-end speech recognition training impact speech enhancement artifacts? IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  6. Keigo Wakayama, Tsubasa Ochiai, Marc Delcroix, Masahiro Yasuda, Shoichiro Saito, Shoko Araki & Akira Nakayama (2024). Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  7. Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami & Yusuke Ijima (2024). What do self-supervised speech and speaker models learn? New findings from a cross model layer-wise analysis. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  8. Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya & Yusuke Ijima (2024). Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  9. Naohiro Tawara, Marc Delcroix, Atsushi Ando & Atsunori Ogawa (2024). NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  10. Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki & Jan Cernocky (2024). Probing Self-supervised Learning Models with Target Speech Extraction. ICASSP2024 Satellite Workshop on Self-supervision in Audio, Speech, and Beyond (SASB). Seoul, Korea.
  11. Thilo von Neumann, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2024). MeetEval, Show Me the Errors! Interactive Visualization of Transcript Alignments for the Analysis of Conversational ASR. Show & Tell Demo, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Seoul, Korea.
  12. Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix & Reinhold Haeb-Umbach (2024). Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Array (HSCMA). Seoul, Korea.
  13. Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda & Shoji Makino (2024). Diffusion model-based MIMO speech denoising and dereverberation. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Array (HSCMA). Seoul, Korea.
  14. Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2024). Ensemble Inference for Diffusion Model-Based Speech Enhancement. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Array (HSCMA). Seoul, Korea.
  15. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Masato Mimura, Takatomo Kano, Atsunori Ogawa & Marc Delcroix (2024). Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation. Interspeech2024. Kos Island, Greece.
  16. Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama & Marc Delcroix (2024). SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling. Interspeech2024. Kos Island, Greece.
  17. Kenichi Fujita, Takanori Ashihara, Marc Delcroix & Yusuke Ijima (2024). Lightweight Zero-shot Text-to-Speech with Mixture of Adapters. Interspeech2024. Kos Island, Greece.
  18. Keigo Hojo, Yukoh Wakabayashi, Kengo Ohta, Atsunori Ogawa & Norihide Kitaoka (2024). Boosting CTC-based ASR using inter-layer attention-based CTC loss. Interspeech2024. Kos Island, Greece.
  19. Tatsunari Takagi, Yukoh Wakabayashi, Atsunori Ogawa & Norihide Kitaoka (2024). Text-only domain adaptation for CTC-based speech recognition through substitution of implicit linguistic information in the search space. Interspeech2024. Kos Island, Greece.
  20. Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki & Simon Doclo (2024). Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers. Interspeech2024. Kos Island, Greece.

2023

Journal Papers

  1. Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Cernocky & Dong Yu (2023). Neural Target Speech Extraction: An Overview. IEEE Signal Processing Magazine, 40 (3), 8-29.
  2. Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2023). Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 31, 835-848.
  3. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2023). Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection. IEEE Access, 11, 13906-13917.

Peer-reviewed Conference Papers

  1. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura & Shinji Watanabe (2023). Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
  2. Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara & Marc Delcroix (2023). Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
  3. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix & Ryo Masumura (2023). Leveraging Large Text Corpora for End-to-End Speech Summarization. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
  4. Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix & Reinhold Haeb-Umbach (2023). On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
  5. Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki & Tomohiro Nakatani (2023). Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes Island, Greece.
  6. Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget & Shoko Araki (2023). Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. Proc. Interspeech. Dublin, Ireland.
  7. Naoyuki Kamo, Marc Delcroix & Tomohiro Nakatani (2023). Target Speaker Extraction with Conditional Diffusion Model. Proc. Interspeech. Dublin, Ireland.
  8. Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani & Toshio Irino (2023). Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. Proc. Interspeech. Dublin, Ireland.
  9. Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka & Nobukatsu Hojo (2023). Downstream Task Agnostic Speech Enhancement Conditioned on Self-Supervised Representation Loss. Proc. Interspeech. Dublin, Ireland.
  10. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa & Taichi Asami (2023). Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. Proc. Interspeech. Dublin, Ireland.
  11. Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix & Yukinori Honma (2023). SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? Proc. Interspeech. Dublin, Ireland.
  12. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa & Marc Delcroix (2023). Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. Proc. Interspeech. Dublin, Ireland.
  13. Hikaru Yanagida, Yusuke Ijima & Naohiro Tawara (2023). Influence of Personal Traits on Impressions of One's Own Voice. Proc. Interspeech. Dublin, Ireland.
  14. Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura & Taichi Asami (2023). What are differences? Comparing DNN and human by their performance and characteristics in speaker age estimation. Proc. Interspeech. Dublin, Ireland.
  15. Yuki Kitagishi, Hosana Kamiyama, Naohiro Tawara, Atsunori Ogawa, Noboru Miyazaki & Taichi Asami (2023). Coarse-age loss: A new training method using coarse-age labeled data for speaker age estimation. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  16. Koharu Horii, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2023). Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  17. Keigo Hojo, Daiki Mori, Yukoh Wakabayashi, Kengo Ohta, Atsunori Ogawa & Norihide Kitaoka (2023). Combining multiple end-to-end speech recognition models based on density ratio approach. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  18. Tatsunari Takagi, Atsunori Ogawa, Norihide Kitaoka & Yukoh Wakabayashi (2023). Streaming end-to-end speech recognition using a CTC decoder with substituted linguistic information. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.

2022

Journal Papers

  1. Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe & Yanmin Qian (2022). End-to-end dereverberation, beamforming, and speech recognition in a cocktail party. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 30, 3173-3188.
  2. Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi & Shoko Araki (2022). SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  3. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1032-1047.
  4. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 576-589.
  5. Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Hiroto Ashihara, Tetsunori Kobayashi & Tetsuji Ogawa (2022). Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models. IEICE Transactions on Information and Systems, E105.D (1), 150-160.
  6. Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj & Sanjeev Khudanpur (2022). Joint speaker diarization and speech recognition based on region proposal networks. Computer Speech & Language, 72, 101316.

Peer-reviewed Conference Papers

  1. Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada & Kunio Kashino (2022). ConceptBeam: Concept driven target speech extraction. Proc. ACM International Conference on Multimedia (ACMMM). Lisbon, Portugal.
  2. Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  3. Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Importance of Switch Optimization Criterion in Switching WPE Dereverberation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  4. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo & Takafumi Moriya (2022). Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  5. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2022). Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  6. Atsunori Ogawa, Naohiro Tawara, Marc Delcroix & Shoko Araki (2022). Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  7. Keisuke Kinoshita, Marc Delcroix & Tomoharu Iwata (2022). Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  8. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). SA-SDR: A Novel Loss Function for Separation of Meeting Style Data. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  9. Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix & Takahiro Shinozaki (2022). Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  10. Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki & Shigeru Katagiri (2022). How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR. Proc. Interspeech 2022.
  11. Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato & Tomohiro Nakatani (2022). Listen only to me! How well can target speech extraction handle false alarms? Proc. Interspeech 2022.
  12. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka & Ryo Masumura (2022). Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. Proc. Interspeech 2022.
  13. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2022). Streaming Target-Speaker ASR with Neural Transducer. Proc. Interspeech 2022.
  14. Martin Kocour, Katerina Zmolikova, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukas Burget & Jan Cernocky (2022). Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. Proc. Interspeech 2022.
  15. Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2022). End-to-End Spontaneous Speech Recognition Using Disfluency Labeling. Proc. Interspeech 2022.
  16. Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker & Reinhold Haeb-Umbach (2022). Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. Proc. Interspeech 2022.
  17. Rintaro Ikeshita & Tomohiro Nakatani (2022). ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis. 2022 30th European Signal Processing Conference (EUSIPCO).
  18. Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner & Jan Honza Černocký (2022). Analysis of Impact of Emotions on Target Speech Extraction and Speech Separation. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).
  19. Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada & Shoji Makino (2022). Neural Virtual Microphone Estimator: Application to Multi-Talker Reverberant Mixtures. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  20. Naoyuki Kamo, Kenichi Arai, Atsunori Ogawa, Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix, Tsubasa Ochiai & Toshio Irino (2022). Speech Intelligibility Prediction through Direct Estimation of Word Accuracy Using Conformer. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  21. Kenichi Arai, Atsunori Ogawa, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Naoyuki Kamo & Toshio Irino (2022). Intelligibility prediction of enhanced speech using recognition accuracy of end-to-end ASR systems. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  22. Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita & Tomohiro Nakatani (2022). Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  23. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Extraction and Its Joint Optimization with Weighted Prediction Error Dereverberation. Proc. of the 24th International Congress on Acoustics (ICA 2022).
  24. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2021). Attention-Based Multi-Hypothesis Fusion for Speech Summarization. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
  25. Naohiro Tawara, Atsunori Ogawa, Yuki Kitagishi, Hosana Kamiyama & Yusuke Ijima (2021). Robust speech-age estimation using local maximum mean discrepancy under mismatched recording condition. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

2021

Journal Papers

  1. R. Haeb-Umbach, J. Heymann, L. Drude, S. Watanabe, M. Delcroix, and T. Nakatani, "Far-Field Automatic Speech Recognition," Proceedings of the IEEE, vol. 109, no. 2, pp. 124-148, Feb. 2021.
  2. N. Ito, R. Ikeshita, H. Sawada and T. Nakatani, "A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
  3. R. Ikeshita, T. Nakatani and S. Araki, "Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction," IEEE Transactions on Signal Processing, 2021.
  4. R. Ikeshita and T. Nakatani, "Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation," IEEE Signal Processing Letters, vol. 28, pp. 972-976, 2021.
  5. R. Ikeshita, N. Kamo and T. Nakatani, "Blind Signal Dereverberation Based on Mixture of Weighted Prediction Error Models," IEEE Signal Processing Letters, vol. 28, pp. 399-403, 2021.
  6. Rintaro Ikeshita, Keisuke Kinoshita, Naoyuki Kamo & Tomohiro Nakatani (2021). Online Speech Dereverberation Using Mixture of Multichannel Linear Prediction Models. IEEE Signal Processing Letters, 28, 1580-1584.

Peer-reviewed Conference Papers

  1. C. Li, Y. Luo, C. Han, J. Li, T. Yoshioka, T. Zhou, M. Delcroix, K. Kinoshita, C. Boeddeker, Y. Qian, S. Watanabe, and Z. Chen, "Dual-Path RNN for Long Recording Speech Separation," in Proc. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 865-872.
  2. K. Zmolikova, M. Delcroix, L. Burget, T. Nakatani, and J. H. Černocky, "Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation," in Proc. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 889-896.
  3. H. Sato, T. Ochiai, K. Kinoshita, M. Delcroix, T. Nakatani, S. Araki, "Multimodal Attention Fusion for Target Speaker Extraction," in Proc. IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784.
  4. C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, D. Kolossa, "Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization," in Proc. 2020 28th European Signal Processing Conference (EUSIPCO), 2021, pp. 231-235.
  5. S. Watanabe, F. Boyer, X. Chang, P. Guo, T. Hayashi, Y. Higuchi, T. Hori, W.-C. Huang, H. Inaguma, N. Kamo, S. Karita, C. Li, J. Shi, A. S. Subramanian, W. Zhang, "The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans," in Proc. 2021 IEEE Data Science & Learning Workshop (DSLW), 2021.
  6. J. Wissing, B. Boenninghoff, D. Kolossa, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and C. Schymura "Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4705-4709.
  7. C. Li, Z. Chen, Y. Luo, C. Han, T. Zhou, K. Kinoshita, M. Delcroix, S. Watanabe, and Y. Qian, "Dual-Path Modeling for Long Recording Speech Separation in Meetings," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5739-5743.
  8. P. Guo, F. Boyer, X. Chang, T. Hayashi, Y. Higuchi, H. Inaguma, N. Kamo, C. Li, D. Garcia-Romero, J. Shi, J. Shi, S. Watanabe, K. Wei, W. Zhang, and Y. Zhang, "Recent Developments on ESPnet Toolkit Boosted by Conformer," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5874-5878.
  9. M. Delcroix, K. Zmolikova, T. Ochiai, K. Kinoshita, and T. Nakatani, "Speaker Activity Driven Neural Speech Extraction," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6099-6103.
  10. T. Ochiai, M. Delcroix, T. Nakatani, R. Ikeshita, K. Kinoshita, and S. Araki, "Neural Network-Based Virtual Microphone Estimator," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6114-6118.
  11. A. Ogawa, N. Tawara, T. Kano, and M. Delcroix, "BLSTM-Based Confidence Estimation for End-to-End Speech Recognition," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6383-6387.
  12. N. Tawara, A. Ogawa, Y. Kitagishi, and H. Kamiyama, "Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6963-6967.
  13. K. Kinoshita, M. Delcroix and N. Tawara, "Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 7198-7202.
  14. T. Moriya, T. Ashihara, T. Tanaka, T. Ochiai, H. Sato, A. Ando, Y. Ijima, R. Masumura, and Y. Shinohara, "SimpleFlat: A simple whole-network pre-training approach for RNN transducer-based end-to-end speech recognition," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5664-5668.
  15. C. Boeddeker, W. Zhang, T. Nakatani, K. Kinoshita, T. Ochiai, M. Delcroix, N. Kamo, Y. Qian, R. Haeb-Umbach, "Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 8428-8432.
  16. W. Zhang, C. Boeddeker, S. Watanabe, T. Nakatani, M. Delcroix, K. Kinoshita, T. Ochiai, N. Kamo, R. Haeb-Umbach, Y. Qian, "End-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6898-6902.
  17. Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki & Shoji Makino (2021). Low Latency Online Blind Source Separation Based on Joint Optimization with Blind Dereverberation. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  18. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya & Naoyuki Kamo (2021). Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition. Proc. Interspeech 2021.
  19. Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix & Taichi Asami (2021). Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture. Proc. Interspeech 2021.
  20. Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki & Dorothea Kolossa (2021). PILOT: Introducing Transformers for Probabilistic Sound Event Localization. Proc. Interspeech 2021.
  21. Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita & Shoko Araki (2021). Few-Shot Learning of New Sound Classes for Target Sound Extraction. Proc. Interspeech 2021.
  22. Keisuke Kinoshita, Marc Delcroix & Naohiro Tawara (2021). Advances in Integration of End-to-End Neural and Clustering-Based Diarization for Real Conversational Speech. Proc. Interspeech 2021.
  23. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2021). Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers. Proc. Interspeech 2021.
  24. Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani & Zhuo Chen (2021). Continuous Speech Separation Using Speaker Inventory for Long Recording. Proc. Interspeech 2021.
  25. Katerina Zmolikova, Marc Delcroix, Desh Raj, Shinji Watanabe & Jan Černocký (2021). Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics. Proc. Interspeech 2021.
  26. Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita & Tomohiro Nakatani (2021). Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility. Proc. Interspeech 2021.
  27. Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi & Tetsuji Ogawa (2021). Noise-robust Attention Learning for End-to-End Speech Recognition. 2020 28th European Signal Processing Conference (EUSIPCO).
  28. Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki & Shoji Makino (2021). Low Latency Online Source Separation and Noise Reduction Based on Joint Optimization with Dereverberation. 2021 29th European Signal Processing Conference (EUSIPCO).
  29. Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari & Tomohiro Nakatani (2021). Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation. 2021 29th European Signal Processing Conference (EUSIPCO).
  30. Hiroshi Sawada, Rintaro Ikeshita & Tomohiro Nakatani (2021). Experimental Analysis of EM and MU Algorithms for Optimizing Full-rank Spatial Covariance Model. 2020 28th European Signal Processing Conference (EUSIPCO).
  31. Tomohiro Nakatani, Rintaro Ikeshita, Naoyuki Kamo, Keisuke Kinoshita, Shoko Araki & Hiroshi Sawada (2021). Switching convolutional beamformer. 2021 29th European Signal Processing Conference (EUSIPCO).
  32. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada & Shoko Araki (2021). Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  33. Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix & Reinhold Haeb-Umbach (2021). Speeding up permutation invariant training for source separation. Speech Communication; 14th ITG Conference.

2020

Journal Papers

  1. K. Yamamoto, T. Irino, S. Araki, K. Kinoshita, T. Nakatani, "Speech intelligibility prediction using a multi-resolution Gammachirp envelope distortion index with common parameters for different noise conditions," Acoust. Sci. & Tech., Vol. 41 (1), pp. 396-399, Jan 2020.
  2. T. Nakatani, C. Boeddeker, K. Kinoshita, R. Ikeshita, and M. Delcroix, "Jointly optimal denoising, dereverberation, and source separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2267-2282, 2020.
  3. S. Emura, H. Sawada, S. Araki, and N. Harada, "Multi-delay sparse approach to residual crosstalk reduction for blind source separation," IEEE Signal Processing Letters, vol. 27, pp. 1630-1634, Sept. 2020.
  4. N. Ito and S. Godsill, "A Multi-Target Track-Before-Detect Particle Filter Using Superpositional Data in Non-Gaussian Noise," IEEE Signal Processing Letters, vol. 27, pp. 1075-1079, 2020.

Peer-reviewed Conference Papers

  1. K. Kinoshita, T. Ochiai, M. Delcroix, and T. Nakatani, "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7009-7013.
  2. T. von Neumann, K. Kinoshita, L. Drude, C. Boeddeker, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "End-to-end training of time domain audio separation and recognition," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7004-7008.
  3. N. Tawara, A. Ogawa, T. Iwata, M. Delcroix, and T. Ogawa, "Frame-level phoneme-invariant speaker embedding for text-independent speaker recognition on extremely short utterances," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6799-6803.
  4. N. Tawara, H. Kamiyama, S. Kobashikawa, and A. Ogawa, "Improving speaker-attribute estimation by voting based on speaker cluster information," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6594-6598.
  5. T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, and S. Araki, "DNN-supported mask-based convolutional beamforming for simultaneous denoising, dereverberation, and source separation," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6399-6403.
  6. T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, and S. Araki, "Beam-Tasnet: Time-domain audio separation network meets frequency-domain beamformer," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6384-6388.
  7. M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, "Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 691-695.
  8. T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, "Convergence-guaranteed independent positive semidefinite tensor analysis based on student's t distribution," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 681-685.
  9. R. Ikeshita, T. Nakatani, and S. Araki, "Overdetermined independent vector analysis," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 591-595.
  10. C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "A dynamic stream weight backprop Kalman filter for audiovisual speaker tracking," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 581-585.
  11. K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 381-385.
  12. C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, "Jointly optimal dereverberation and beamforming," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 216-220.
  13. Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech enhancement using self-adaptation and multi-head self-attention," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 181-185.
  14. S. Emura, H. Sawada, S. Araki, and N. Harada, "A frequency-domain BSS method based on l1 norm, unitary constraint, and cayley transform," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 111-115.
  15. A. Aroudi, M. Delcroix, T. Nakatani, K. Kinoshita, S. Araki, and S. Doclo, "Cognitive-Driven Convolutional Beamforming Using EEG-Based Auditory Attention Decoding," in Proc. 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020, pp. 1-6.
  16. T. Nakatani, R. Ikeshita, K. Kinoshita, H. Sawada, and S. Araki, "Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 91-95.
  17. K. Arai, S. Araki, A. Ogawa, K. Kinoshita, T. Nakatani, and T. Irino, "Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1156-1160.
  18. T. Ochiai, M. Delcroix, Y. Koizumi, H. Ito, K. Kinoshita, and S. Araki, "Listen to What You Want: Neural Network-based Universal Sound Selector," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1441-1445.
  19. K. Kinoshita, T. von Neumann, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "Multi-path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 2652-2656.
  20. T. von Neumann, C. Boeddeker, L. Drude, K. Kinoshita, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 3097-3101.
  21. A. Ogawa, N. Tawara, and M. Delcroix, "Language Model Data Augmentation Based on Text Domain Transfer," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 4926-4930.
  22. T. Moriya, T. Ochiai, S. Karita, H. Sato, T. Tanaka, T. Ashihara, R. Masumura, Y. Shinohara, and M. Delcroix, "Self-distillation for improving CTC-Transformer-based ASR systems," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 546-550.

2019

Journal Papers

  1. T. Nakatani and K. Kinoshita, "A unified convolutional beamformer for simultaneous denoising and dereverberation," IEEE Signal Processing Letters, vol. 26, no. 6, pp. 903-907, 2019.
  2. R. Haeb-Umbach, S. Watanabe, T. Nakatani, M. Bacchiani, B. Hoffmeister, M. L. Seltzer, H. Zen, and M. Souden, "Speech processing for digital home assistants: Combining signal processing with deep-learning techniques," IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 111-124, 2019.
  3. M. Hentschel, M. Delcroix, A. Ogawa, T. Iwata, and T. Nakatani, "Feature based domain adaptation for neural network language models with factorised hidden layers," IEICE Transactions on Information and Systems, vol. E102.D, no. 3, pp. 598-608, 2019.
  4. K. Zmolikova, M. Delcroix, K. Kinoshita, T. Ochiai, T. Nakatani, L. Burget, and J. Cernocky, "SpeakerBeam: Speaker aware neural network for target speaker extraction in speech mixtures," IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 4, pp. 800-814, 2019.
  5. K. Yamamoto, T. Irino, T. Matsui, S. Araki, K. Kinoshita, and T. Nakatani, "Speech intelligibility prediction with the dynamic compressive Gammachirp filterbank and modulation power spectrum, " Acoustical Science and Technology, vol. 40, no. 2, pp. 84-92, 2019.

Peer-reviewed Conference Papers

  1. S. Araki, N. Ono, K. Kinoshita, and M. Delcroix, "Projection back onto filtered observations for speech separation with distributed microphone array," in Proc. CAMSAP 2019 - IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2019, pp. 291-295.
  2. S. Karita, N. Chen, T. Hayashi, T. Hori, H. Inaguma, Z. Jiang, M. Someki, N. E. Y. Soplin, R. Yamamoto, X. Wang, S. Watanabe, T. Yoshimura, and W. Zhang, "A comparative study on transformer vs RNN in speech applications," in Proc. ASRU 2019 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 449-456.
  3. R. Ikeshita, N. Ito, T. Nakatani, and H. Sawada, "Independent low-rank matrix analysis with decorrelation learning," in Proc. WASPAA 2019 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019, pp. 288-292.
  4. T. Nakatani, K. Kinoshita, R. Ikeshita, H. Sawada, and S. Araki, "Simultaneous denoising, dereverberation, and source separation using a unified convolutional beamformer," in Proc. WASPAA 2019 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019, pp. 224-228.
  5. K. Arai, S. Araki, A. Ogawa, K. Kinoshita, T. Nakatani, K. Yamamoto, and T. Irino, "Predicting speech intelligibility of enhanced speech using phone accuracy of DNN-based ASR system," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 4275-4279.
  6. A. Ogawa, M. Delcroix, S. Karita, and T. Nakatani, "Improved deep duel model for rescoring N-best speech recognition list using backward LSTMLM and ensemble encoders," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 3900-3904.
  7. T. Ochiai, M. Delcroix, K. Kinoshita, A. Ogawa, and T. Nakatani, "Multimodal SpeakerBeam: Single channel target speech extraction with audio-visual speaker clues," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 2718-2722.
  8. S. Karita, N. E. Y. Soplin, S. Watanabe, M. Delcroix, A. Ogawa, and T. Nakatani, "Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 1408-1412.
  9. M. Delcroix, S. Watanabe, T. Ochiai, K. Kinoshita, S. Karita, A. Ogawa, and T. Nakatani, "End-to-end SpeakerBeam for single channel target speech recognition," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 451-455.
  10. T. Nakatani and K. Kinoshita, "Simultaneous denoising and dereverberation for low-latency applications using frame-by-frame online unified convolutional beamformer," in Proc. Interspeech 2019 - the 20th Annual Conference of the International Speech Communication Association, 2019, pp. 111-115.
  11. T. Nakatani and K. Kinoshita, "Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation," in Proc. EUSIPCO 2019 - the 27th European Signal Processing Conference (EUSIPCO), 2019, pp. 1-5.
  12. R. Ikeshita, N. Ito, T. Nakatani, and H. Sawada, "A unifying framework for blind source separation based on a joint diagonalizability constraint," in Proc. EUSIPCO 2019 - the 27th European Signal Processing Conference (EUSIPCO), 2019, pp. 1-5.
  13. H. Sawada, R. Ikeshita, N. Ito, and T. Nakatani, "Computational acceleration and smart initialization of full-rank spatial covariance analysis," in Proc. EUSIPCO 2019 - the 27th European Signal Processing Conference (EUSIPCO), 2019, pp. 1-5.
  14. M. Hentschel, M. Delcroix, A. Ogawa, T. Iwata, and T. Nakatani, "A unified framework for feature-based domain adaptation of neural network language models," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 7250-7254.
  15. A. Ogawa, T. Hirao, T. Nakatani, and M. Nagata, "ILP-based compressive speech summarization with content word coverage maximization and its oracle performance analysis," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 7190-7194.
  16. T. Ochiai, M. Delcroix, K. Kinoshita, A. Ogawa, and T. Nakatani, "A unified framework for neural speech separation and extraction," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6975-6979.
  17. M. Delcroix, K. Zmolikova, T. Ochiai, K. Kinoshita, S. Araki, and T. Nakatani, "Compact network for SpeakerBeam target speaker extraction," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6965-6969.
  18. Y. Kubo, T. Nakatani, M. Delcroix, K. Kinoshita, and S. Araki, "Mask-based MVDR beamformer for noisy multisource environments: Introduction of time-varying spatial covariance model," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6855-6859.
  19. J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, "Joint optimization of neural network-based WPE dereverberation and acoustic model for robust online ASR," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6655-6659.
  20. S. Karita, S. Watanabe, T. Iwata, M. Delcroix, A. Ogawa, and T. Nakatani, "Semi-supervised end-to-end speech recognition using text-to-speech and autoencoders," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6166-6170.
  21. S. Araki, N. Ono, K. Kinoshita, and M. Delcroix, "Estimation of sampling frequency mismatch between distributed asynchronous microphones under existence of source movements with stationary time periods detection," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 785-789.
  22. N. Ito and T. Nakatani, "FastMNMF: Joint diagonalization based accelerated algorithms for multichannel nonnegative matrix factorization," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 371-375.
  23. T. v. Neumann, K. Kinoshita, M. Delcroix, S. Araki, T. Nakatani, and R. Haeb-Umbach, "All-neural online source separation, counting, and diarization for meeting analysis," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 91-95.
  24. S. Emura, S. Araki, T. Nakatani and N. Harada, "Distortionless beamforming optimized with l1-norm minimization," in Proc. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

2018

Journal Papers

  1. M. Delcroix, K. Kinoshita, A. Ogawa, C. Huemmer, and T. Nakatani, "Context adaptive neural network based acoustic models for rapid adaptation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 5, pp. 895-908, 2018.
  2. M. Tomiyama, K. Yamasaki, K. Arai, M. Inubushi, K. Yoshimura, and A. Uchida, "Effect of bandwidth limitation of optical noise injection on common-signal-induced synchronization in multimode semiconductor lasers," Optics Express, vol. 26, p. 13521, May 2018.
  3. S. Emura, S. Araki, T. Nakatani, and N. Harada, "Distortionless beamforming optimized with l1 norm minimization," IEEE Signal Processing Letters, vol. 25, no. 7, pp. 936-940, July 2018.

Peer-reviewed Conference Papers

  1. N. Ito and T. Nakatani, "Multiplicative updates and joint diagonalization based acceleration for under-determined BSS using a full-rank spatial covariance model," in Proc. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 231-235.
  2. M. Hentschel, M. Delcroix, A. Ogawa, and T. Nakatani, "Feature-based learning hidden unit contributions for domain adaptation of RNN-LMs," in Proc. 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2018, pp. 1692-1696.
  3. M. Hentschel, M. Delcroix, A. Ogawa, T. Iwata, and T. Nakatani, "Factorised hidden layer based domain adaptation for recurrent neural network language models," in Proc. 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2018, pp. 1940-1944.
  4. T. Moriya, R. Masumura, T. Asami, Y. Shinohara, M. Delcroix, Y. Yamaguchi, and Y. Aono, "Progressive neural network-based knowledge transfer in acoustic models," in Proc. 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2018, pp. 998-1002.
  5. K. Yamamoto, T. Irino, S. Araki, K. Kinoshita, and T. Nakatani, "Speech intelligibility prediction using a multi-resolution Gammachirp envelope distortion index with common parameters for different noise conditions," in Proc. International Symposium on Universal Acoustical Communication, 2018.
  6. N. Ito and T. Nakatani, "FastFCA-AS: Joint diagonalization based acceleration of full-rank spatial covariance analysis for separating any number of sources," in Proc. IWAENC 2018 - the 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, pp. 151-155.
  7. S. Araki, N. Ono, K. Kinoshita, and M. Delcroix, "Comparison of reference microphone selection algorithms for distributed microphone array based speech enhancement in meeting recognition scenarios," in Proc. IWAENC 2018 - the 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, pp. 316-320.
  8. Y. Matsui, T. Nakatani, M. Delcroix, K. Kinoshita, N. Ito, S. Araki, and S. Makino, "Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming," in Proc. IWAENC 2018 - the 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, pp. 71-75.
  9. J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, "Frame-online DNN-WPE dereverberation," in 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 2018, pp. 466-470.
  10. N. Ito, S. Araki, and T. Nakatani, "FastFCA: Joint diagonalization based acceleration of audio source separation using a full-rank spatial covariance model," in Proc. EUSIPCO 2018 - the 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1667-1671.
  11. N. Ito, C. Schymura, S. Araki, and T. Nakatani, "Noisy cGMM: Complex Gaussian mixture model with non-sparse noise model for joint source separation and denoising," in Proc. EUSIPCO 2018 - the 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1662-1666.
  12. L. Drude, C. Boeddeker, J. Heymann, R. Haeb-Umbach, K. Kinoshita, M. Delcroix, and T. Nakatani, "Integrating neural network based beamforming and weighted prediction error dereverberation," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 3043-3047.
  13. M. Delcroix, S. Watanabe, A. Ogawa, S. Karita, and T. Nakatani, "Auxiliary feature based adaptation of end-to-end ASR systems," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 2444-2448.
  14. T. Moriya, S. Ueno, Y. Shinohara, M. Delcroix, Y. Yamaguchi, and Y. Aono, "Multi-task learning with augmentation strategy for acoustic-to-word attention-based encoder-decoder speech recognition," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 2399-2403.
  15. S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, "ESPnet: End-to-end speech processing toolkit," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 2207-2211.
  16. K. Yamamoto, T. Irino, N. Ohashi, S. Araki, K. Kinoshita, and T. Nakatani, "Multi-resolution Gammachirp envelope distortion index for intelligibility prediction of noisy speech," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 1863-1867.
  17. S. Karita, S. Watanabe, T. Iwata, A. Ogawa, and M. Delcroix, "Semi-supervised end-to-end speech recognition," in Proc. Interspeech 2018 - the 19th Annual Conference of the International Speech Communication Association, 2018, pp. 2-6.
  18. F.-R. Stoter, A. Liutkus, and N. Ito, "The 2018 Signal Separation and Evaluation Campaign," in Proc. LVA/ICA, 2018, pp. 293-305.
  19. K. Zmolikova, M. Delcroix, K. Kinoshita, T. Higuchi, T. Nakatani, and J. Cernocky, "Optimization of speaker-aware multichannel speech extraction with ASR criterion," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6702-6706.
  20. A. Ogawa, M. Delcroix, S. Karita, and T. Nakatani, "Rescoring N-best speech recognition list based on one-on-one hypothesis comparison using encoder-classifier model," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6099-6103.
  21. T. Morioka, N. Tawara, T. Ogawa, A. Ogawa, T. Iwata, and T. Kobayashi, "Language model domain adaptation via recurrent neural networks with domain-shared and domain-specific representations," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6084-6088.
  22. S. Karita, A. Ogawa, M. Delcroix, and T. Nakatani, "Sequence training of encoder-decoder model using policy gradient for end-to-end speech recognition," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5839-5843.
  23. S. Araki, N. Ono, K. Kinoshita, and M. Delcroix, "Meeting recognition with asynchronous distributed microphone array using block-wise refinement of mask-based MVDR beamformer," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5694-5698.
  24. M. Delcroix, K. Zmolikova, K. Kinoshita, A. Ogawa, and T. Nakatani, "Single channel target speaker extraction and recognition with SpeakerBeam," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5554-5558.
  25. K. Kinoshita, L. Drude, M. Delcroix, and T. Nakatani, "Listening to each speaker one by one with recurrent selective hearing networks," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 5064-5068.
  26. L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, and R. Haeb-Umbach, "Dual frequency- and block-permutation alignment for deep learning based block-online blind source separation," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 691-695.
  27. N. Ito, T. Makino, S. Araki, and T. Nakatani, "Maximum-likelihood online speaker diarization in noisy meetings based on categorical mixture model and probabilistic spatial dictionary," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 546-550.
  28. T. Higuchi, K. Kinoshita, N. Ito, S. Karita, and T. Nakatani, "Frame-by-frame closed-form update for mask-based adaptive MVDR beamforming," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 531-535.
  29. J. Azcarreta, N. Ito, S. Araki, and T. Nakatani, "Permutation-free cGMM: Complex Gaussian mixture model with inverse Wishart mixture model based spatial prior for permutation-free source separation and source counting," in Proc. ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 51-55.
  30. K. Arai, S. Shinohara, P. Davis, S. Sunada, and T. Harayama, "Chaotic laser based online physical random bit streaming system and its application to high-throughput encryption," in Proc. OFC 2018 - Optical Fiber Communication Conference, 2018, p. Tu3G.3.

2017

Journal Papers

  1. T. Kawase, K. Niwa, M. Fujimoto, K. Kobayashi, S. Araki, and T. Nakatani, "Integration of spatial cue-based noise reduction and speech model-based source restoration for real time speech enhancement," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E100-A, no. 5, pp. 1127-1136, May 2017.
  2. A. Ogawa and T. Hori, "Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks," Speech Communication, vol. 89, pp. 70-83, May 2017.
  3. T. Higuchi, N. Ito, S. Araki, T. Yoshioka, M. Delcroix, and T. Nakatani, "Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 780-793, April 2017.
  4. S. Shinohara, K. Arai, P. Davis, S. Sunada, and T. Harayama, "Chaotic laser based physical random bit streaming system with a computer application interface," Optics Express, vol. 25, pp. 6461-6474, 2017.
  5. N. Suzuki, T. Hida, M. Tomiyama, A. Uchida, K. Yoshimura, K. Arai, and M. Inubushi, "Common-signal-induced synchronization in semiconductor lasers with broadband optical noise signal," IEEE Journal of Selected Topics in Quantum Electronics, 2017.

Book Chapter, Tutorial Papers

  1. S. Watanabe, M. Delcroix, F. Metze, and J. Hershey (Eds.), "New Era for Robust Speech Recognition: Exploiting Deep Learning," Springer, 2017.

Peer-reviewed Conference Papers

  1. T. Higuchi, K. Kinoshita, M. Delcroix, and T. Nakatani, "Adversarial training for data-driven speech enhancement without parallel corpus," in Proc. ASRU 2017 - IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017, pp. 40-47.
  2. S. Araki, N. Ono, K. Kinoshita, and M. Delcroix, "Meeting recognition with asynchronous distributed microphone array," in Proc. ASRU 2017 - IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017, pp. 32-39.
  3. K. Zmolikova, M. Delcroix, K. Kinoshita, T. Higuchi, A. Ogawa, and T. Nakatani, "Learning speaker representation for neural network based multichannel speaker extraction," in Proc. ASRU 2017 - IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017, pp. 8-15.
  4. H. Ashikawa, N. Tawara, A. Ogawa, T. Iwata, T. Kobayashi, and T. Ogawa, "Exploiting end of sentences and speaker alternations in language modeling for multiparty conversations," in Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 1263-1267.
  5. M. Hentschel, A. Ogawa, M. Delcroix, T. Nakatani, and Y. Matsumoto, "Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs," in Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 618-621.
  6. D. Tran, M. Delcroix, A. Ogawa, and T. Nakatani, "Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 3852-3856.
  7. K. Yamamoto, T. Irino, T. Matsui, S. Araki, K. Kinoshita, and T. Nakatani, "Predicting speech intelligibility using a Gammachirp envelope distortion index based on the signal-to-distortion ratio," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 2949-2953.
  8. K. Zmolikova, M. Delcroix, K. Kinoshita, T. Higuchi, A. Ogawa, and T. Nakatani, "Speaker-aware neural network based beamformer for speaker extraction in speech mixtures," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 2655-2659.
  9. A. Ogawa, K. Kinoshita, M. Delcroix, and T. Nakatani, "Improved example-based speech enhancement by using deep neural network acoustic model for noise robust example search," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1963-1967.
  10. S. Karita, A. Ogawa, M. Delcroix, and T. Nakatani, "Forward-backward convolutional LSTM for acoustic modeling," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1601-1605.
  11. D. Tran, M. Delcroix, S. Karita, M. Hentschel, A. Ogawa, and T. Nakatani, "Unfolded deep recurrent convolutional neural network with jump ahead connections for acoustic modeling," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1596-1600.
  12. T. Higuchi, K. Kinoshita, M. Delcroix, K. Zmolikova, and T. Nakatani, "Deep clustering-based beamforming for separation with unknown number of sources," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 1183-1187.
  13. K. Kinoshita, M. Delcroix, H. Kwon, T. Mori, and T. Nakatani, "Neural network-based spectrum estimation for online WPE dereverberation," in Proc. Interspeech 2017 - the 18th Annual Conference of the International Speech Communication Association, 2017, pp. 384-388.
  14. D. Tran, M. Delcroix, A. Ogawa, C. Huemmer, and T. Nakatani, "Feedback connection for deep neural network-based acoustic modeling," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5240-5244.
  15. T. Ochiai, M. Delcroix, K. Kinoshita, A. Ogawa, T. Asami, S. Katagiri, and T. Nakatani, "Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5175-5179.
  16. T. Higuchi, T. Yoshioka, K. Kinoshita, and T. Nakatani, "Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5170-5174.
  17. C. Huemmer, M. Delcroix, A. Ogawa, K. Kinoshita, T. Nakatani, and W. Kellermann, "Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 4875-4879.
  18. N. Ito, S. Araki, M. Delcroix, and T. Nakatani, "Probabilistic spatial dictionary based online adaptive beamforming for meeting recognition in noisy and reverberant environments," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 681-685.
  19. T. Nakatani, N. Ito, T. Higuchi, S. Araki, and K. Kinoshita, "Integrating DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 286-290.
  20. K. Kinoshita, M. Delcroix, A. Ogawa, T. Higuchi, and T. Nakatani, "Deep mixture density network for statistical model-based feature enhancement," in Proc. ICASSP 2017 - 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 251-255.
  21. N. Ito, S. Araki, and T. Nakatani, "Data-driven and physical model-based designs of probabilistic spatial dictionary for online meeting diarization and adaptive beamforming," in Proc. EUSIPCO 2017 - the 25th European Signal Processing Conference (EUSIPCO), 2017, pp. 1165-1169.
  22. S. Araki, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, T. Higuchi, T. Yoshioka, D. Tran, S. Karita, and T. Nakatani, "Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming," in Proc. HSCMA 2017 - the 5th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2017, pp. 16-20.
  23. A. Liutkus, F.-R. Stoter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono, and J. Fontecave, "The 2016 Signal Separation Evaluation Campaign," in Proc. LVA/ICA 2017 - the 13th International Conference on Latent Variable Analysis and Signal Separation, 2017, pp. 323-332.
  24. Y. Kawashima, S. Shinohara, S. Sunada, and T. Harayama, "Asymmetric emission of the quadrupole-deformed microcavity laser with spatially selective pumping," Workshop on Asymmetric Microcavity and Wave Chaos, 2017.
  25. Y. Suzuki, S. Shinohara, S. Sunada, and T. Harayama, "Chiral mode lasing in an asymmetrically deformed microcavity laser," Workshop on Asymmetric Microcavity and Wave Chaos, 2017.
  26. T. Harayama, S. Sunada, and S. Shinohara, "Universal single-mode lasing in fully-chaotic billiard lasers," Workshop on Asymmetric Microcavity and Wave Chaos, 2017.

2016

Journal Papers

  1. A. Ogawa, T. Hori, and A. Nakamura, “Estimating speech recognition accuracy based on error type classification,” IEEE Trans. ASLP, vol. 24, no. 12, pp. 2400-2413, December 2016.
  2. S. Shinohara, T. Fukushima, S. Sunada, T. Harayama, and K. Arai, “Long-path formation in a deformed microdisk laser,” Physical Review A, vol. 94, 013831, July 2016.
  3. S. Sunada, S. Shinohara, T. Fukushima, and T. Harayama, “Signature of Wave Chaos in Spectral Characteristics of Microcavity Lasers,” Phys. Rev. Lett. 116, 203903, May 2016.
  4. M. Delcroix, A. Ogawa, S.-J. Hahm, T. Nakatani, and A. Nakamura, “Differenced maximum mutual information criterion for robust unsupervised acoustic model adaptation,” Computer Speech and Language (CSL), Elsevier, vol. 36, pp. 24-41, March 2016.
  5. K. Kinoshita, M. Delcroix, S. Gannot, E. Habets, R. Haeb-Umbach, W. Kellermann, V. Leutnant, R. Maas, T. Nakatani, B. Raj, A. Sehr, and T. Yoshioka, “A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research,” EURASIP Journal on Advances in Signal Processing, DOI:10.1186/s13634-016-0306-6, January 2016.

Peer-reviewed Conference Papers

  1. S. Watanabe, X. Xiao, and M. Delcroix, “Multi-Microphone Speech Recognition,” APSIPA, December 2016.
  2. T. Sasaki, I. Kakesu, A. Uchida, S. Sunada, K. Yoshimura, and K. Arai, “Common-signal-induced synchronization in photonic integrated circuits driven by constant-amplitude random-phase light,” NOLTA 2016, C1L-B4, vol. 1, pp. 566-569, November 2016.
  3. T. Higuchi, T. Yoshioka, and T. Nakatani, “Sparseness-based multichannel nonnegative matrix factorization for blind source separation,” IWAENC 2016, September 2016.
  4. M. Fakhry, N. Ito, S. Araki, and T. Nakatani, “Modeling audio directional statistics using a probabilistic spatial dictionary for speaker diarization in real meetings,” IWAENC 2016, September 2016.
  5. T. Higuchi, T. Yoshioka, and T. Nakatani, “Optimization of speech enhancement front-end with speech recognition-level criterion,” Interspeech 2016, September 2016.
  6. A. Ogawa, S. Seki, K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, and K. Takeda, “Robust example search using bottleneck features for example-based speech enhancement,” Interspeech 2016, pp. 3733-3737, September 2016.
  7. M. Delcroix, K. Kinoshita, A. Ogawa, T. Yoshioka, D. Tran, and T. Nakatani, “Context adaptive neural network for rapid adaptation of deep CNN based acoustic models,” Interspeech 2016, pp. 1573-1577, September. 2016.
  8. D. Tran, M. Delcroix, A. Ogawa, and T. Nakatani, “Factorized linear input network for acoustic model adaptation in noisy conditions,” Interspeech 2016, pp. 3813-3817, September 2016.
  9. K. Yamamoto, T. Irino, T. Matsui, S. Araki, K. Kinoshita, and T. Nakatani, “Speech intelligibility prediction based on the envelope power spectrum model with the dynamic compressive Gammachirp auditory filterbank,” Interspeech 2016, pp. 2885-2889, September 2016.
  10. M. Delcroix, and S. Watanabe, “Recent advances in distant speech recognition,” Interspeech 2016, September 2016.
  11. K. Zmolikova, M. Karafiat, K. Vesely, M. Delcroix, S. Watanabe, L. Burget, and H. Cernocky, “Data selection by sequence summarizing neural network in mismatch condition training,” Interspeech 2016, September 2016.
  12. L. Li, H. Kameoka, T. Higuchi, and H. Saruwatari, “Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech,” Interspeech 2016, September 2016.
  13. N. Ito, S. Araki, and T. Nakatani, “Complex angular central Gaussian mixture model for directional statistics in mask-based microphone array signal processing,” EUSIPCO 2016, pp. 1153-1157, August 2016.
  14. N. Murata, H. Kameoka, K. Kinoshita, S. Araki, T. Nakatani, S. Koyama, and H. Saruwatari, “Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution,” EUSIPCO 2016, pp. 1648-1652, August 2016.
  15. S. Sunada, S. Shinohara, T. Fukushima, and T. Harayama, “Wave-chaos-induced single-frequency lasing in microcavities,” NOLTA 2016 - the 2016 International Symposium on Nonlinear Theory and its Applications, Paper C1L-B2, 2016.
  16. S. Orihara, K. Koyama, S. Shinohara, S. Sunada, T. Fukushima, and T. Harayama, “Optimal design of two-dimensional external cavities for delayed optical feedback,” NOLTA 2016 - the 2016 International Symposium on Nonlinear Theory and its Applications, Paper B3L-B3, 2016.
  17. K. Kawashima, S. Shinohara, S. Sunada, T. Fukushima, and T. Harayama, “Asymmetric emission caused by chaos-assisted tunneling and synchronization in two-dimensional microcavity lasers,” NOLTA 2016 - the 2016 International Symposium on Nonlinear Theory and its Applications, Paper C1L-B3, 2016.
  18. S. Suzuki, S. Sunada, S. Shinohara, T. Fukushima, and T. Harayama, “Fast physical random bit generation by chaotic lasers with delayed feedback using extremely short external cavities,” NOLTA 2016 - the 2016 International Symposium on Nonlinear Theory and its Applications, Paper B3L-B2, 2016.
  19. S. Sekiguchi, S. Shinohara, T. Fukushima, and T. Harayama, “Effects of phase space sticky motions in nearly-integrable dielectric billiards on far-field patterns,” NOLTA 2016 - the 2016 International Symposium on Nonlinear Theory and its Applications, Paper C2L-B5, 2016.
  20. M. Fujimoto and T. Nakatani, "Multi-pass feature enhancement based on generative-discriminative hybrid approach for noise robust speech recognition," ICASSP 2016, pp. 5750-5754, March 2016.
  21. T. Kawase, K. Niwa, M. Fujimoto, N. Kamado, K. Kobayashi, S. Araki, and T. Nakatani, "Real-time integration of statistical model-based speech enhancement with unsupervised noise PSD estimation using microphone array," ICASSP 2016, pp. 604-608, March 2016.
  22. S. Araki, M. Okada, T. Higuchi, A. Ogawa, and T. Nakatani, "Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition," ICASSP 2016, pp. 385-389, March 2016.
  23. H. Meutzner, S. Araki, M. Fujimoto, and T. Nakatani, "A generative-discriminative hybrid approach to multi-channel noise reduction for robust automatic speech recognition," ICASSP 2016, pp. 5740-5744, March 2016.
  24. T. Yoshioka, K. Ohnishi, F. Fang, and T. Nakatani, “Noise robust speech recognition using recent developments in neural networks for computer vision,” ICASSP 2016, pp. 5730-5734, Mar. 2016.
  25. N. Ito, S. Araki, and T. Nakatani, "Modeling audio directional statistics using a complex Bingham mixture model and its application to blind diffuse noise reduction," ICASSP2016, pp. 465-468, March 2016.
  26. M. Delcroix, K. Kinoshita, C. Yu, A. Ogawa, T. Yoshioka, and T. Nakatani, “Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions,” Proc. of ICASSP’16, pp. 5270-5274, March 2016.
  27. S. Kundu, G. V. Mantena, Y. Qian, T. Tan, M. Delcroix, and K. C. Sim, “Joint acoustic factor learning for robust deep neural network based automatic speech recognition,” Proc. of ICASSP’16, pp. 5025-5029, March 2016.
  28. T. Higuchi, N. Ito, T. Yoshioka and T. Nakatani, "Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise," ICASSP 2016, pp. 5210-5214, March 2016.

Other Conference Papers

  1. K. Yamamoto, T. Irino, T. Matsui, S. Araki, K. Kinoshita, and T. Nakatani, “Analysis of acoustic features for speech intelligibility prediction models,” 5th ASA/ASJ Joint Meeting, Journal of the Acoustical Society of America, vol. 140, no. 4, pt. 2, p. 3114, November 2016.

2015

Journal Papers

  1. M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, "Exploiting spectro-temporal locality in deep learning based acoustic event detection," EURASIP Journal on Audio, Speech, and Music Processing, DOI 10.1186/s13636-015-0069-2, December 2015.
  2. M. Espi, M. Fujimoto, and T. Nakatani, "Acoustic event detection in speech overlapping scenarios based on high resolution spectral input and deep learning," IEICE Transactions on Information and Systems, vol. E98-D, no. 10, pp. 1799-1807, October 2015.
  3. T. Harayama and S. Shinohara, “Ray-wave correspondence in chaotic dielectric billiards,” Physical Review E, vol. 92, p. 042916 (6 pages), 2015
  4. T. Yoshioka and M. J. F. Gales, “Environmentally robust ASR front-end for deep neural network acoustic models,” Computer Speech and Language, vol. 31, no. 1, pp. 65-86, May 2015.
  5. N. Ito, E. Vincent, T. Nakatani, N. Ono, S. Araki, and S. Sagayama, "Blind suppression of nonstationary diffuse acoustic noise based on spatial covariance matrix decomposition," Springer Journal of Signal Processing Systems, vol. 79, no. 2, pp. 145-157, May 2015. (Invited paper)
  6. M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, S. Araki, T. Hori, and T. Nakatani, “Strategies for distant speech recognition in reverberant environments,” EURASIP Journal on Advances in Signal Processing, July 2015.
  7. M. Inubushi, K. Yoshimura, K. Arai, and P. Davis, "Physical random bit generators and their reliability: focusing on chaotic laser systems," Nonlinear Theory and Its Applications, vol. 6, issue 2, pp. 133-143, 2015.

News Release

  1. NTT achieved top performance in an international noisy speech recognition challenge - Advances in distortionless speech enhancement and deep-learning speech recognition techniques -, December 14, 2015.

Book Chapter, Tutorial Papers

  1. T. Oba, K. Kobayashi, H. Uematsu, T. Asami, K. Niwa, N. Kamado, T. Kawase, and T. Hori, "Media Processing Technology for Business Task Support," NTT Technical Review, vol. 13, no. 4, April 2015.
  2. S. Araki, M. Fujimoto, T. Yoshioka, M. Delcroix, M. Espi, and T. Nakatani, "Deep learning based distant talk speech processing in real world sound environments," NTT Technical Review, 2015.

Peer-reviewed Conference Papers

  1. M. Espi, M. Fujimoto, K. Kinoshita, and T. Nakatani, "On the importance of feature extraction for acoustic event detection using deep neural networks," Interspeech 2015, pp. 2922-2926, September 2015.
  2. M. Fujimoto and T. Nakatani, "Feature enhancement based on generative-discriminative hybrid approach with GMMs and DNNs for noise robust speech recognition," ICASSP 2015, pp. 5019-5023, April 2015.
  3. D. Q. Truong, S. Nakamura, M. Delcroix, and T. Hori, "WFST-Based Structural Classification Integrating DNN Acoustic Features and RNN Language Features for Speech Recognition," ICASSP 2015, pp. 4959-4963, April 2015.
  4. S. Araki, T. Hayashi, M. Delcroix, M. Fujimoto, K. Takeda, and T. Nakatani, "Exploring multi-channel features for denoising-autoencoder-based speech enhancement," ICASSP 2015, pp. 116-120, April 2015.
  5. T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, M. Fujimoto, C. Yu, W. J. Fabian, M. Espi, T. Higuchi, S. Araki, and T. Nakatani, “The NTT CHiME-3 system: advances in speech enhancement and recognition for mobile multi-microphone devices,” ASRU 2015, pp. 436-443, December 2015.
  6. T. Yoshioka, S. Karita, and T. Nakatani, “Far-field speech recognition using CNN-DNN-HMM with convolution in time,” ICASSP 2015, pp. 4360-4364, April 2015.
  7. N. Ono, Z. Rafii, D. Kitamura, N. Ito, and A. Liutkus, "The 2015 signal separation evaluation campaign," LVA/ICA2015, pp. 387-395, August 2015.
  8. N. Ito, S. Araki, and T. Nakatani, "Permutation-free clustering of relative transfer function features for blind source separation," EUSIPCO2015, pp. 409-413, September 2015.
  9. M. Delcroix, K. Kinoshita, T. Hori, and T. Nakatani, “Context adaptive deep neural networks for fast acoustic model adaptation,” Proc. of ICASSP 2015, pp. 4535-4539, April 2015.
  10. K. Kinoshita, M. Delcroix, A. Ogawa, and T. Nakatani, "Text-informed speech enhancement with deep neural networks," Interspeech 2015, pp. 1760-1764, 2015.
  11. K. Kinoshita and T. Nakatani, "Modeling inter-node acoustic dependencies with Restricted Boltzmann Machine for distributed microphone array based BSS," ICASSP 2015, pp. 464-468, 2015.
  12. N. Suzuki, T. Hida, I. Kakesu, A. Uchida, K. Yoshimura, and K. Arai, "Effect of the bandwidth limitation of an optical noise signal used for common-signal induced synchronization in chaotic semiconductor lasers", XXXV Dynamics Days Europe 2015, September 2015.
  13. C. Yu, A. Ogawa, M. Delcroix, T. Yoshioka, T. Nakatani, and J. H. L. Hansen, "Robust i-vector extraction for neural network adaptation in noisy environment," Proc. Interspeech, pp. 2854-2857, 2015.
  14. A. Ogawa and T. Hori, "ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks," Proc. IEEE ICASSP, pp. 4370-4374, 2015.
  15. K. Aoyama, A. Ogawa, T. Hattori, and T. Hori, "Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection," Proc. IEEE ICASSP, pp. 5216-5220, 2015.

2014

Journal Papers

  1. S. Shinohara, S. Sunada, T. Fukushima, T. Harayama, K. Arai, and K. Yoshimura, “Efficient optical path folding by using multiple total internal reflections in a microcavity,” Applied Physics Letters, vol. 105, p. 151111 (4 pages), 2014.
  2. T. Fukushima, S. Shinohara, S. Sunada, T. Harayama, K. Sakaguchi, and Y. Tokuda, “Lasing of TM modes in a two-dimensional GaAs microlaser,” Optics Express, vol. 22, pp. 11912-11917, 2014.
  3. S. Sunada, T. Fukushima, S. Shinohara, T. Harayama, K. Arai, and M. Adachi, “A compact chaotic laser device with a two-dimensional external cavity structure,” Applied Physics Letters, vol. 104, p. 241105 (4 pages), 2014.
  4. S. Shinohara, T. Fukushima, S. Sunada, T. Harayama, K. Arai, and K. Yoshimura, “Anticorrelated bidirectional output from quasistadium-shaped semiconductor microlasers,” Optical Review, vol. 21, pp. 113-116, 2014.
  5. T. Otsuka, K. Ishiguro, H. Sawada, and H. G. Okuno, “Multichannel Sound Source Dereverberation and Separation for Arbitrary Number of Sources Based on Bayesian Nonparametrics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 2218-2232, October 2014.
  6. S. Sunada, K. Arai, K. Yoshimura, and M. Adachi, "Optical Phase Synchronization by Injection of Common Broadband Low-Coherent Light", Physical Review Letters, Vol. 112, 204101, May 2014.
  7. R. Takahashi, Y. Akizawa, A. Uchida, T. Harayama, K. Tsuzuki, S. Sunada, K. Arai, K. Yoshimura, and P. Davis, "Fast physical random bit generation with photonic integrated circuits with different external cavity lengths for chaos generation," Optics Express, vol. 22, pp. 11727-11740, May 2014.
  8. S. Yamahata, Y. Yamaguchi, A. Ogawa, H. Masataki, O. Yoshioka, and S. Takahashi, "Automatic vocabulary adaptation based on semantic and acoustic similarities," IEICE Transactions on Information and Systems, vol. E97-D, no. 6, pp. 1488-1496, June 2014.

Book Chapter, Tutorial Papers

  1. Y. Iwata, T. Nakatani, T. Yoshioka, M. Fujimoto, and H. Saito, "Maximum a posteriori spectral estimation with source log-spectral priors for multichannel speech enhancement," in "Advances in Speech and Audio Processing for Coding, Enhancement and Recognition," pp. 281-317, Springer, October 2014.

Peer-reviewed Conference Papers

  1. M. Fujimoto, Y. Kubo, and T. Nakatani, "Unsupervised non-parametric Bayesian modeling of non-stationary noise for model-based noise suppression," ICASSP 2014, pp. 5562-5566, May 2014.
  2. T. Hori, Y. Kubo, and A. Nakamura, "Real-time one-pass decoding with recurrent neural network language model for speech recognition," ICASSP 2014, pp. 6364-6368, May 2014.
  3. M. Espi, M. Fujimoto, Y. Kubo, and T. Nakatani, "Spectrogram patch based acoustic event detection and classification in speech overlapping conditions," in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), pp. 117-121, May 2014.
  4. Y. Kubo, J. Suzuki, T. Hori, and A. Nakamura, "Restructuring Output Layers of Deep Neural Networks Using Minimum Risk Parameter Clustering," Interspeech 2014, pp. 1068-1072, September 2014.
  5. T. Yoshioka, A. Ragni, and M. J. F. Gales, “Investigation of unsupervised adaptation of DNN acoustic models with filter bank input,” ICASSP 2014, pp. 6344-6348, May 2014.
  6. T. Yoshioka, X. Chen, and M. J. F. Gales, “Impact of single-microphone dereverberation on DNN-based meeting transcription systems,” ICASSP 2014, pp. 5527-5531, May 2014.
  7. N. Ito, S. Araki, T. Yoshioka, and T. Nakatani, "Relaxed disjointness based clustering for joint blind source separation and dereverberation," IWAENC2014, pp. 268-272, September 2014.
  8. N. Ito, S. Araki, and T. Nakatani, "Probabilistic integration of diffuse noise suppression and dereverberation," ICASSP2014, pp. 5167-5171, May 2014.
  9. M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, T. Nakatani, and A. Nakamura, “Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge,” in Proc. of the REVERB Challenge Workshop, May 2014. (Best performance on the recognition task of the REVERB Challenge)
  10. M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, N. Ito, K. Kinoshita, M. Espi, S. Araki, and T. Nakatani, “Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition,” Invited paper to IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 522-526, December 2014.
  11. K. Arai, "Synchronization of semiconductor lasers for secret key distribution," Forum Math-for-Industry 2014, October 2014.
  12. I. Kakesu, N. Suzuki, A. Uchida, K. Yoshimura, K. Arai, and P. Davis, "Frequency Dependence of Common-Signal-Induced Synchronization in Semiconductor Lasers with Constant-Amplitude and Random-Phase Light," The 2014 International Symposium on Nonlinear Theory and its Applications, pp. 466-469, September 2014.
  13. S. Sunada, K. Arai, K. Yoshimura, and M. Adachi, "Common Noise-Induced Optical Phase Synchronization in Lasers", The 2014 International Symposium on Nonlinear Theory and its Applications, pp. 470-473, September 2014.
  14. K. Arai, K. Yoshimura, S. Sunada, and A. Uchida, "Synchronization Induced by Common ASE Noise in Semiconductor Lasers", The 2014 International Symposium on Nonlinear Theory and its Applications, pp. 474-477, September 2014.
  15. A. Ogawa, K. Kinoshita, T. Hori, T. Nakatani, and A. Nakamura, "Fast segment search for corpus-based speech enhancement based on speech recognition technology," Proc. IEEE ICASSP, pp. 1576-1580, 2014.
  16. K. Aoyama, A. Ogawa, T. Hattori, T. Hori, and A. Nakamura, "Zero-resource spoken term detection using hierarchical graph-based similarity search," Proc. IEEE ICASSP, pp. 7143-7147, 2014.

Other Conference Papers

  1. M. Espi, M. Fujimoto, and T. Nakatani, "Detection and classification of acoustic events using multiple resolution spectrogram patch models," in Proceedings of ASJ Autumn Meeting, pp. 1529-1530, September 2014.

2013

Journal Papers

  1. M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura, "Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral & temporal modeling of sounds," Computer Speech and Language (CSL), vol. 27, issue 3, pp. 851-873, May 2013.
    [Sound Demo] Demonstration of the results obtained for the PASCAL 'CHiME' Challenge
  2. M. Delcroix, S. Watanabe, T. Nakatani, and A. Nakamura, "Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer," Computer Speech and Language, Elsevier, vol. 27, issue 1, pp. 350-368, January 2013.
  3. T. Nakatani, S. Araki, T. Yoshioka, M. Delcroix, and M. Fujimoto, "Dominance based integration of spatial and spectral features for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 12, pp. 2516-2531, December 2013.
  4. T. Yoshioka and T. Nakatani, "Noise model transfer: novel approach to robustness against nonstationary noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2182-2192, October 2013.
  5. M. Souden, K. Kinoshita, M. Delcroix, and T. Nakatani, "Location feature integration for clustering-based speech separation in distributed microphone arrays," IEEE Transactions on Audio, Speech, and Language Processing, 2013.
  6. M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, "A multichannel MMSE-based framework for speech source separation and noise reduction," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1913-1928, September 2013.
  7. M. Souden, K. Kinoshita, and T. Nakatani, "Towards online maximum likelihood speech clustering and separation," Journal of the Acoustical Society of America (JASA) Express Letters, vol. 133, no. 5, pp. EL339-EL345, 2013.
  8. H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Multichannel extensions of nonnegative matrix factorization with complex-valued data," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 971-982, May 2013.
  9. M. Suzuki, T. Yoshioka, S. Watanabe, N. Minematsu, and K. Hirose, "Feature enhancement with joint use of consecutive corrupted and noise feature vectors with discriminative region weighting," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2172-2181, October 2013.
  10. J. Muramatsu, "Channel coding and lossy source coding using a generator of constrained random numbers," to appear in IEEE Transactions on Information Theory, 2013.
  11. J. Muramatsu and S. Miyake, "Corrections to 'Hash property and coding theorems for sparse matrices and maximum-likelihood coding'," IEEE Transactions on Information Theory, vol. IT-59, no. 10, pp. 6952-6953, October 2013.
  12. J. B. Goette, S. Shinohara, and M. Hentschel, "Are Fresnel filtering and the angular Goos-Haenchen shift the same?," Journal of Optics, vol. 15, p. 014009, 2013.
  13. S. Sunada, T. Fukushima, S. Shinohara, T. Harayama, and M. Adachi, "Stable single-wavelength emission from stadium-shaped chaotic microcavity lasers," Physical Review A 88, p. 013802, 2013.
  14. T. Fukushima, S. Shinohara, S. Sunada, T. Harayama, K. Arai, K. Sakaguchi, and T. Tokuda, "Selective excitation of the lowest-order transverse ring modes in a quasi-stadium laser diode," Optics Letters 38, pp. 4158-4161, 2013.
  15. H. Koizumi, S. Morikatsu, H. Aida, T. Nozawa, I. Kakesu, A. Uchida, K. Yoshimura, J. Muramatsu, and P. Davis, "Information-theoretic secure key distribution based on common random-signal induced synchronization in unidirectionally-coupled cascades of semiconductor lasers," Optics Express, vol. 21, no. 15, pp. 17869-17893, July 2013.

Book Chapter, Tutorial Papers

  1. T. Hori, S. Araki, T. Nakatani, and A. Nakamura, "Advances in multi-speaker conversational speech recognition and understanding," NTT Technical Review, vol. 11, no. 12, December 2013.
  2. T. Hori and A. Nakamura, "Speech recognition algorithms using weighted finite-state transducers," Morgan & Claypool Publishers, January 2013.
  3. Y. Kubo, A. Ogawa, T. Hori, and A. Nakamura, "Speech recognition based on unified model of acoustic and language aspects of speech," NTT Technical Review, vol. 11, no. 12, December 2013.
  4. H. Masataki, T. Asami, S. Yamahata, and M. Fujimoto, "Speech recognition technology that can adapt to changes in service and environment," NTT Technical Review, vol. 11, no. 7, July 2013.
  5. A. Uchida, H. Koizumi, I. Kakesu, K. Yoshimura, J. Muramatsu, and P. Davis, "Synchronized semiconductor lasers for secure key distribution," SPIE Newsroom, 10.1117/2.1201311.005200, 2013.

Peer-reviewed Conference Papers

  1. A. Ogawa, T. Hori, and A. Nakamura, "Discriminative recognition rate estimation for n-best list and its application to n-best rescoring," ICASSP2013, pp. 6832-6836, 2013.
  2. A. Ogawa, T. Hori, A. Nakamura, and T. Oba, "Recognition rate estimation based on error type classification and its applications," Invited Talk at Workshop Errare 2013.
  3. M. Fujimoto and T. Nakatani, "Model-based noise suppression using unsupervised estimation of hidden Markov model for non-stationary noise," Interspeech2013, pp. 2982-2986, August 2013.
  4. M. Delcroix, A. Ogawa, S.-J. Hahm, T. Nakatani, and A. Nakamura, "Unsupervised discriminative adaptation using differenced maximum mutual information based linear regression," ICASSP2013, pp. 7888-7892, 2013.
  5. M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?" Interspeech2013, pp. 2992-2996, 2013.
  6. K. Aoyama, A. Ogawa, T. Hattori, T. Hori, and A. Nakamura, "Graph index based query-by-example search on a large speech data set," ICASSP2013, pp. 8520-8524, 2013.
  7. Y. Kubo, T. Hori, and A. Nakamura, "A method for structure estimation of weighted finite-state transducers and its application to grapheme-to-phoneme conversion," Interspeech2013.
  8. T. Oba, A. Ogawa, T. Hori, H. Masataki, and A. Nakamura, "Unsupervised discriminative language modeling using error rate estimator," Interspeech2013, pp. 1223-1227, 2013.
  9. Y. Kubo, T. Hori, and A. Nakamura, "Large vocabulary continuous speech recognition based on WFST structured classifiers and deep bottleneck features," ICASSP2013, pp. 7629-7633, 2013.
  10. S.-J. Hahm, A. Ogawa, M. Delcroix, M. Fujimoto, T. Hori, and A. Nakamura, "Feature space variational Bayesian linear regression and its combination with model space VBLR," ICASSP2013, pp. 7898-7902, 2013.
  11. T. Yoshioka and T. Nakatani, "Noise model transfer using affine transformation with application to large vocabulary reverberant speech recognition," ICASSP2013, pp. 7058-7062, May 2013.
  12. T. Yoshioka and T. Nakatani, "Dereverberation for reverberation-robust microphone arrays," Proc. 21st European Signal Processing Conference (EUSIPCO 2013), September 2013.
  13. T. Nakatani, M. Souden, S. Araki, T. Yoshioka, T. Hori, and A. Ogawa, "Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition," ICASSP2013, pp. 7249-7253, May 2013.
  14. T. Nakatani, M. Delcroix, and M. Fujimoto, "Speech enhancement in a car using spatial and spectral models for speaker and noise," in Proc. of The 6th Biennial Workshop on Digital Signal Processing for In-Vehicle Systems, September 2013.
  15. K. Kinoshita, M. Souden, and T. Nakatani, "Blind source separation using spatially distributed microphones based on microphone-location dependent source activities," Interspeech2013, pp. 822-826, August 2013.
  16. K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot, and B. Raj, "The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech," WASPAA, October 2013.
  17. K. Kinoshita and T. Nakatani, "Microphone-location dependent mask estimation for BSS using spatially distributed asynchronous microphones," 2013 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), pp. 326-331, November 2013.
  18. N. Ito, S. Araki, and T. Nakatani, "Permutation-free convolutive blind source separation via full-band clustering based on frequency-independent source presence priors," ICASSP2013, pp. 3238-3242, May 2013.
  19. M. Souden, K. Kinoshita, and T. Nakatani, "An integration of source location cues for speech clustering in distributed microphone arrays," ICASSP2013, pp. 111-115, May 2013.
  20. R. Maas, W. Kellermann, A. Sehr, T. Yoshioka, M. Delcroix, K. Kinoshita, and T. Nakatani, "Formulation of the REMOS concept from an uncertainty decoding perspective," in Proc. of the IEEE International Conference on Digital Signal Processing (DSP), pp. 1-6, July 2013.
  21. A. Sehr, T. Yoshioka, M. Delcroix, K. Kinoshita, T. Nakatani, R. Maas, and W. Kellermann, "Conditional emission densities for interconnecting speech enhancement and recognition systems," Interspeech2013, pp. 3502-3506, 2013.
  22. I. Jafari, N. Ito, M. Souden, S. Araki, and T. Nakatani, "Source number estimation based on clustering of speech activity sequences for microphone array processing," Proc. IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP), September 2013.
  23. Y. Uezu, K. Kinoshita, M. Souden, and T. Nakatani, "On the robustness of distributed EM based BSS in asynchronous distributed microphone array scenarios," Interspeech2013, pp. 3298-3302, August 2013.
  24. N. Ono, Z. Koldovsky, S. Miyabe, and N. Ito, "The 2013 signal separation evaluation campaign," Proc. IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP), September 2013.
  25. J. Muramatsu, "Equivalence between inner regions for broadcast channel coding," The Proceedings of the 2013 IEEE Information Theory Workshop, pp. 164-168, 2013.
  26. J. Muramatsu, "Channel code using a constrained random number generator," The Proceedings of the 2013 IEEE International Symposium on Information Theory, pp. 2463-2467, 2013.
  27. J. Muramatsu, "Lossy source code using a constrained random number generator," The Proceedings of the 2013 IEEE International Symposium on Information Theory, pp. 2354-2358, 2013.
  28. K. Yoshimura, J. Muramatsu, K. Arai, S. Shinohara, and A. Uchida, "Synchronization of semiconductor lasers by injection of common broadband random light," Proc. of the 2013 International Symposium on Nonlinear Theory and Its Applications, pp. 449-452, 2013.
  29. K. Yoshimura, "Existence and stability of discrete breathers in Fermi-Pasta-Ulam lattices," Proc. of the 2013 International Symposium on Nonlinear Theory and Its Applications, pp. 274-277, 2013.
  30. K. Arai, S. Shinohara, S. Sunada, K. Yoshimura, T. Harayama, and A. Uchida, "Noise effects on generalized chaos synchronization in semiconductor lasers," Proc. of the 2013 International Symposium on Nonlinear Theory and Its Applications, pp. 413-416, 2013.
  31. S. Shinohara, T. Fukushima, S. Sunada, T. Harayama, K. Arai, and K. Yoshimura, "Nonlinear modal dynamics in two-dimensional cavity microlasers," Proc. of the 2013 International Symposium on Nonlinear Theory and Its Applications, pp. 409-412, 2013.
  32. T. Fukushima, S. Shinohara, S. Sunada, T. Harayama, K. Sakaguchi, and Y. Tokuda, "Ray dynamical simulation of Penrose unilluminable room cavity," Frontiers in Optics 2013/Laser Science XXIX, October 2013.
  33. I. Kakesu, H. Koizumi, S. Morikatsu, H. Aida, T. Nozawa, A. Uchida, K. Yoshimura, J. Muramatsu, and P. Davis, "Secure key distribution using common-signal-induced synchronization in cascaded semiconductor lasers," Proc. of Frontiers in Optics, 2013.
  34. R. Takahashi, Y. Akizawa, A. Uchida, T. Harayama, K. Tsuzuki, S. Sunada, K. Yoshimura, K. Arai, and P. Davis, "Physical random number generation using photonic integrated circuit with mutually coupled semiconductor lasers," Frontiers in Optics 2013, October 8-12, 2013.

Other Conference Papers

  1. T. Yoshioka and M. J. F. Gales, "An investigation of single-microphone automatic meeting transcription," presented at the 2nd UKSpeech Conference, September 2013.

2012

Journal Papers

  1. T. Hori, S. Araki, T. Yoshioka, M. Fujimoto, S. Watanabe, T. Oba, A. Ogawa, K. Otsuka, D. Mikami, K. Kinoshita, T. Nakatani, A. Nakamura, and J. Yamato, "Low-latency real-time meeting recognition and understanding using distant microphones and omni-directional camera," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 499-513, February 2012.
  2. A. Ogawa and A. Nakamura, "Joint estimation of confidence and error causes in speech recognition," Speech Communication, vol. 54, no. 9, pp. 1014-1028, November 2012.
  3. M. Fujimoto, S. Watanabe, and T. Nakatani, "Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection," Speech Communication, vol. 54, no. 2, pp. 229-244, February 2012.
  4. Y. Kubo, S. Watanabe, T. Hori, and A. Nakamura, "Structural classification methods based on weighted finite-state transducers for automatic speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, issue 8, pp. 2240-2251, October 2012.
  5. T. Oba, T. Hori, A. Nakamura, and A. Ito, "Round-robin duel discriminative language models," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, Issue 4, pp. 1244-1255, May 2012.
  6. T. Oba, T. Hori, A. Nakamura, and A. Ito, "Model shrinkage for discriminative language models," IEICE Transactions on Information and Systems, vol. E95-D, No. 5, pp. 1465-1474, May 2012.
  7. T. Yoshioka and T. Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 10, pp. 2707-2720, December 2012.
  8. M. Souden, M. Delcroix, K. Kinoshita, T. Yoshioka, and T. Nakatani, "Noise power spectral density tracking: A maximum likelihood perspective," IEEE Signal Processing Letters, vol. 19, no. 8, pp. 495-498, August 2012.
  9. K. Ishiguro, T. Yamada, S. Araki, T. Nakatani, and H. Sawada, "Probabilistic speaker diarization with bag-of-words representations of speaker angle information," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, pp. 447-460, 2012.
  10. E. Vincent, S. Araki, F. Theis, G. Nolte, P. Bofill, H. Sawada, A. Ozerov, V. Gowreesunker, D. Lutter, and N. Q. K. Duong, "The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges," Signal Processing vol. 92, issue 8, pp. 1928-1936, August 2012.
  11. J. Muramatsu and S. Miyake, "Construction of codes for wiretap channel and secret key agreement from correlated source outputs based on hash property," IEEE Transactions on Information Theory, vol. IT-58, no. 2, pp. 671-692, February 2012.
  12. J. Muramatsu and S. Miyake, "Corrections to 'Hash property and fixed-rate universal coding theorems'," IEEE Transactions on Information Theory, vol. IT-58, no. 5, pp. 3305-3307, May 2012.
  13. K. Yoshimura, J. Muramatsu, P. Davis, T. Harayama, H. Okumura, S. Morikatsu, H. Aida, and A. Uchida, "Secure key distribution using correlated randomness in lasers driven by common random light," Physical Review Letters, vol. 108, 070602, February 2012.
  14. K. Yoshimura, "Stability of discrete breathers in diatomic nonlinear oscillator chains," Nonlinear theory and its applications, IEICE 3, pp. 52-66, 2012.
  15. K. Yoshimura, "Stability of discrete breathers in nonlinear Klein-Gordon type lattices with pure anharmonic couplings," Journal of Mathematical Physics 53, 102701, 2012.
  16. K. Arai, S. Sunada, T. Harayama, and P. Davis, "The randomness in Galton board from viewpoint of predictability: sensitivity and statistical bias of output states," Physical Review E 86, 056216, 2012.
  17. H. Aida, M. Arahata, H. Okumura, H. Koizumi, A. Uchida, K. Yoshimura, J. Muramatsu, and P. Davis, "Experiment on synchronization of semiconductor lasers by common injection of constant-amplitude random-phase light," Optics Express, vol. 20, no. 11, pp. 11813-11829, May 2012.
  18. T. Harayama, S. Sunada, K. Yoshimura, J. Muramatsu, K. Arai, A. Uchida, and P. Davis, "Theory of fast non-deterministic physical random bit generation with chaotic lasers," Physical Review E, vol. 85, 046215, April 2012.
  19. S. Sunada, T. Harayama, P. Davis, K. Tsuzuki, K. Arai, K. Yoshimura, and A. Uchida, "Noise amplification by chaotic dynamics in a delayed feedback laser system and its application to nondeterministic random bit generation," Chaos 22, 047513, 2012.
  20. Y. Akizawa, T. Yamazaki, A. Uchida, T. Harayama, S. Sunada, K. Arai, K. Yoshimura, and P. Davis, "Fast random number generation with bandwidth-enhanced chaotic semiconductor lasers at 8×40 Gb/s," IEEE Photonics Technology Letters 24, pp. 1042-1044, 2012.
  21. T. Mikami, K. Kanno, K. Aoyama, A. Uchida, T. Ikeguchi, T. Harayama, S. Sunada, K. Arai, K. Yoshimura, and P. Davis, "Estimation of entropy rate in a fast physical random-bit generator using a chaotic semiconductor laser with intrinsic noise," Physical Review E 85, 016211, January 2012.
  22. T. Hirayama, S. Arakawa, K. Arai, and M. Murata, "Dynamics of feedback-induced packet delay in ISP router-level topologies," IEICE Transactions on Communications, vol. E95-B, no. 9, pp. 2785-2793, 2012.
  23. J.-W. Ryu, J. Cho, C.-M. Kim, S. Shinohara, and S. W. Kim, "Terahertz beat frequency generation from two-mode lasing operation of coupled microdisk laser," Optics Letters 37, pp. 3210-3213, 2012.

Book Chapter, Tutorial Papers

  1. T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 114-126, November 2012.
  2. K. Yoshimura, S. Shinohara, and K. Arai, "Fast Physical Random Number Generation Using Semiconductor Laser Chaos," NTT Technical Review, vol. 10, no. 11, 2012.

Peer-reviewed Conference Papers

  1. T. Hori, K. Kinoshita, S. Araki, A. Ogawa, T. Yoshioka, M. Fujimoto, T. Oba, M. Delcroix, M. Souden, Y. Kubo, S.-J. Hahm, D. Mikami, K. Otsuka, T. Nakatani, A. Nakamura, and J. Yamato, "Real-time audio-visual meeting recognition and understanding using distant microphone array," ICASSP2012, Show & Tell session.
  2. A. Ogawa, T. Hori, and A. Nakamura, "Recognition rate estimation based on word alignment network and discriminative error type classification," SLT, 2012.
  3. A. Ogawa, T. Hori, and A. Nakamura, "Error type classification and word accuracy estimation using alignment information in word confusion network," ICASSP2012, pp. 4925-4928, March 2012.
  4. M. Fujimoto and T. Nakatani, "A reliable data selection for model-based noise suppression using unsupervised joint speaker adaptation and noise model estimation," ICSPCC 2012, pp. 4713-4716, August 2012. (Invited talk)
  5. M. Fujimoto, S. Watanabe, and T. Nakatani, "Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation," ICASSP2012, pp. 4713-4716, March 2012.
  6. M. Espi, M. Fujimoto, D. Saito, N. Ono, and S. Sagayama, "A tandem connectionist model using combination of multi-scale spectro-temporal features for acoustic event detection," ICASSP2012, pp. 4293-4296, March 2012.
  7. M. Delcroix, A. Ogawa, T. Nakatani, and A. Nakamura, "Dynamic variance adaptation using differenced maximum mutual information," MLSLP, 2012.
  8. M. Delcroix, A. Ogawa, S. Watanabe, T. Nakatani, and A. Nakamura, "Discriminative feature transforms using differenced maximum mutual information," ICASSP2012, pp. 4753-4756, March 2012.
  9. Y. Kubo, T. Hori, and A. Nakamura, "Integrating deep neural networks into structured classification approach based on weighted finite-state transducers," Interspeech2012, September 2012.
  10. T. Oba, T. Hori, A. Nakamura, and A. Ito, "Spoken document retrieval by discriminative modeling in a high dimensional feature space," ICASSP2012, pp. 5153-5156, March 2012.
  11. S.-J. Hahm, A. Ogawa, M. Fujimoto, T. Hori, and A. Nakamura, "Speaker adaptation using variational Bayesian linear regression in normalized feature space," Interspeech2012, September 2012.
  12. S.-J. Hahm, S. Watanabe, M. Fujimoto, T. Hori, and A. Nakamura, "Normalization and adaptation by consistently employing MAP estimation," IWSML 2012.
  13. S. Watanabe, Y. Kubo, T. Oba, T. Hori, and A. Nakamura, "Bag of arcs: new representation of speech segment features based on finite state machines," ICASSP2012, pp. 4201-4204, March 2012.
  14. S. Kobashikawa, T. Hori, Y. Yamaguchi, T. Asami, H. Masataki, and S. Takahashi, "Efficient beam width control to suppress excessive speech recognition computation time based on prior score range normalization," Interspeech2012, September 2012.
  15. S. Kobashikawa, T. Hori, Y. Yamaguchi, T. Asami, H. Masataki, and S. Takahashi, "Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation," SLT, 2012.
  16. S. Yamahata, Y. Yamaguchi, A. Ogawa, H. Masataki, O. Yoshioka, and S. Takahashi, "Automatic vocabulary adaptation based on semantic similarity and speech recognition confidence measure," Interspeech2012, September 2012.
  17. E. Chuangsuwanich, S. Watanabe, T. Hori, T. Iwata, and J. Glass, "Handling uncertain observations in unsupervised topic-mixture language model adaptation," ICASSP2012, pp. 5033-5036, March 2012.
  18. M. Suzuki, T. Yoshioka, S. Watanabe, N. Minematsu, and K. Hirose, "MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments," ICASSP2012, pp. 4109-4112, March 2012.
  19. R. Roller, S. Watanabe, and T. Iwata, "Effect of dialog acts on word use in polylogue," ICASSP2012, pp. 4969-4972, March 2012.
  20. T. Nakatani, T. Yoshioka, S. Araki, M. Delcroix, and M. Fujimoto, "Logmax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise," ICASSP2012, pp. 4029-4032, March 2012.
  21. S. Araki and T. Nakatani, "Sparse vector factorization for underdetermined BSS using wrapped-phase GMM and source log-spectral prior," ICASSP2012, pp. 265-268, March 2012.
  22. S. Araki, F. Nesta, E. Vincent, Z. Koldovsky, G. Nolte, A. Ziehe, and A. Benichoux, "SiSEC2011 overview: Audio source separation," in Proc. LVA/ICA2012, pp. 414-422, March 2012.
  23. K. Kinoshita, M. Delcroix, M. Souden, and T. Nakatani, "Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise," Interspeech2012.
  24. T. Yoshioka and T. Nakatani, "Time-varying residual noise feature model estimation for multi-microphone speech recognition," ICASSP2012, pp. 4913-4916, March 2012.
  25. T. Yoshioka and D. Sakaue, "Log-normal matrix factorization with application to speech-music separation," SAPA-SCALE 2012, pp. 80-85, September 2012.
  26. T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Survey on approaches to speech recognition in reverberant environments," APSIPA, 2012. (Invited paper)
  27. M. Souden, K. Kinoshita, M. Delcroix, and T. Nakatani, "Distributed microphone array processing for speech source separation with classifier fusion," MLSP, September 2012.
  28. M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, "A multichannel MMSE-based framework for joint blind source separation and noise reduction," ICASSP2012, pp. 109-112, March 2012.
  29. T. Maruyama, S. Araki, T. Nakatani, S. Miyabe, T. Yamada, S. Makino, and A. Nakamura, "New analytical update rule for TDOA inference for underdetermined BSS in noisy environments," ICASSP2012, pp. 269-272, March 2012.
  30. Y. Iwata and T. Nakatani, "Introduction of speech log-spectral priors into dereverberation based on Itakura-Saito distance minimization," ICASSP2012, pp. 245-248, March 2012.
  31. H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Efficient algorithms for multichannel extensions of Itakura-Saito nonnegative matrix factorization," ICASSP2012, pp. 261-264, March 2012.
  32. G. Nolte, D. Lutter, A. Ziehe, F. Nesta, E. Vincent, Z. Koldovsky, A. Benichoux, and S. Araki, "SiSEC2011 overview: biomedical data analysis," in Proc. LVA/ICA2012, pp. 423-429, March 2012.
  33. T. Maruyama, S. Araki, T. Nakatani, S. Miyabe, T. Yamada, S. Makino, and A. Nakamura, "New analytical calculation and estimation for TDOA inference for underdetermined BSS in noisy environments," APSIPA, 2012.
  34. J. Muramatsu, "Information theoretic security based on bounded observability," DIMACS Workshop on Information-Theoretic Network Security, November 2012.
  35. J. Muramatsu and S. Miyake, "Uniform random number generation by using sparse matrix," Proceedings of the 2012 IEEE Information Theory Workshop, pp. 612-616, 2012.
  36. K. Yoshimura, J. Muramatsu, P. Davis, A. Uchida, and T. Harayama, "Secure key distribution using correlated randomness in optical devices," Proceedings of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 336-339, 2012.
  37. K. Yoshimura, "Existence and stability of localized modes in one-dimensional nonlinear lattices," The 19th International Symposium on Nonlinear Acoustics, AIP Conference Proceedings 1474, pp. 59-62, 2012.
  38. K. Yoshimura, "Stability of discrete breathers in nonlinear Klein-Gordon type lattices," Proc. of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 403-406, 2012.
  39. K. Arai, T. Harayama, P. Davis, J. Muramatsu, and S. Sunada, "Multi-bit sampling from chaotic time series in random number generation," Proceedings of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 268-271, 2012.
  40. S. Sunada, T. Harayama, P. Davis, K. Arai, K. Yoshimura, K. Tsuzuki, M. Adachi, and A. Uchida, "Noise amplification based on dynamical instabilities in semiconductor laser systems and its application to nondeterministic random bit generators," Proc. of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 263-267, 2012.
  41. S. Miyake and J. Muramatsu, "Universal codes on continuous alphabet using sparse matrices," Proceedings of the 2012 International Symposium on Information Theory and its Applications, pp. 493-497, 2012.
  42. S. Miyake and J. Muramatsu, "On a construction of universal network code using LDPC matrices," The Proceedings of the 2012 IEEE International Symposium on Information Theory, pp. 1306-1310, 2012.
  43. H. Koizumi, S. Morikatsu, H. Aida, M. Arahata, T. Nozawa, A. Uchida, K. Yoshimura, J. Muramatsu, and P. Davis, "Experiment on secure key distribution using correlated random phenomenon in semiconductor lasers," Proceedings of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 340-343, 2012.
  44. T. Yamazaki, Y. Akizawa, A. Uchida, K. Yoshimura, K. Arai, and P. Davis, "Fast random number generation with bandwidth-enhanced chaos and post-processing," Proc. of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 142-145, 2012.
  45. R. Takahashi, Y. Akizawa, T. Yamazaki, A. Uchida, T. Harayama, K. Tsuzuki, S. Sunada, K. Yoshimura, K. Arai, and P. Davis, "Random number generation with a photonic integrated circuit for fast chaos generation," Proc. of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 138-141, 2012.
  46. Y. Akizawa, R. Takahashi, H. Aida, T. Yamazaki, A. Uchida, T. Harayama, K. Tsuzuki, S. Sunada, K. Yoshimura, K. Arai, and P. Davis, "Nonlinear dynamics in a photonic integrated circuit for fast chaos generation," Proc. of the 2012 International Symposium on Nonlinear Theory and Its Applications, pp. 134-137, 2012.
  47. T. Hirayama, S. Arakawa, K. Arai, and M. Murata, "On the power-law characteristic of link capacity distribution in ISP router-level topologies," 21st International Conference on Computer Communications and Networks (ICCCN12), July 30 - August 2, 2012.

2011

Journal Papers

  1. T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, “New method for blind separation and dereverberation of highly reverberant mixtures,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 69-84, January 2011.
  2. A. Ogawa, S. Takahashi, and A. Nakamura, “Efficient combination of likelihood recycling and batch calculation for fast acoustic likelihood calculation,” IEICE Transactions on Information and Systems, vol. E94-D, no. 3, March 2011.
  3. S. Araki, T. Nakatani, and H. Sawada, “Sparse source separation based on simultaneous clustering of source locational and spectral features,” Acoustical Science and Technology, Acoustic Letter, (in press), 2011.
  4. T. Harayama, S. Sunada, K. Yoshimura, K. Tsuzuki, P. Davis, and A. Uchida, “Fast non-deterministic random bit generation with on-chip chaos lasers,” Physical Review A, vol. 83, 031803(R), 2011.
  5. S. Sunada, T. Harayama, K. Arai, K. Yoshimura, P. Davis, K. Tsuzuki, and A. Uchida, “Chaos laser chips with delayed optical feedback using a passive ring waveguide,” Optics Express, vol. 19, pp. 5713-5724, 2011.
  6. S. Sunada, T. Harayama, K. Arai, K. Yoshimura, K. Tsuzuki, A. Uchida, and P. Davis, “Random optical pulse generation with bistable semiconductor ring lasers,” Optics Express, vol. 19, pp. 7439-7450, 2011.
  7. K. Yoshimura, “Existence and stability of discrete breathers in diatomic Fermi-Pasta-Ulam type lattices,” Nonlinearity, vol. 24, pp. 293-317, 2011.
  8. S. Watanabe, T. Iwata, T. Hori, A. Sako, and Y. Ariki, “Topic Tracking Language Model for Speech,” Computer Speech and Language, vol. 25, issue 2, pp. 440-461, 2011.

Book Chapter, Tutorial Papers

  1. M. Fujimoto, “Chapter 1: Integration of statistical model-based voice activity detection and noise suppression for noise robust speech recognition,” in “Advances in Robust Speech Recognition Technology,” Bentham Publishing Services, March 2011.

Peer-reviewed Conference Papers

  1. M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura, “Speech Recognition in the Presence of Highly Non-Stationary Noise Based on Spatial, Spectral and Temporal Speech/Noise Modeling Combined with Dynamic Variance Adaptation,” in Proc. CHiME Workshop, pp. 12-17, 2011.
  2. S. Sunada, T. Harayama, K. Arai, K. Yoshimura, K. Tsuzuki, A. Uchida, and P. Davis, “Theory and experiment of fast non-deterministic random bit generation with on-chip chaos lasers,” Dynamics Days 2011, pp. 31-32, January 2011.
  3. S. Araki, T. Hori, T. Yoshioka, M. Fujimoto, S. Watanabe, T. Oba, A. Ogawa, K. Otsuka, D. Mikami, M. Delcroix, K. Kinoshita, T. Nakatani, A. Nakamura, and J. Yamato, “Low-latency meeting recognition and understanding using distant microphones,” in Proceedings of the 3rd Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2011), May 2011, presented in the Demo Session.
  4. M. Fujimoto, S. Watanabe, and T. Nakatani, “Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition,” Proc. of ICASSP '11, May 2011. (accepted)
  5. A. Ogawa, S. Takahashi, and A. Nakamura, “Machine and acoustical condition dependency analyses for fast acoustic likelihood calculation techniques,” Proc. ICASSP, May 2011, to appear.
  6. T. Yoshioka, and T. Nakatani, “A microphone array system integrating beamforming, feature enhancement, and spectral mask-based noise estimation,” to appear in Proceedings of the Third Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2011), May 2011.
  7. T. Yoshioka, and T. Nakatani, “Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation,” to appear in Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), May 2011.
  8. N. Yasuraoka, H. Kameoka, T.Yoshioka, and H. G. Okuno, “I-divergence-based dereverberation method with auxiliary function approach,” to appear in Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), May 2011.
  9. T. Nakatani, S. Araki, T. Yoshioka, and M. Fujimoto, “Joint unsupervised learning of hidden Markov source models and source location models for multichannel source separation,” to appear in Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), May 2011.
  10. Y. Kubo, S. Wiesler, R. Schlueter, H. Ney, S. Watanabe, A. Nakamura, and T. Kobayashi, “Subspace Pursuit Method for Kernel-Log-Linear Models,” Proc. ICASSP 2011, Prague, Czech Republic, May 2011.
  11. M. Delcroix, S. Watanabe, T. Nakatani, and A. Nakamura, “Discriminative approach to dynamic variance adaptation for noisy speech recognition,” Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 (to appear).
  12. S. Araki and T. Nakatani, “Hybrid Approach for Multichannel Source Separation Combining Time-frequency Mask with Multi-channel Wiener Filter,” ICASSP2011 (accepted).
  13. H. Sawada, H. Kameoka, S. Araki, and N. Ueda, “Formulations and algorithms for multichannel complex NMF,” ICASSP 2011 (accepted).
  14. K. Iso, S. Araki, S. Makino, T. Nakatani, H. Sawada, T. Yamada, and A. Nakamura, “Blind source separation of mixed speech in a high reverberation environment,” HSCMA2011 (accepted).
  15. T. Oba, T. Hori, A. Ito, and A. Nakamura, “Round-Robin Duel Discriminative Language Models in One-pass Decoding with On-the-fly Error Correction,” Proceedings of ICASSP, 2011.
  16. S. Watanabe, D. Mochihashi, T. Hori, and A. Nakamura, “Gibbs Sampling Based Multi-Scale Mixture Model for Speaker Clustering,” Proc. ICASSP'11.
  17. D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, “High Accurate Model-Integration-Based Voice Conversion Using Dynamic Features and Model Structure Optimization,” Proc. ICASSP'11.
  18. T. Maekawa and S. Watanabe, “Modeling Activities with User's Physical Characteristics Data,” Proc. ISWC'11.

2010

Journal Papers

  1. T. Oba, T. Hori, and A. Nakamura, “Improved Sequential Dependency Analysis Integrating Labeling-based Sentence Boundary Detection,” IEICE Transactions on Information and Systems, vol. E93-D, no. 5, May 2010.
  2. J. Muramatsu and S. Miyake, “Hash property and coding theorems for sparse matrices and maximal-likelihood coding,” IEEE Transactions on Information Theory, vol. IT-56, no. 5, pp. 2143-2167, May 2010.
  3. J. Muramatsu and S. Miyake, “Hash property and fixed-rate universal coding theorems,” IEEE Transactions on Information Theory, vol. IT-56, no. 6, pp. 2688-2698, June 2010.
  4. K. Ishizuka, S. Araki, and T. Kawahara, “Speech activity detection for multi-party conversation analyses based on likelihood ratio test on spatial magnitude,” IEEE Transactions on Audio, Speech, and Language Processing (in press).
  5. K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki, “Noise robust voice activity detection based on periodic to aperiodic component ratio,” Speech Communication, Vol.52, No.1, pp. 41-60, 2010.
  6. S. Araki, H. Sawada, and S. Makino, “Blind Speech Separation in a Meeting Situation with Maximum SNR Beamformers,” IEEE Trans. Audio, Speech, and Language Processing (submitted).
  7. S. Watanabe and A. Nakamura, “Predictor-Corrector Adaptation based on a Macroscopic Time Evolution System,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, issue 2, pp. 395-406, 2010.

Book Chapter, Tutorial Papers

  1. T. Yoshioka, T. Nakatani, K. Kinoshita, and M. Miyoshi, “Speech dereverberation and denoising based on time varying speech model and autoregressive reverberation model,” to appear in Speech Processing in Modern Communication: Challenges and Perspectives, Israel Cohen, Jacob Benesty, and Sharon Gannot (eds.), Springer, pp. 151-182, February 2010.
  2. M. Fujimoto, K. Takeda, and S. Nakamura, “Chapter 4.4.2: An evaluation database for in-car speech recognition and its common evaluation framework,” in “Resources and Standards of Spoken Language Systems - Advances in Oriental Spoken Language Processing,” World Scientific Publishing Co., March 2010.
  3. M. Miyoshi, M. Delcroix, K. Kinoshita, T. Yoshioka, T. Nakatani, and T. Hikichi, “Inverse-filtering for speech dereverberation without the use of room acoustics information,” to appear in Speech Dereverberation, Patrik A. Naylor and Nikolay Gaubitch (eds.), Springer.

Peer-reviewed Conference Papers

  1. T. Yoshioka, T. Nakatani, and H. G. Okuno, “Noisy speech enhancement based on prior knowledge about spectral envelope and harmonic structure,” in Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp. 4270-4273, March 2010.
  2. N. Yasuraoka, T. Yoshioka, T. Nakatani, A. Nakamura, and H. G. Okuno, “Music dereverberation using harmonic structure source model and Wiener filtering,” in Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp. 53-56, March 2010.
  3. T. Hori, S. Watanabe, and A. Nakamura, “Search Error Risk Minimization in Viterbi Beam Search for Speech Recognition,” in Proc. ICASSP2010, pp. 4934-4937, March 2010.
  4. T. Oba, T. Hori and A. Nakamura, “A Comparative Study on Methods of Weighted Language Model Training for Reranking LVCSR N-best Hypotheses,” in Proc. ICASSP2010, pp. 5126-5129, March 2010.
  5. S. Watanabe, T. Hori, E. McDermott, and A. Nakamura, “A Discriminative Model for Continuous Speech Recognition Based on Weighted Finite State Transducers,” in Proc. ICASSP2010, pp. 4922-4925, March 2010.
  6. A. Ogawa and A. Nakamura, “Discriminative confidence and error cause estimation for extended speech recognition function,” Proc. ICASSP, pp. 4454-4457, March 2010.
  7. A. Ogawa and A. Nakamura, “A novel confidence measure based on marginalization of jointly estimated error cause probabilities,” Proc. Interspeech, September 2010.
  8. J. Muramatsu, K. Yoshimura, and P. Davis, “Information theoretic security based on bounded observability,” Proceedings of the 4th International Conference on Information Theoretic Security, Lecture Notes in Computer Science (LNCS), vol. 5973, pp. 128-139, Springer (in press).
  9. J. Muramatsu and S. Miyake, “Construction of broadcast channel code based on hash property,” in Proceedings of the 2010 IEEE International Symposium on Information Theory, pp. 575-579, 2010.
  10. D. Cournapeau, S. Watanabe, A. Nakamura, and T. Kawahara, “Using Online Model Comparison In The Variational Bayes Framework For Online Unsupervised Voice Activity Detection,” ICASSP 2010, pp. 4462-4465, 2010.
  11. E. McDermott, S. Watanabe, and A. Nakamura, “Discriminative Training Based On An Integrated View Of MPE And MMI In Margin And Error Space,” ICASSP 2010, pp. 4894-4897, 2010.
  12. H. Watanabe, S. Katagiri, K. Yamada, E. McDermott, A. Nakamura, S. Watanabe, and M. Ohsaki, “Minimum Error Classification With Geometric Margin Control,” ICASSP 2010, pp. 2170-2173, 2010.
  13. K. Aoyama, S. Watanabe, H. Sawada, Y. Minami, N. Ueda, and K. Saito, “Fast Similarity Search On A Large Speech Data Set With Neighborhood Graph Indexing,” ICASSP 2010, pp. 5358-5361, 2010.
  14. S. Araki, T. Nakatani, and H. Sawada, “Simultaneous clustering of mixing and spectral model parameters for blind sparse source separation,” ICASSP2010, 2010.
  15. T. Nakatani and S. Araki, “Single channel source separation based on sparse source observation model with harmonic constraint,” ICASSP2010, 2010.
  16. Y. Ansai, S. Araki, S. Makino, T. Nakatani, T. Yamada, A. Nakamura, and N. Kitawaki, “Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation,” ISCAS2010 (to appear).

2009

Journal Papers

  1. T. Yoshioka, T. Nakatani, and M. Miyoshi, “Integrated speech enhancement method using noise suppression and dereverberation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 17, no. 2, pp. 231-246, February 2009.
  2. S. Miyake and J. Muramatsu, “A Construction of Channel Code, Joint Source-Channel Code, and Universal Code for Arbitrary Stationary Memoryless Channels using Sparse Matrices,” IEICE Transactions on Fundamentals, vol. E92-A, no. 9, pp. 2333-2344, September 2009.
  3. H. K. Solvang, Y. Nagahara, S. Araki, H. Sawada, and S. Makino, “Frequency-Domain Pearson Distribution Approach for Independent Component Analysis (FD-Pearson-ICA) in Blind Source Separation,” IEEE Trans. Audio, Speech & Language Processing, vol. 17, no. 4, pp. 639-649, 2009.
  4. K. Kinoshita, M. Delcroix, T. Nakatani, and M. Miyoshi, “Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction,” IEEE Transactions on Audio, Speech and Language Processing, 2009.
  5. M. Delcroix, T. Nakatani, and S. Watanabe, “Static and dynamic variance compensation for recognition of reverberant speech with dereverberation pre-processing,” IEEE transactions on Audio, Speech, and Language Processing, vol. 17, issue 2, pp. 324-334, 2009.
  6. S. Araki, H. Sawada, R. Mukai and S. Makino, “DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors,” Journal of Signal Processing Systems, doi:10.1007/s11265-009-0413-9, 2009.

Book Chapter, Tutorial Papers

  1. T. Hori, K. Sudoh, H. Tsukada, and A. Nakamura, “World-Wide Media Browser--Multilingual Audio-visual Content Retrieval and Browsing System,” NTT Technical Review, Vol. 7, No. 2, February 2009.
  2. S. Makino, S. Araki, S. Winter, and H. Sawada, “Underdetermined Blind Source Separation using Acoustic Arrays,” Handbook on Array Processing and Sensor Networks, S. Haykin and K. J. R. Liu, Eds., Wiley, 2009 (in press).

Peer-reviewed Conference Papers

  1. T. Yoshioka, H. Tachibana, T. Nakatani, and M. Miyoshi, “Adaptive dereverberation of speech signals with speaker-position change detection,” in Proceedings of the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), pp. 3733-3736, April 2009.
  2. H. Kameoka, T. Nakatani, and T. Yoshioka, “Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms,” in Proceedings of the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), pp. 45-48, April 2009.
  3. T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, “Real-time speech enhancement in noisy reverberant multi-talker environments based on a location-independent room acoustics model,” in Proceedings of the 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), pp. 137-140, April 2009.
  4. A. Ogawa, S. Takahashi, and A. Nakamura, “Efficient combination of likelihood recycling and batch calculation based on conditional fast processing and acoustic back-off,” Proc. ICASSP, pp. 4164-4164, April 2009.
  5. T. Yoshioka, T. Nakatani, and M. Miyoshi, “Fast algorithm for conditional separation and dereverberation,” in Proceedings of the 17th European Signal Processing Conference (EUSIPCO 2009), CD-ROM Proceedings, August 2009.
  6. A. Ogawa and A. Nakamura, “Simultaneous estimation of confidence and error cause in speech recognition using discriminative model,” Proc. Interspeech, pp. 1199-1202, September 2009.
  7. S. Kobashikawa, A. Ogawa, Y. Yamaguchi, and S. Takahashi, “Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models,” Proc. Interspeech, pp.1615-1618, September 2009.
  8. T. Yoshioka, H. Kameoka, T. Nakatani, and H. G. Okuno, “Statistical models for speech dereverberation,” in Proceedings of the 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009), pp. 145-148, October 2009.
  9. A. Nakamura, E. McDermott, S. Watanabe, S. Katagiri, “A unified view for discriminative objective functions based on negative exponential of difference measure between strings,” Proc. ICASSP 2009, pp. 1633-1636, 2009.
  10. E. McDermott, S. Watanabe, and A. Nakamura, “Margin-Space Integration of MPE Loss via Differencing of MMI Functionals for Generalized Error-Weighted Discriminative Training,” Proc. Interspeech 2009 Eurospeech, pp. 224-227, 2009.
  11. E. Vincent (IRISA-INRIA), S. Araki, and P. Bofill (Technical University of Catalonia), “The 2008 Signal Separation Evaluation Campaign: A Community-Based Approach to Large-Scale Evaluation,” ICA2009, pp. 734-741, 2009.
  12. J. Muramatsu and S. Miyake, “Coding theorem for general stationary memoryless channel based on hash property,” Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, Korea, pp. 541-545, 2009.
  13. J. Muramatsu, and S. Miyake, “Construction of wiretap channel codes by using sparse matrices,” Proceedings of the 2009 IEEE Information Theory Workshop, Taormina, Italy, pp.105-109, 2009.
  14. K. Ishiguro, T. Yamada, S. Araki, and T. Nakatani, “A probabilistic speaker clustering for DOA-based diarization,” WASPAA2009, 2009.
  15. K. Ishizuka, S. Araki, K. Otsuka, T. Nakatani, and M. Fujimoto, “A speaker diarization method based on the probabilistic fusion of audio-visual location information,” Proceedings of the 11th International Conference on Multimodal Interfaces and Workshop on Machine Learning for Multi-modal Interaction (ICMI-MLMI2009), pp. 55-62, 2009.
  16. K. Otsuka, S. Araki, D. Mikami, K. Ishizuka, M. Fujimoto, and J. Yamato, “Realtime meeting analysis and 3D meeting viewer based on omnidirectional multimodal sensors,” Proceedings of the 11th International Conference on Multimodal Interfaces and Workshop on Machine Learning for Multi-modal Interaction (ICMI-MLMI2009), pp. 219-220, 2009.
  17. M. Fujimoto, K. Ishizuka, and T. Nakatani, “A study of mutual front-end processing method based on statistical model for noise robust speech recognition,” Proceedings of the 10th Interspeech (Interspeech2009), pp. 1235-1238, September 2009.
  18. R. Mugitani, K. Ishizuka, T. Kondo, and S. Amano, “Acquisition of durational control of vocalic and consonantal intervals in speech production,” The 34th Boston University Conference on Language Development (BUCLD34), 2009.
  19. S. Araki, T. Nakatani, H. Sawada, and S. Makino, “Blind sparse source separation for unknown number of sources using Gaussian mixture model fitting with Dirichlet prior,” ICASSP2009, pp. 33-36, 2009.
  20. S. Araki, T. Nakatani, H. Sawada, and S. Makino, “Stereo source separation and source counting with MAP estimation with Dirichlet prior considering spatial aliasing problem,” ICA2009, pp. 742-750, 2009.
  21. S. Watanabe and A. Nakamura, “Speech recognition with incremental tracking and detection of changing environments based on a macroscopic time evolution system,” Proc. ICASSP 2009, pp. 4373-4376, 2009.
  22. T. Iwata, S. Watanabe, T. Yamada, and N. Ueda, “Topic tracking model for analyzing consumer purchase behavior,” IJCAI 2009, pp. 1427-1432, 2009.
  23. Y. Izumi, K. Nishiki, S. Watanabe, T. Nishimoto, N. Ono, and S. Sagayama, “Stereo-input Speech Recognition using Sparseness-based Time-frequency Masking in a Reverberant Environment,” Proc. Interspeech 2009 Eurospeech, pp. 1955-1958, 2009.
  24. S. Kobashikawa, A. Ogawa, Y. Yamaguchi, and S. Takahashi, “Rapid unsupervised adaptation using context independent phoneme model,” The 13th IEEE International Symposium on Consumer Electronics (ISCE'09), 2009.

Other Conference Papers

  1. K. Kinoshita, T. Nakatani, M. Miyoshi, and T. Kubota, “Blind upmix of stereo music signal using multi-step linear prediction based reverberation extraction,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 49-52, 2009.

2008

Journal Papers

  1. J. Muramatsu, “Effect of random permutation of symbols in a sequence,” IEEE Transactions on Information Theory, vol. IT-54, no. 1, pp. 78-86, January 2008.
  2. J. Muramatsu, K. Yoshimura, K. Arai, and P. Davis, “Some results on secret key agreement using correlated sources,” NTT Technical Review, vol. 6, no. 2, February 2008.
  3. M. Fujimoto and K. Ishizuka, “Noise Robust Voice Activity Detection Based on Switching Kalman Filter,” IEICE Transactions on Information and Systems, Vol. E91-D, No. 3, pp. 467-477, March 2008.
  4. S. Miyake and J. Muramatsu, “A construction of lossy source code using LDPC matrices,” IEICE Transactions on Fundamentals, vol. E91-A, no. 6, pp. 1488-1501, June 2008.
  5. T. Oba, T. Hori, and A. Nakamura, “Sequential Dependency Analysis for Online Spontaneous Speech Processing,” Speech Communication, Volume 50, Issue 7, pp. 616-625, July 2008.
  6. T. Nakatani, B.-H. Juang, T. Yoshioka, K. Kinoshita, M. Delcroix, and M. Miyoshi, “Speech dereverberation based on maximum likelihood estimation with time-varying Gaussian source model,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 8, pp. 1512-1527, November 2008.
  7. K. Yoshimura, J. Muramatsu, and P. Davis, “Conditions for common-noise-induced synchronization in time-delay systems,” Physica D, vol. 237, no. 23, pp. 3146-3152, December 2008.
  8. H. K. Solvang, K. Ishizuka, and M. Fujimoto, “Voice activity detection based on adjustable linear prediction and GARCH models,” Speech Communication, Vol.50, No.6, pp.476-486, 2008.
  9. T. Nakatani, S. Amano, T. Irino, K. Ishizuka, and T. Kondo, “A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments,” Speech Communication, Vol.50, No.3, pp.203-214, 2008.

Book Chapter, Tutorial Papers

  1. S. Makino, S. Araki, and H. Sawada, “Underdetermined Blind Source Separation using Acoustic Arrays,” in Handbook on Array Processing and Sensor Networks, S. Haykin and K. J. R. Liu, Eds., Wiley, 2008.

Peer-reviewed Conference Papers

  1. T. Yoshioka, T. Nakatani, T. Hikichi, and M. Miyoshi, “Maximum likelihood approach to speech enhancement for noisy reverberant signals,” in Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 4585-4588, March 2008.
  2. T. Yoshioka and M. Miyoshi, “Adaptive suppression of non-stationary noise by using variational Bayesian method,” in Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 4889-4892, March 2008.
  3. T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, “Blind speech dereverberation with multi-channel linear prediction based on short time Fourier transform representation,” in Proceedings of the 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 85-88, March 2008.
  4. M. Fujimoto, K. Ishizuka, and T. Nakatani, “A Voice Activity Detection Based on the Adaptive Integration of Multiple Speech Features and a Signal Decision Scheme,” Proc. ICASSP '08, pp. 4441-4444, March 2008.
  5. T. Oba, T. Hori, and A. Nakamura, “Efficient Discriminative Training of Error Corrective Models Using High-WER Competitors,” Asian Workshop on Speech Science and Technology, IEICE Technical Report SP2007-185-214, pp. 99-104, March 2008.
  6. A. Ogawa and S. Takahashi, “Weighted distance measures for efficient reduction of Gaussian mixture components in HMM-based acoustic model,” Proc. ICASSP, pp. 4173-4176, March 2008.
  7. T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, “Speech dereverberation in short time Fourier transform domain with cross band effect compensation,” in Proceedings of the 2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2008), pp. 220-223, May 2008.
  8. T. Yoshioka, T. Nakatani, and M. Miyoshi, “An integrated method for blind separation and dereverberation of convolutive audio mixtures,” in Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008), CD-ROM Proceedings, August 2008.
  9. T. Yoshioka, T. Nakatani, and M. Miyoshi, “Enhancement of noisy reverberant speech by linear filtering followed by nonlinear noise suppression,” in Proceedings of the 2008 International Workshop on Acoustic Echo and Noise Control (IWAENC 2008), CD-ROM Proceedings, September 2008.
  10. T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B.-H. Juang, “Incremental estimation of reverberation with uncertainty using prior knowledge of room acoustics for speech dereverberation,” in Proceedings of the 2008 International Workshop on Acoustic Echo and Noise Control (IWAENC 2008), CD-ROM Proceedings, September 2008.
  11. M. Fujimoto, K. Ishizuka, and T. Nakatani, “Study of Integration of Statistical Model-Based Voice Activity Detection and Noise Suppression,” Proc. Interspeech '08, September 2008.
  12. M. Miyoshi, K. Kinoshita, T. Nakatani, and T. Yoshioka, “Principles and applications of dereverberation for noisy and reverberant audio signals,” in Proceedings of the 2008 Asilomar Conference on Signals, Systems, and Computers, CD-ROM Proceedings, October 2008.
  13. S. Miyake, and J. Muramatsu, “A construction of channel code, joint source-channel code, and universal code for arbitrary stationary memoryless channels using sparse matrices,” Proceedings of the 2008 IEEE International Symposium on Information Theory, Toronto, Canada, pp.1193-1197, 2008.
  14. D. Kolossa (TU Berlin), S. Araki, M. Delcroix, T. Nakatani, R. Orglmeister (TU Berlin), and S. Makino, “Missing Feature Speech Recognition in a Meeting Situation with Maximum SNR Beamforming,” ISCAS2008, pp. 3218-3221, 2008.
  15. J. Muramatsu, and S. Miyake, “Hash property and multi-terminal source coding theorems for sparse matrices and maximal-likelihood coding,” Proceedings of the 2008 IEEE International Symposium on Information Theory, Toronto, Canada, pp.424-428, 2008.
  16. J. Muramatsu, and S. Miyake, “Lossy source coding algorithm using lossless multi-terminal source codes,” Proceedings of the 2008 International Symposium on Information Theory and its Applications, Auckland, New Zealand, pp.606-611, 2008.
  17. K. Ishizuka, S. Araki, and T. Kawahara, “Statistical speech activity detection based on spatial power distribution for analyses of poster presentations,” Proceedings of the 10th International Conference on Spoken Language Processing (Interspeech2008 - ICSLP), pp. 99-102, 2008.
  18. K. Otsuka, S. Araki, K. Ishizuka, M. Fujimoto, M. Heinrich, and J. Yamato, “A realtime multimodal system for analyzing group meetings by combining face pose tracking and speaker diarization,” Proceedings of the 10th International Conference on Multimodal Interfaces (ICMI2008), pp. 257-264, 2008.
  19. M. Delcroix, T. Nakatani, and S. Watanabe, “Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer,” Proc. ICASSP 2008, pp. 4073-4076, 2008.
  20. S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, and S. Makino, “A DOA based speaker diarization system for real meetings,” Proceedings of the Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA2008), pp. 29-32, 2008.
  21. S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, and S. Makino, “Speaker indexing and speech enhancement in real meetings / conversations,” Proceedings of the 33rd International Conference on Acoustics, Speech and Signal Processing (ICASSP2008), pp. 93-96, 2008.
  22. S. Watanabe and A. Nakamura, “A unified interpretation of adaptation techniques based on a macroscopic time evolution system with indirect/direct approaches,” Proc. ICASSP 2008, pp. 4285-4286, 2008.
  23. T. Hager, S. Araki, K. Ishizuka, M. Fujimoto, T. Nakatani, and S. Makino, “Handling speaker position changes in a meeting diarization system by combining DOA clustering and speaker identification,” Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control (IWAENC2008), 2008.
  24. T. Kawahara, H. Setoguchi, K. Takanashi, K. Ishizuka, and S. Araki, “Multi-modal recording, analysis and indexing of poster sessions,” Proceedings of the 10th International Conference on Spoken Language Processing (Interspeech2008 - ICSLP), pp. 1622-1625, 2008.

Other Conference Papers

  1. K. Kinoshita, T. Nakatani, M. Miyoshi and T. Kubota, “A new audio post-production tool for speech dereverberation,” Audio Engineering Society (AES) 125th Convention, San Francisco, 2008.

2007

Journal Papers

  1. S. Araki, H. Sawada, R. Mukai, and S. Makino, “Underdetermined Blind Sparse Source Separation for Arbitrarily Arranged Multiple Sensors,” Signal Processing, vol. 87, pp. 1833-1847, February 2007. doi:10.1016/j.sigpro.2007.02.003.
  2. M. Knaak (Technical University Berlin), S. Araki, and S. Makino, “Geometrically Constrained Independent Component Analysis,” IEEE Trans. Audio, Speech and Language Processing, vol. 15, no. 2, pp. 715-726, February 2007.
  3. T. Yamamoto, I. Oowada, H. Yip, A. Uchida, S. Yoshimori, K. Yoshimura, J. Muramatsu, S. Goto, and P. Davis, “Common-chaotic-signal induced synchronization in semiconductor lasers,” Opt. Express, vol.15, no.7, pp.3974-3980, April 2007.
  4. H. K. Solvang, K. Ishizuka, and M. Fujimoto, “A voice activity detection based on an AR-GARCH model,” IEICE Transactions on Information and Systems, Vol. J90-D, No. 12, pp. 3210-3220, 2007 (in Japanese).
  5. H. Sawada, S. Araki, R. Mukai and S. Makino, “Grouping Separated Frequency Components with Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation,” IEEE Trans. Audio, Speech & Language Processing, vol. 15, no. 5, pp. 1592-1604, July 2007.
  6. K. Ishizuka, R. Mugitani, H. Kato, and S. Amano, “Longitudinal developmental changes in spectral peaks of vowels produced by Japanese infants,” The Journal of the Acoustical Society of America, Vol.121, No.11, pp.2272-2282, 2007.
  7. K. Kinoshita, T. Nakatani, and M. Miyoshi, “Fast estimation of a precise dereverberation filter based on the harmonic structure of speech,” Acoustical Science and Technology (AST), 2007.
  8. T. Yoshioka, T. Hikichi, and M. Miyoshi, “Dereverberation by using time-variant nature of speech production system,” EURASIP Journal on Advances in Signal Processing, vol. 2007, article ID 65698, doi:10.1155/2007/65698, 2007.
  9. T. Hori, C. Hori, Y. Minami, and A. Nakamura, “Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition,” IEEE Trans. Audio, Speech and Language Processing, vol. 15, pp. 1352-1365, 2007.

Book Chapter, Tutorial Papers

  1. H. Sawada, S. Araki, and S. Makino, “Frequency-Domain Blind Source Separation,” in Blind Speech Separation, S. Makino, T.-W. Lee, and H. Sawada, Eds., Springer, 2007.
  2. S. Araki, H. Sawada, and S. Makino, “K-means based Underdetermined Blind Speech Separation,” in Blind Speech Separation, S. Makino, T.-W. Lee, and H. Sawada, Eds., Springer, 2007.

Peer-reviewed Conference Papers

  1. T. Nakatani, B.-H. Juang, T. Hikichi, T. Yoshioka, K. Kinoshita, M. Delcroix, and M. Miyoshi, “Study on speech dereverberation with autocorrelation codebook,” in Proceedings of the 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), vol. 1, pp. 193-196, April 2007.
  2. M. Fujimoto, K. Ishizuka, and H. Kato, “Noise Robust Voice Activity Detection Based on Statistical Model and Parallel Non-linear Kalman filtering,” Proc. ICASSP '07, Vol. IV, pp. 797-800, April 2007.
  3. S. Araki, H. Sawada, and S. Makino, “Blind speech separation in a meeting situation with maximum SNR beamformers,” ICASSP2007, vol. 1, pp. 41-44, April 2007.
  4. J. Cermak, S. Araki, H. Sawada, and S. Makino, “Blind Source Separation Based on Beamformer Array and Time Frequency Binary Masking,” in Proc. ICASSP2007, vol. I, pp. 145-148, April 2007.
  5. J. E. Rubio, K. Ishizuka, H. Sawada, S. Araki, T. Nakatani and M. Fujimoto, “Two-Microphone Voice Activity Detection Based on the Homogeneity of the Direction of Arrival Estimates,” in Proc. ICASSP2007, vol.4, pp. 385-388, April 2007.
  6. T. Nakatani, T. Hikichi, K. Kinoshita, T. Yoshioka, M. Delcroix, M. Miyoshi, and B.-H. Juang, “Robust blind dereverberation of speech signals based on characteristics of short-time speech segments,” in Proceedings of the 2007 IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp. 2986-2989, May 2007.
  7. H. Sawada, S. Araki, and S. Makino, “Measuring Dependence of Bin-wise Separated Signals for Permutation Alignment in Frequency-domain BSS,” in Proc. ISCAS2007, pp. 3247-3250, May 2007.
  8. M. Fujimoto and K. Ishizuka, “Noise Robust Voice Activity Detection Based on Switching Kalman Filtering,” Proc. Eurospeech '07, pp. 2933-2936, August 2007.
  9. T. Oba, T. Hori, and A. Nakamura, “A Study of Efficient Discriminative Word Sequences for Reranking of Recognition Results based on N-gram Counts,” Interspeech2007, pp. 1753-1756, August 2007.
  10. T. Yoshioka, T. Nakatani, T. Hikichi, and M. Miyoshi, “Overfitting-resistant speech dereverberation,” in Proceedings of the 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007), pp. 163-166, October 2007.
  11. T. Nakatani, B.-H. Juang, T. Yoshioka, K. Kinoshita, and M. Miyoshi, “Importance of energy and spectral features in Gaussian source model for speech dereverberation,” in Proceedings of the 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007), pp. 299-302, October 2007.
  12. Y. Minami, M. Sawaki, K. Dohsaka, R. Higashinaka, K. Ishizuka, H. Isozaki, T. Matsubayashi, M. Miyoshi, A. Nakamura, T. Oba, H. Sawada, T. Yamada, and E. Maeda, “The world of Mushrooms: Human-computer interaction prototype systems for ambient intelligence,” Proceedings of the 9th International Conference on Multimodal Interfaces (ICMI2007), pp. 366-373, 2007.
  13. I. Oowada, Y. Yamamoto, H. Yip, H. Arizumi, A. Uchida, S. Yoshimori, K. Yoshimura, J. Muramatsu, S. Goto, and P. Davis, “Synchronization in semiconductor lasers subject to a common chaotic drive signal,” Proceedings of the 15th IEEE International Workshop on Nonlinear Dynamics of Electronic Systems, Tokushima, Japan, pp. 149-152, 2007.
  14. H. Sawada, S. Araki, and S. Makino, “A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures,” WASPAA2007.
  15. H. Sawada, S. Araki, and S. Makino, “MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech and audio,” MLSP2007, 2007.
  16. J. Muramatsu, “Effect of random permutation of symbols in a sequence,” Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, pp. 1486-1490, 2007.
  17. K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki, “Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio,” Proceedings of the 10th European Conference on Speech Communication and Technology (Interspeech2007 - Eurospeech), pp. 230-233, 2007.
  18. R. Mugitani, T. Kobayashi, and K. Ishizuka, “Perceptual development of phonemic categories for Japanese single/geminate obstruents,” The 32nd Boston University Conference on Language Development (BUCLD32), 2007.
  19. S. Miyake and J. Muramatsu, “Constructions of a lossy source code using LDPC matrices,” Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, pp. 1106-1110, 2007.
  20. S. Watanabe and A. Nakamura, “Incremental adaptation based on a macroscopic time evolution system,” Proc. ICASSP 2007, vol. 4, pp. 769-772, 2007.
  21. T. Hori, I. L. Hetherington, T. J. Hazen, and J. R. Glass, “Open-vocabulary spoken utterance retrieval using confusion networks,” in Proc. ICASSP2007, April 2007.

Other Conference Papers

  1. K. Kinoshita, M. Delcroix, T. Nakatani, and M. Miyoshi, “Dereverberation of real recordings using linear prediction-based microphone array,” Audio Engineering Society (AES) 13th Regional Convention, Tokyo, 2007.
  2. K. Kinoshita, M. Delcroix, T. Nakatani, and M. Miyoshi, “Multi-step linear prediction based speech enhancement in noisy reverberant environment,” Proc. of Interspeech, pp. 854-857, 2007.

2006

Journal Papers

  1. T. Yoshioka, T. Hikichi, M. Miyoshi, and H. G. Okuno, “Common acoustical pole estimation from multi-channel musical audio signals,” IEICE Transactions on Fundamentals, vol. E89-A, no. 1, pp. 240-247, January 2006.
  2. J. Muramatsu, “Secret key agreement from correlated source outputs using low density parity check matrices,” IEICE Transactions on Fundamentals, vol.E89-A, no.7, pp.2036-2046, July 2006.
  3. J. Muramatsu, K. Yoshimura, and P. Davis, “Secret key capacity and advantage distillation capacity,” IEICE Transactions on Fundamentals, vol.E89-A, no.10, pp.2589-2596, October 2006.
  4. J. Muramatsu, K. Yoshimura, K. Arai, and P. Davis, “Secret key capacity for optimally correlated sources under sampling attack,” IEEE Transactions on Information Theory, vol.IT-52, no.11, pp.5140-5151, November 2006.
  5. H. Sawada, S. Araki, R. Mukai, S. Makino, “Blind extraction of dominant target sources using ICA and time-frequency masking,” IEEE Trans. Audio, Speech, and Language Processing, vol.14, no.6, pp.2165-2173, November 2006.
  6. K. Ishizuka and T. Nakatani, “A feature extraction method using subband based periodicity and aperiodicity decomposition with noise robust frontend processing for automatic speech recognition,” Speech Communication, Vol.48, No.11, pp.1447-1457, 2006.
  7. K. Ishizuka, T. Nakatani, Y. Minami, and N. Miyazaki, “Speech feature extraction method using subband-based periodicity and nonperiodicity decomposition,” The Journal of the Acoustical Society of America, Vol.120, No.1, pp.443-452, 2006.
  8. R. Mugitani, T. Kobayashi, K. Ishizuka, S. Amano, and K. Hiraki, “Audiovisual matching in lips and voice on vowel /i/ by Japanese infants,” The Journal of the Phonetic Society of Japan, Vol.10, No.1, pp.96-108, 2006 (in Japanese).
  9. R. Mukai, H. Sawada, S. Araki, S. Makino, “Frequency Domain Blind Source Separation of Many Speech Signals Using Near-field and Far-field Models,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 83683, 13 pages, 2006. doi:10.1155/ASP/2006/83683.
  10. S. Watanabe and A. Nakamura, “Speech recognition based on Student's t-distribution derived from total Bayesian framework,” IEICE Transactions on Information and Systems, vol. E89-D, no. 3, pp. 970-980, 2006.
  11. S. Watanabe, A. Sako, and A. Nakamura, “Automatic Determination of Acoustic Model Topology using Variational Bayesian Estimation and Clustering for large vocabulary continuous speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, issue 3, pp. 855-872, 2006.

Book Chapter, Tutorial Papers

  1. A. Nakamura, S. Watanabe, T. Hori, E. McDermott, and S. Katagiri, “Advanced Computational Models and Learning Theories for Spoken Language Processing,” IEEE Computational Intelligence Magazine, vol. 1, issue 2, pp. 5-9, 2006.
  2. S. Makino, H. Sawada, R. Mukai, and S. Araki, “Blind source separation of convolutive mixtures of audio signals in frequency domain,” in Topics in Acoustic Echo and Noise Control, E. Haensler and G. Schmidt, Eds., Springer, 2006.

Peer-reviewed Conference Papers

  1. T. Yoshioka, T. Hikichi, and M. Miyoshi, “Second-order statistics based dereverberation by using nonstationarity of speech,” in Proceedings of the 2006 International Workshop on Acoustic Echo and Noise Control (IWAENC 2006), CD-ROM Proceedings, September 2006.
  2. T. Yoshioka, T. Hikichi, M. Miyoshi, and H. G. Okuno, “Robust decomposition of inverse filter of channel and prediction error filter of speech signal for dereverberation,” in Proceedings of the 2006 European Signal Processing Conference (EUSIPCO 2006), CD-ROM Proceedings, September 2006.
  3. T. Oba, T. Hori, and A. Nakamura, “Sentence Boundary Detection Using Sequential Dependency Analysis Combined with CRF-based Chunking,” ICSLP2006, pp. 284-289, September 2006.
  4. H. Sawada, S. Araki, R. Mukai and S. Makino, “Blind separation and localization of speeches in a meeting situation,” Asilomar 2006, pp. 1407-1411, October 2006.
  5. R. Mukai, H. Sawada, S. Araki and S. Makino, “Frequency Domain Blind Source Separation in a Noisy Environment,” Joint meeting of ASA and ASJ 2006, November 2006, (invited).
  6. H. Kato, Y. Nagahara, S. Araki, H. Sawada and S. Makino, “Parametric Pearson Approach based Independent Component Analysis for Frequency Domain Blind Speech Separation,” EUSIPCO2006, 2006.
  7. H. Sawada, S. Araki, R. Mukai and S. Makino, “On Calculating the Inverse of Separation Matrix in Frequency-Domain BSS,” ICA2006, pp. 691-699, 2006.
  8. H. Sawada, S. Araki, R. Mukai and S. Makino, “Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing,” ICASSP2006, vol. 5, pp. 77-80, 2006.
  9. J. Cermak, S. Araki, H. Sawada and S. Makino, “Blind Speech Separation by Combining Beamformers and a Time Frequency Binary Mask,” IWAENC2006, 2006.
  10. J. Cermak, S. Araki, H. Sawada and S. Makino, “Musical Noise Reduction in Time-frequency-binary-masking-based Blind Source Separation Systems,” 16th Czech-German Workshop, 2006.
  11. J. Muramatsu, K. Yoshimura, and P. Davis, “Secret key capacity and advantage distillation capacity,” Proceedings of the 2006 IEEE International Symposium on Information Theory, pp.2147-2151, 2006.
  12. J. Muramatsu, K. Yoshimura, K. Arai, and P. Davis, “Some results on secret key agreement from correlated sources,” Proceedings of the 5th Asian-European Workshop on Information Theory, Jeju, Korea, pp.10-13, 2006.
  13. K. Ishizuka and H. Kato, “A feature for voice activity detection derived from speech analysis with the exponential autoregressive model,” Proceedings of the 31st International Conference on Acoustics, Speech, and Signal Processing (ICASSP2006), Vol.1, pp.789-792, 2006.
  14. K. Ishizuka and T. Nakatani, “Study of noise robust voice activity detection based on periodic component to aperiodic component ratio,” Proceedings of ISCA Tutorial and Research Workshop on Statistical And Perceptual Audition (SAPA2006), pp. 65-70, 2006.
  15. K. Yoshimura, J. Muramatsu, and P. Davis, “Conditions for consistency in time-delay systems,” Proceedings of the International Workshop on Synchronization Phenomena and Analyses, p.135, 2006.
  16. K. Yoshimura, J. Muramatsu, and P. Davis, “Consistency in time-delay systems with periodic feedback functions,” Proceedings of the 2006 International Symposium on Nonlinear Theory and its Applications, pp.287-290, 2006.
  17. R. Mukai, H. Sawada, S. Araki, S. Makino, “Blind Source Separation of Many Signals in the Frequency Domain,” ICASSP2006, vol.5, pp.969-972, 2006.
  18. S. Araki, H. Sawada, R. Mukai and S. Makino, “Blind sparse source separation with spatially smoothed time-frequency masking,” IWAENC2006, 2006.
  19. S. Araki, H. Sawada, R. Mukai and S. Makino, “Performance evaluation of sparse source separation and DOA estimation with observation vector clustering in reverberant environments,” IWAENC2006, 2006.
  20. S. Araki, H. Sawada, R. Mukai and S. Makino, “DOA estimation for multiple sparse sources with normalized observation vector clustering,” ICASSP2006, vol. 5, pp. 33-36, 2006.
  21. S. Araki, H. Sawada, R. Mukai and S. Makino, “Normalized Observation Vector Clustering Approach for Sparse Source Separation,” EUSIPCO2006, (invited).
  22. S. Araki, H. Sawada, R. Mukai and S. Makino, “Underdetermined Sparse Source Separation of Convolutive Mixtures with Observation Vector Clustering,” ISCAS2006, pp. 3594-3597, 2006.
  23. S. Mizutani, J. Muramatsu, K. Arai, and P. Davis, “Noise-assisted quantization,” Proceedings of the 2006 International Symposium on Nonlinear Theory and its Applications, pp.843-846, 2006.
  24. S. Watanabe and A. Nakamura, “Acoustic model adaptation based on coarse/fine training of transfer vector using directional statistics,” Proc. ICASSP 2006 , vol. 1, pp. 1005-1008, 2006.
  25. T. Hori and A. Nakamura, “An extremely large vocabulary approach to named entity extraction from speech,” in Proc. ICASSP2006, Vol. 1, pp. 973-976, 2006.

Other Conference Papers

  1. K. Kinoshita, T. Nakatani, and M. Miyoshi, “Spectral subtraction steered by multi-step forward linear prediction for single channel speech dereverberation,” Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. I, pp. 817-820, 2006.

2005

Journal Papers

  1. A. Blin, S. Araki, and S. Makino, “Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation,” IEICE Trans. Fundamentals, Vol.E88-A, No.7, pp.1693-1700, 2005.
  2. H. Sawada, R. Mukai, S. Araki, and S. Makino, “Estimating the number of sources using independent component analysis,” Acoustical Science and Technology, vol. 26, no. 5, pp.450-452, 2005.
  3. K. Kinoshita, T. Nakatani, and M. Miyoshi, “Harmonicity based dereverberation for improving automatic speech recognition performance and speech intelligibility,” IEICE, 2005.
  4. S. Araki, S. Makino, R. Aichner (Univ. Erlangen-Nuremberg), T. Nishikawa (NAIST), and H. Saruwatari (NAIST), “Subband-based Blind Separation for Convolutive Mixtures of Speech,” IEICE Trans. Fundamentals, vol. E88-A, no. 12, pp. 3593-3603, 2005.
  5. S. Makino, H. Sawada, R. Mukai, and S. Araki, “Blind source separation of convolutive mixtures of speech in frequency domain,” IEICE Trans. Fundamentals, Vol.E88-A, No.7, pp.1640-1655, 2005. (invited)

Book Chapter, Tutorial Papers

  1. S. Araki and S. Makino, “Subband Based Blind Source Separation,” In J. Benesty, S. Makino, and J. Chen, editors, Speech Enhancement, pp. 329-352, Springer, March 2005.
  2. H. Sawada, R. Mukai, S. Araki and S. Makino, “Frequency-domain blind source separation,” In J. Benesty, S. Makino, and J. Chen, editors, Speech Enhancement, pp.299-327, Springer, March 2005.
  3. R. Mukai, H. Sawada, S. Araki and S. Makino, “Real-time blind source separation for moving speech signals,” In J. Benesty, S. Makino, and J. Chen, editors, Speech Enhancement, pp.353-369, Springer, March 2005.

Peer-reviewed Conference Papers

  1. S. Araki, S. Makino, H. Sawada and R. Mukai, “Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask,” ICASSP2005, vol. III, pp. 81-84, March 2005.
  2. S. Araki, S. Makino, H. Sawada, and R. Mukai, “Source extraction from speech mixtures with null-directivity pattern based mask,” Proc. of Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA 2005), pp. d1-d2, March 2005.
  3. H. Sawada, S. Araki, R. Mukai, S. Makino, “Blind Extraction of a Dominant Source Signal from Mixtures of Many Sources,” ICASSP2005, vol. III, pp. 61-64, March 2005.
  4. H. Sawada, R. Mukai, S. Araki, and S. Makino, “Frequency-domain blind source separation without array geometry information,” Proc. of Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA 2005), pp.d13-d14, March 2005.
  5. R. Mukai, H. Sawada, S. Araki, and S. Makino, “Blind source separation and DOA estimation using small 3-D microphone array,” Proc. of Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA 2005), pp. d9-d10, March 2005.
  6. M. Schuster and T. Hori, “Efficient generation of high-order context-dependent weighted finite state transducers for speech recognition,” in Proc. ICASSP2005, Vol I, pp. 201-204, March 2005.
  7. T. Yoshioka, T. Hikichi, M. Miyoshi, and H. G. Okuno, “Blind estimation of room resonances using popular, classical, and jazz music,” in Proceedings of the 118th Audio Engineering Society Convention (AES 118), article ID 6632, May 2005.
  8. H. Sawada, S. Araki, R. Mukai, and S. Makino, “Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking,” Proc. of 2005 IEEE International Symposium on Circuits and Systems (ISCAS 2005), pp. 5882-5885, May 2005.
  9. H. Sawada, R. Mukai, S. Araki, and S. Makino, “Multiple source localization using independent component analysis,” Proc. of 2005 IEEE AP-S International Symposium and USNC/URSI National Radio Science Meeting, July 2005.
  10. H. Kato, Y. Nagahara (Meiji Univ.), S. Araki, and H. Sawada, “Pearson distribution system applied to blind speech separation,” 25th European Meeting of Statisticians (EMS2005), p.394, July 2005.
  11. T. Hori and A. Nakamura, “Generalized fast on-the-fly composition algorithm for WFST-based speech recognition,” in Proc. Interspeech2005-Eurospeech, pp. 557-560, September 2005.
  12. M. Schuster, T. Hori, and A. Nakamura, “Experiments with Probabilistic Principal Component Analysis in LVCSR,” in Proc. Interspeech2005-Eurospeech, pp. 1685-1688, September 2005.
  13. R. Mukai, H. Sawada, S. Araki, and S. Makino, “Blind Source Separation of 3-D Located Many Speech Signals,” in Proc. WASPAA2005, pp. 9-12, October 2005.
  14. T. Oba, T. Hori, and A. Nakamura, “Dependency modeling for integrated spontaneous speech processing,” in Proc. ASRU2005, pp. 284-289, November 2005.
  15. M. Schuster, and T. Hori, “Construction of weighted finite state transducers for very wide context-dependent acoustic models,” in Proc. ASRU2005, pp. 162-167, November 2005.
  16. T. Oba, T. Hori, and A. Nakamura, “Sequential Dependency Analysis for Spontaneous Speech Understanding,” ASRU2005, pp. 284-289, November 2005.
  17. F. Flego, S. Araki, H. Sawada, T. Nakatani, and S. Makino, “Underdetermined blind separation for speech in real environments with F0 adaptive comb filtering,” IWAENC2005, pp. 93-96, 2005.
  18. H. Sawada, R. Mukai, S. Araki, and S. Makino, “Real-time blind extraction of dominant target sources from many background interferences,” IWAENC2005, pp. 73-76, 2005.
  19. K. Ishizuka and T. Nakatani, “Robust speech feature extraction using subband based periodicity and aperiodicity decomposition in the frequency domain,” Proceedings of the Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA2005), pp.a13-a14, 2005.
  20. K. Ishizuka, H. Kato, and T. Nakatani, “Speech signal analysis with exponential autoregressive model,” Proceedings of the 30th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2005), Vol.1, pp.225-228, 2005.
  21. K. Ishizuka, R. Mugitani, H. Kato, and S. Amano, “A longitudinal analysis of the spectral peaks of vowels for a Japanese infant,” Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech2005 - Eurospeech), pp. 1169-1172, 2005.
  22. R. Mugitani, K. Ishizuka, and S. Amano, “Longitudinal development of mora-timed rhythmic structure in Japanese,” The 30th Boston University Conference on Language Development BUCLD30, p.52, 2005.
  23. R. Mukai, H. Sawada, S. Araki, and S. Makino, “Real-Time Blind Source Separation and DOA Estimation Using Small 3-D Microphone Array,” IWAENC2005, pp. 45-48, 2005.
  24. S. Araki, H. Sawada, R. Mukai, and S. Makino, “A novel blind source separation method with observation vector clustering,” IWAENC2005, pp. 117-120, 2005.
  25. S. Watanabe and A. Nakamura, “Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition,” Proc. Interspeech 2005 - Eurospeech, pp. 1105-1108, 2005.

Other Conference Papers

  1. K. Kinoshita, T. Nakatani, and M. Miyoshi, “Fast estimation of a precise dereverberation filter based on speech harmonicity,” Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.
  2. K. Kinoshita, T. Nakatani, and M. Miyoshi, “Efficient blind dereverberation framework for automatic speech recognition,” Proc. of Interspeech, 2005.

2004

Journal Papers

  1. R. Mukai, S. Araki, H. Sawada, and S. Makino, “Evaluation of Separation and Dereverberation Performance in Frequency Domain Blind Source Separation,” Acoustical Science and Technology, Vol.25, No.2, pp.119-126, March 2004.
  2. H. Sawada, R. Mukai, S. Araki, S. Makino, “Convolutive Blind Source Separation for more than Two Sources in the Frequency Domain,” Acoustical Science and Technology, the Acoustical Society of Japan, vol.25, no.4, pp. 296-298, July 2004.
  3. R. Mukai, H. Sawada, S. Araki, and S. Makino, “Blind Source Separation for Moving Speech Signals using Blockwise ICA and Residual Crosstalk Subtraction,” IEICE Trans. Fundamentals, Special Section on Digital Signal Processing, vol.E87-A, no.8, pp.1941-1948, August 2004.
  4. H. Sawada, R. Mukai, S. Araki, S. Makino, “A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation,” IEEE Trans. Speech and Audio Processing, vol.12, no.5, pp.530-538, September 2004.
  5. S. Watanabe, Y. Minami, A. Nakamura and N. Ueda, “Variational Bayesian Estimation and Clustering for Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 12, pp. 365-381, 2004.

Peer-reviewed Conference Papers

  1. S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada, “Underdetermined Blind Separation for Speech in Real Environments with Sparseness and ICA,” ICASSP2004, vol. III, pp. 881-884, May 2004 (invited).
  2. A. Blin, S. Araki and S. Makino, “A Sparseness-Mixing Matrix Estimation (SMME) Solving the Underdetermined BSS for Convolutive Mixtures,” ICASSP2004, vol. IV, pp. 85-88, May 2004.
  3. R. Mukai, H. Sawada, S. Araki, S. Makino, “Near-Field Frequency Domain Blind Source Separation for Convolutive Mixtures,” ICASSP2004, vol. IV, pp. 49-52, May 2004.
  4. H. Sawada, R. Mukai, S. Araki, S. Makino, “Convolutive Blind Source Separation for more than Two Sources in the Frequency Domain,” ICASSP2004, vol. III, pp. 885-888, May 2004 (invited).
  5. S. Makino, S. Araki, R. Mukai, and H. Sawada, “Audio source separation based on independent component analysis,” in Proc. ISCAS2004 (International Symposium on Circuits and Systems), vol. V, pp. 668-671, May 2004 (invited).
  6. R. Mukai, H. Sawada, S. Araki and S. Makino, “Frequency Domain Blind Source Separation using Small and Large Spacing Sensor Pairs,” ISCAS2004, vol. V, pp. 1-4, May 2004.
  7. S. Araki, S. Makino, H. Sawada and R. Mukai, “Underdetermined Blind Speech Separation with Directivity Pattern based Continuous Mask and ICA,” EUSIPCO2004, pp.1991-1994, September 2004.
  8. S. Araki, S. Makino, H. Sawada and R. Mukai, “Underdetermined Blind Separation of Convolutive Mixtures of Speech with Directivity Pattern based Mask and ICA,” ICA2004, pp.898-905, September 2004.
  9. H. Sawada, S. Winter, S. Araki, R. Mukai, S. Makino, “Estimating the Number of Sources for Frequency-Domain Blind Source Separation,” ICA2004 (5th International Conference on Independent Component Analysis and Blind Signal Separation), pp.610-617, September 2004.
  10. S. Winter, H. Sawada, S. Araki, S. Makino, “Overcomplete BSS for convolutive mixtures based on hierarchical clustering,” ICA2004, pp.652-660, September 2004.
  11. R. Mukai, H. Sawada, S. Araki, S. Makino, “Frequency Domain Blind Source Separation for Many Speech Signals,” ICA2004, pp.461-469, September 2004.
  12. S. Winter, H. Sawada, S. Araki, S. Makino, “Hierarchical Clustering Applied to Overcomplete BSS for Convolutive Mixtures,” SAPA2004 (ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing), Session I-3, October 2004.
  13. A. Blin, S. Araki, and S. Makino, “Underdetermined blind source separation for convolutive mixtures exploiting a sparseness-mixing matrix estimation (SMME),” in Proc. ICA2004 (International Congress on Acoustics), vol. IV, pp. 3139-3142, 2004.
  14. H. Sawada, R. Mukai, S. Araki, S. Makino, “Solving the Permutation and the Circularity Problem of Frequency-Domain Blind Source Separation,” ICA2004 (International Congress on Acoustics), vol. I, pp. 89-92, 2004 (invited).
  15. K. Ishizuka and N. Miyazaki, “Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition,” Proceedings of the 29th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2004), Vol.1, pp.141-144, 2004.
  16. K. Ishizuka and N. Miyazaki, “Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition,” The 2nd NTT Workshop on Communication Scene Analysis (CSA2004), Poster presentation, 2004.
  17. K. Ishizuka, N. Miyazaki, T. Nakatani and Y. Minami, “Improvement in robustness of speech feature extraction method using sub-band based periodicity and aperiodicity decomposition,” Proceedings of the 8th International Conference on Spoken Language Processing (Interspeech2004 - ICSLP), Vol.2, pp.937-940, 2004.
  18. P. Zolfaghari, H. Kato, S. Watanabe and S. Katagiri, “Speech Spectral Modelling using Mixture of Gaussians,” Proc. SWIM, 2004.
  19. P. Zolfaghari, S. Watanabe, A. Nakamura and S. Katagiri, “Bayesian Modelling of the Speech Spectrum Using Mixture of Gaussians,” Proc. ICASSP'04, vol. 1, pp. 553-556, 2004.
  20. R. Mukai, H. Sawada, S. Araki, S. Makino, “A Solution for the Permutation Problem in Frequency Domain BSS using Near- and Far-field Models,” ICA2004 (International Congress on Acoustics), vol. IV, pp. 3135-3138, 2004.
  21. S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada, “Underdetermined blind separation of convolutive mixtures of speech by combining time-frequency masks and ICA,” in Proc. ICA2004 (International Congress on Acoustics), vol. I, pp.321-324, 2004.
  22. S. Watanabe and A. Nakamura, “Acoustic model adaptation based on coarse-fine training of transfer vectors and its application to speaker adaptation task,” Proc. ICSLP'04, vol. 4, pp. 2933-2936, 2004.
  23. S. Watanabe and A. Nakamura, “Robustness of acoustic model topology determined by Variational Bayesian Estimation and Clustering for speech recognition for different speech data sets,” Proc. Workshop on statistical modeling approach for speech recognition - Beyond HMM, pp. 55-60, 2004.
  24. S. Watanabe, A. Sako (Ryukoku Univ.) and A. Nakamura, “Automatic Determination of Acoustic Model Topology using Variational Bayesian Estimation and Clustering,” Proc. ICASSP'04, vol. 1, pp. 813-816, 2004.
  25. T. Hori, C. Hori, and Y. Minami, “Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous-speech recognition,” in Proc. ICSLP2004, Vol. 1, pp. 289-292, 2004.

Other Conference Papers

  1. H. Sawada, R. Mukai, S. Araki, S. Makino, “Blind Source Separation for Convolutive Mixtures in the Frequency Domain,” CSA2004.
  2. K. Kinoshita, T. Nakatani and M. Miyoshi, “Improving automatic speech recognition performance and speech intelligibility with harmonicity based dereverberation,” Proc. of Interspeech, 2004.
  3. K. Kinoshita, T. Nakatani and M. Miyoshi, “Speech dereverberation based on harmonic structure using a single microphone,” Poster presentation at 2004 NTT Workshop on Communication Scene Analysis, 2004
  4. R. Mukai, H. Sawada, S. Araki, S. Makino, “A Solution for the Permutation Problem in Frequency Domain BSS using Near- and Far-field Models,” CSA2004.
  5. S. Araki, S. Makino, H. Sawada and R. Mukai, “Blind Separation of More Speech than Sensors using Time-frequency Masks and ICA,” Proceedings of 2004 NTT Workshop on Communication Scene Analysis (CSA2004) (invited).
  6. S. Winter, H. Sawada, S. Araki, S. Makino, “Underdetermined Blind Source Separation for Convolutive Mixtures of Sparse Signals,” CSA2004.

2003

Journal Papers

  1. H. Sawada, R. Mukai, S. Araki, S. Makino, “Polar Coordinate based Nonlinear Function for Frequency Domain Blind Source Separation,” IEICE Trans. Fundamentals, vol.E86-A, no.3, pp. 590-596, March 2003.
  2. S. Araki, R. Mukai, S. Makino, T. Nishikawa(NAIST) and H. Saruwatari(NAIST), “The Fundamental Limitation of Frequency Domain Blind Source Separation for Convolutive Mixtures of Speech,” IEEE Trans. Speech Audio Processing, Vol. 11, No. 2, pp. 109-116, 2003.
  3. S. Araki, S. Makino, Y. Hinamoto(NAIST), R. Mukai, T. Nishikawa(NAIST) and H. Saruwatari(NAIST), “Equivalence between Frequency Domain Blind Source Separation and Frequency Domain Adaptive Beamforming for Convolutive Mixtures,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 11, pp. 1157-1166, 2003.

Peer-reviewed Conference Papers

  1. R. Mukai, H. Sawada, S. Araki, S. Makino, “Real-Time Blind Source Separation for Moving Speakers using Blockwise ICA and Residual Crosstalk Subtraction,” ICA2003, pp. 975-980, April 2003.
  2. H. Sawada, R. Mukai, S. Araki, S. Makino, “A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation,” ICA 2003, pp. 505-510, April 2003.
  3. R. Mukai, H. Sawada, S. Araki, S. Makino, “Robust Real-Time Blind Source Separation for Moving Speakers in a Room,” ICASSP2003, pp. 469-472, April 2003.
  4. H. Sawada, R. Mukai, S. Araki, S. Makino, “A Robust Approach to the Permutation Problem of Frequency-Domain Blind Source Separation,” ICASSP 2003, pp. 381-384, April 2003.
  5. T. Hori, D. Willett, and Y. Minami, “Language model adaptation using WFST-based speaking-style translation,” in Proc. ICASSP2003, Vol. 1, pp. 228-231, April 2003.
  6. C. Hori, T. Hori, H. Isozaki, E. Maeda, S. Katagiri, and S. Furui, “Deriving Disambiguous Queries in a Spoken Interactive ODQA System,” in Proc. ICASSP2003, Vol.1, pp. 384-387, April 2003.
  7. T. Hori, D. Willett, and Y. Minami, “Paraphrasing spontaneous speech using weighted finite-state transducers,” in Proc. SSPR2003, pp. 219-222, April 2003.
  8. C. Hori, T. Hori, H. Isozaki, E. Maeda, S. Katagiri, and S. Furui, “Study on Spoken Interactive Open Domain Question Answering,” in Proc. SSPR2003, pp. 111-113, April 2003.
  9. S. Araki, S. Makino, H. Sawada, A. Blin and R. Mukai, “Underdetermined Blind Separation of Convolutive Mixtures of Speech with Binary Masks and ICA,” NIPS 2003 workshop on ICA: Sparse Representations in Signal Processing, December 2003 (workshop without published proceedings).
  10. A. Blin, S. Araki and S. Makino, “Blind Source Separation when Speech Signals Outnumber Sensors using a Sparseness-Mixing Matrix Combination,” IWAENC2003, pp. 211-214, 2003.
  11. H. Sawada, R. Mukai, S. de la Kethulle, S. Araki and S. Makino, “Spectral Smoothing for Frequency-Domain Blind Source Separation,” IWAENC2003, pp.311-314, 2003.
  12. M. Knaak, S. Araki, S. Makino, “Geometrically Constrained ICA for Convolutive Mixtures of Sound,” ICASSP2003, Vol. II, pp. 725-728, 2003.
  13. M. Knaak, S. Araki, S. Makino, “Geometrically Constrained ICA for a Robust Separation of Sound Mixtures,” ICA2003, pp. 951-956, 2003.
  14. R. Aichner, H. Buchner, S. Araki, S. Makino, “On-line Time-domain Blind Source Separation of Nonstationary Convolved Signals,” ICA2003, pp. 987-992, 2003.
  15. R. Mukai, H. Sawada, S. de la Kethulle, S. Araki and S. Makino, “Array Geometry Arrangement for Frequency Domain Blind Source Separation,” IWAENC2003, pp.219-222, 2003.
  16. S. Araki, S. Makino, A. Blin, R. Mukai and H. Sawada, “Blind Separation of More Speech than Sensors with Less Distortion by Combining Sparseness and ICA,” IWAENC2003, pp.271-274, 2003.
  17. S. Araki, S. Makino, R. Aichner, T. Nishikawa(NAIST), and H. Saruwatari(NAIST), “Subband Based Blind Source Separation for Convolutive Mixtures of Speech,” ICASSP2003, Vol. V, pp. 509-512, 2003.
  18. S. Araki, S. Makino, R. Aichner, T. Nishikawa(NAIST), and H. Saruwatari(NAIST), “Subband Based Blind Source Separation with Appropriate Processing for Each Frequency Band,” ICA2003, pp. 499-504, 2003.
  19. S. Watanabe, Y. Minami, A. Nakamura and N. Ueda, “Application of Variational Bayesian Estimation and Clustering to Acoustic Model Adaptation,” Proc. ICASSP'03. vol. 1, pp. 568-571, 2003.
  20. S. Watanabe, Y. Minami, A. Nakamura and N. Ueda, “Bayesian Acoustic Modeling for Spontaneous Speech Recognition,” Proc. SSPR'03. pp. 47-50, 2003.
  21. T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki, S. Makino, “Multistage ICA for Blind Source Separation of Real Acoustic Convolutive Mixture,” ICA2003, pp. 523-528, 2003.
  22. C. Hori, T. Hori, H. Tsukada, H. Isozaki, Y. Sasaki, and E. Maeda, “Spoken Interactive ODQA System: SPIQA,” in Proc. ACL2003, Companion Volume to the Proceedings of the Conference, pp.153-156, 2003.
  23. T. Hori, C. Hori, and Y. Minami, “Speech summarization using weighted finite-state transducers,” in Proc. Eurospeech2003, pp.2817-2820, 2003.
  24. C. Hori, T. Hori, and S. Furui, “Evaluation methods for automatic speech summarization,” in Proc. Eurospeech2003, pp. 2825-2828, 2003.

2002

Peer-reviewed Conference Papers

  1. S. Araki, S. Makino, R. Mukai, Y. Hinamoto, T. Nishikawa and H. Saruwatari, “Equivalence between Frequency Domain Blind Source Separation and Frequency Domain Adaptive Beamforming,” ICASSP2002, vol. II, pp. 1785-1788, May 2002.
  2. Y. Hinamoto(NAIST), T. Nishikawa(NAIST), H. Saruwatari(NAIST), S. Araki, S. Makino, and R. Mukai, “Equivalence between Frequency Domain Blind Source Separation and Adaptive Beamforming,” Proc. ICFS2002 (The International Conference on Fundamentals of Electronics, Communications and Computer Sciences), R-1, pp. 13-18, March 2002.
  3. R. Mukai, S. Araki, H. Sawada, S. Makino, “Removal of Residual Cross-talk Components in Blind Source Separation using Time-delayed Spectral Subtraction,” ICASSP2002, vol. II, pp.1789-1792, May 2002.
  4. H. Sawada, R. Mukai, S. Araki, S. Makino, “Polar Coordinate based Nonlinear Function for Frequency-Domain Blind Source Separation,” ICASSP2002, vol. I, pp. 1001-1004, May 2002.
  5. S. Araki, S. Makino, R. Aichner, T. Nishikawa(NAIST), and H. Saruwatari(NAIST), “Blind Source Separation for Convolutive Mixtures of Speech Using Subband Processing,” SMMSP2002 (Second International Workshop on Spectral Methods and Multirate Signal Processing), pp. 195-202, September 2002.
  6. H. Sawada, S. Araki, R. Mukai, S. Makino, “Blind Source Separation with Different Sensor Spacing and Filter Length for Each Frequency Range,” NNSP2002, pp. 465-474, 2002.
  7. R. Aichner, S. Araki, S. Makino, T. Nishikawa(NAIST), and H. Saruwatari(NAIST), “Time domain Blind Source Separation of non-stationary convolved signals by utilizing geometric beamforming,” NNSP2002, pp. 445-454, 2002.
  8. R. Mukai, S. Araki, H. Sawada, S. Makino, “Removal of Residual Cross-talk Components in Blind Source Separation using LMS Filters,” NNSP2002, pp. 435-444, 2002.
  9. S. Makino, S. Araki, R. Mukai, H. Sawada, H. Saruwatari (NAIST), “ICA-Based Source Separation of Sounds,” Proc. of 2002 China-Japan Joint Conference on Acoustics, Vol.21, pp. 83-86, 2002.
  10. S. Watanabe, Y. Minami, A. Nakamura and N. Ueda, “Application of Variational Bayesian Approach to Speech Recognition,” Proc. NIPS'02, MIT Press, 2002.
  11. S. Watanabe, Y. Minami, A. Nakamura and N. Ueda, “Constructing Shared-State Hidden Markov Models Based on a Bayesian Approach,” Proc. ICSLP'02, vol. 4, pp. 2669-2672, 2002.

2001

Peer-reviewed Conference Papers

  1. S. Araki, S. Makino, T. Nishikawa, and H. Saruwatari, “Limitation of Frequency Domain Blind Source Separation for Convolutive Mixture of Speech,” International Workshop on Hands-Free Speech Communication, April 2001.
  2. S. Araki, S. Makino, T. Nishikawa, and H. Saruwatari, “Fundamental Limitation of Frequency Domain Blind Source Separation for Convolutive Mixture of Speech,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2001), pp. 2737-2740, May 2001.
  3. S. Araki, S. Makino, R. Mukai, and H. Saruwatari, “Equivalence between Frequency Domain Blind Source Separation and Frequency Domain Adaptive Beamformers,” Consistent & Reliable Acoustic Cues for Sound Analysis (CRAC), September 2001.
  4. S. Araki, S. Makino, R. Mukai, and H. Saruwatari, “Equivalence between Frequency Domain Blind Source Separation and Frequency Domain Adaptive Null Beamformers,” 7th European Conference on Speech Communication and Technology (Eurospeech2001), vol. 4, pp. 2595-2598, September 2001.
  5. R. Mukai, S. Araki and S. Makino, “Separation and Dereverberation Performance of Frequency Domain Blind Source Separation for Speech in a Reverberant Environment,” Eurospeech 2001, pp. 2599-2603, September 2001.
  6. R. Mukai, S. Araki and S. Makino, “Separation and Dereverberation Performance of Frequency Domain Blind Source Separation in a Reverberant Environment,” IWAENC 2001, pp. 127-130, September 2001.
  7. S. Araki, S. Makino, R. Mukai, T. Nishikawa, and H. Saruwatari, “Fundamental limitation of frequency domain Blind Source Separation for convolved mixture of speech,” 3rd International Conference on Independent Component Analysis and Blind Signal Separation (ICA2001), pp. 132-137, December 2001.
  8. R. Mukai, S. Araki and S. Makino, “Separation and Dereverberation Performance of Frequency Domain Blind Source Separation,” ICA2001, pp. 230-235, December 2001.
  9. H. Sawada, R. Mukai, S. Araki, S. Makino, “A Polar-Coordinate based Activation Function for Frequency Domain Blind Source Separation,” ICA2001, pp. 663-668, December 2001.

Conferences (We organized)

Software tools (distributed for the sole purpose of evaluation)

Members

Last Update: 1/14/2025

Related Research Groups