2021
Journal Papers
- R. Haeb-Umbach, J. Heymann, L. Drude, S. Watanabe, M. Delcroix, and T. Nakatani, "Far-Field Automatic Speech Recognition," Proceedings of the IEEE, vol. 109, no. 2, pp. 124-148, Feb. 2021.
- N. Ito, R. Ikeshita, H. Sawada and T. Nakatani, "A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
- R. Ikeshita, T. Nakatani and S. Araki, "Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction," IEEE Transactions on Signal Processing, 2021.
- R. Ikeshita and T. Nakatani, "Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation," IEEE Signal Processing Letters, vol. 28, pp. 972-976, 2021.
- R. Ikeshita, N. Kamo and T. Nakatani, "Blind Signal Dereverberation Based on Mixture of Weighted Prediction Error Models," IEEE Signal Processing Letters, vol. 28, pp. 399-403, 2021.
Peer-reviewed Conference Papers
- C. Li, Y. Luo, C. Han, J. Li, T. Yoshioka, T. Zhou, M. Delcroix, K. Kinoshita, C. Boeddeker, Y. Qian, S. Watanabe, and Z. Chen, "Dual-Path RNN for Long Recording Speech Separation," in Proc. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 865-872.
- K. Zmolikova, M. Delcroix, L. Burget, T. Nakatani, and J. Černocký, "Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation," in Proc. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 889-896.
- H. Sato, T. Ochiai, K. Kinoshita, M. Delcroix, T. Nakatani, and S. Araki, "Multimodal Attention Fusion for Target Speaker Extraction," in Proc. 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784.
- C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization," in Proc. 2020 28th European Signal Processing Conference (EUSIPCO), 2021, pp. 231-235.
- S. Watanabe, F. Boyer, X. Chang, P. Guo, T. Hayashi, Y. Higuchi, T. Hori, W.-C. Huang, H. Inaguma, N. Kamo, S. Karita, C. Li, J. Shi, A. S. Subramanian, and W. Zhang, "The 2020 ESPnet Update: New Features, Broadened Applications, Performance Improvements, and Future Plans," in Proc. 2021 IEEE Data Science & Learning Workshop (DSLW), 2021.
- J. Wissing, B. Boenninghoff, D. Kolossa, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and C. Schymura, "Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4705-4709.
- C. Li, Z. Chen, Y. Luo, C. Han, T. Zhou, K. Kinoshita, M. Delcroix, S. Watanabe, and Y. Qian, "Dual-Path Modeling for Long Recording Speech Separation in Meetings," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5739-5743.
- P. Guo, F. Boyer, X. Chang, T. Hayashi, Y. Higuchi, H. Inaguma, N. Kamo, C. Li, D. Garcia-Romero, J. Shi, J. Shi, S. Watanabe, K. Wei, W. Zhang, and Y. Zhang, "Recent Developments on ESPnet Toolkit Boosted by Conformer," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5874-5878.
- M. Delcroix, K. Zmolikova, T. Ochiai, K. Kinoshita, and T. Nakatani, "Speaker Activity Driven Neural Speech Extraction," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6099-6103.
- T. Ochiai, M. Delcroix, T. Nakatani, R. Ikeshita, K. Kinoshita, and S. Araki, "Neural Network-Based Virtual Microphone Estimator," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6114-6118.
- A. Ogawa, N. Tawara, T. Kano, and M. Delcroix, "BLSTM-Based Confidence Estimation for End-to-End Speech Recognition," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6383-6387.
- N. Tawara, A. Ogawa, Y. Kitagishi, and H. Kamiyama, "Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6963-6967.
- K. Kinoshita, M. Delcroix and N. Tawara, "Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 7198-7202.
- T. Moriya, T. Ashihara, T. Tanaka, T. Ochiai, H. Sato, A. Ando, Y. Ijima, R. Masumura, and Y. Shinohara, "SimpleFlat: A simple whole-network pre-training approach for RNN transducer-based end-to-end speech recognition," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 5664-5668.
- C. Boeddeker, W. Zhang, T. Nakatani, K. Kinoshita, T. Ochiai, M. Delcroix, N. Kamo, Y. Qian, and R. Haeb-Umbach, "Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 8428-8432.
- W. Zhang, C. Boeddeker, S. Watanabe, T. Nakatani, M. Delcroix, K. Kinoshita, T. Ochiai, N. Kamo, R. Haeb-Umbach, and Y. Qian, "End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend," in Proc. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6898-6902.