### 2020

### Journal Papers

- K. Yamamoto, T. Irino, S. Araki, K. Kinoshita, T. Nakatani, "Speech intelligibility prediction using a multi-resolution Gammachirp envelope distortion index with common parameters for different noise conditions," Acoust. Sci. & Tech., Vol. 41 (1), pp. 396-399, Jan 2020.
- T. Nakatani, C. Boeddeker, K. Kinoshita, R. Ikeshita, M. Delcroix, "Jointly optimal denoising, dereverberation, and source separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2267-2282, 2020.
- S. Emura, H. Sawada, S. Araki, N. Harada, "Multi-delay sparse approach to residual crosstalk reduction for blind source separation," IEEE Signal Processing Letters, vol. 27, pp. 1630-1634, Sept. 2020.
- N. Ito and S. Godsill, "A Multi-Target Track-Before-Detect Particle Filter Using Superpositional Data in Non-Gaussian Noise," IEEE Signal Processing Letters, vol. 27, pp. 1075-1079, 2020.

### Peer-reviewed Conference Papers

- K. Kinoshita, T. Ochiai, M. Delcroix, and T. Nakatani, "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7009-7013.
- T. von Neumann, K. Kinoshita, L. Drude, C. Boeddeker, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "End-to-end training of time domain audio separation and recognition," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 7004-7008.
- N. Tawara, A. Ogawa, T. Iwata, M. Delcroix, and T. Ogawa, "Frame-level phoneme-invariant speaker embedding for text-independent speaker recognition on extremely short utterances," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6799-6803.
- N. Tawara, H. Kamiyama, S. Kobashikawa, and A. Ogawa, "Improving speaker-attribute estimation by voting based on speaker cluster information," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6594-6598.
- T. Nakatani, R. Takahashi, T. Ochiai, K. Kinoshita, R. Ikeshita, M. Delcroix, and S. Araki, "DNN-supported mask-based convolutional beamforming for simultaneous denoising, dereverberation, and source separation," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6399-6403.
- T. Ochiai, M. Delcroix, R. Ikeshita, K. Kinoshita, T. Nakatani, and S. Araki, "Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6384-6388.
- M. Delcroix, T. Ochiai, K. Zmolikova, K. Kinoshita, N. Tawara, T. Nakatani, and S. Araki, "Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 691-695.
- T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, "Convergence-guaranteed independent positive semidefinite tensor analysis based on Student's t distribution," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 681-685.
- R. Ikeshita, T. Nakatani, and S. Araki, "Overdetermined independent vector analysis," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 591-595.
- C. Schymura, T. Ochiai, M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, and D. Kolossa, "A dynamic stream weight backprop Kalman filter for audiovisual speaker tracking," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 581-585.
- K. Kinoshita, M. Delcroix, S. Araki, and T. Nakatani, "Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 381-385.
- C. Boeddeker, T. Nakatani, K. Kinoshita, and R. Haeb-Umbach, "Jointly optimal dereverberation and beamforming," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 216-220.
- Y. Koizumi, K. Yatabe, M. Delcroix, Y. Masuyama, and D. Takeuchi, "Speech enhancement using self-adaptation and multi-head self-attention," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 181-185.
- S. Emura, H. Sawada, S. Araki, and N. Harada, "A frequency-domain BSS method based on l1 norm, unitary constraint, and Cayley transform," in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 111-115.
- A. Aroudi, M. Delcroix, T. Nakatani, K. Kinoshita, S. Araki, and S. Doclo, "Cognitive-Driven Convolutional Beamforming Using EEG-Based Auditory Attention Decoding," in Proc. 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020, pp. 1-6.
- T. Nakatani, R. Ikeshita, K. Kinoshita, H. Sawada, and S. Araki, "Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 91-95.
- K. Arai, S. Araki, A. Ogawa, K. Kinoshita, T. Nakatani, and T. Irino, "Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1156-1160.
- T. Ochiai, M. Delcroix, Y. Koizumi, H. Ito, K. Kinoshita, and S. Araki, "Listen to What You Want: Neural Network-based Universal Sound Selector," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 1441-1445.
- K. Kinoshita, T. von Neumann, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "Multi-path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 2652-2656.
- T. von Neumann, C. Boeddeker, L. Drude, K. Kinoshita, M. Delcroix, T. Nakatani, and R. Haeb-Umbach, "Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 3097-3101.
- A. Ogawa, N. Tawara, and M. Delcroix, "Language Model Data Augmentation Based on Text Domain Transfer," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 4926-4930.
- T. Moriya, T. Ochiai, S. Karita, H. Sato, T. Tanaka, T. Ashihara, R. Masumura, Y. Shinohara, and M. Delcroix, "Self-distillation for improving CTC-Transformer-based ASR systems," in Proc. the 21st Annual Conference of the International Speech Communication Association (Interspeech), 2020, pp. 546-550.