Media Information Laboratory


Message

Dr. Akisato Kimura, Executive Manager


The Media Information Laboratory is organized into five research groups: media recognition, signal processing, computational modeling, biomedical informatics, and computing theory. We pursue basic research on information processing technologies and on the fundamental principles of "media", the means by which information is transmitted in communication.

"Media" is a medium for transmitting information in communication among people or between people and computers. It can also be regarded as data obtained by observing information in the real and virtual worlds. On this view, media information processing covers not only the sounds and images we perceive through hearing and sight, but any observable data from the real and cyber worlds.

We therefore take a broad view of media information processing. We aim to approach the fundamental principles of communication and to develop technologies that enrich our lives in the real and virtual worlds. To do so, we bring together the experience and knowledge of experts in a wide range of fields, such as real-world measurement, modeling, signal processing, media recognition and understanding, media generation, and the basic mathematical theories and algorithms that support them.

News

  • March 2024

    The following two papers have been accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2024).
    ・Yu Mitsuzumi, Akisato Kimura, Hisashi Kashima, "Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective"
    ・Takuhiro Kaneko, "Improving Physics Augmented Continuum Neural Radiance Fields-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization"

  • March 2024

    Our paper “Geometrically-regularized fast independent vector extraction by pure majorization-minimization” has been accepted to IEEE Transactions on Signal Processing.
    https://ieeexplore.ieee.org/document/10466407

  • February 2024

    The following 5 papers have been accepted to satellite workshops of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024).
    ・Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Jan Cernocky, "Probing Self-supervised Learning Models with Target Speech Extraction"
    ・Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach, "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization"
    ・Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, Shoji Makino, "Diffusion model-based MIMO speech denoising and dereverberation"
    ・Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, "Ensemble Inference for Diffusion Model-Based Speech Enhancement"
    ・Bo He, Shiqi Zhang, Xianrui Wang, Zheng Qiu, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada, Shoji Makino, “Light Gated Multi Mini-patch Extractor for Audio Classification”
    Also, the following 2 papers have been accepted to Show and Tell Demos in ICASSP2024.
    ・Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino, “Target Speech Spotting and Extraction Based on ConceptBeam”
    ・Thilo von Neumann, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach, "MeetEval, Show Me the Errors! Interactive Visualization of Transcript Alignments for the Analysis of Conversational ASR"

  • February 2024

    Our paper “Warped diffusion for latent differentiation inference” has been accepted to the International Conference on Artificial Intelligence and Statistics (AISTATS2024).
    https://proceedings.mlr.press/v238/nakano24a.html

  • January 2024

    Our paper “A motivic construction of the de Rham-Witt complex” has been accepted to the Journal of Pure and Applied Algebra. This is joint work with the University of Tokyo.
    https://www.sciencedirect.com/science/article/pii/S0022404923002840

  • December 2023

    Our paper “Efficient algorithm for K-multiple-means” has been accepted to the ACM SIGMOD International Conference on Management of Data (SIGMOD2024). This is joint work with NTT Computer and Data Science Laboratories and NTT Human Informatics Laboratories.
    https://dl.acm.org/doi/10.1145/3639273

  • December 2023

    The following 13 papers have been accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024).
    ・Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa, “NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization”
    ・Dominik Klement, Mireia Diez, Federico Landini, Lukas Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara, “Discriminative Training of VBx Diarization”
    ・Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki, Jan Cernocky, “Target Speech Extraction with Pre-Trained Self-Supervised Learning Models”
    ・William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe, “Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing”
    ・Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino, “Neural Network-Based Virtual Microphone Estimation with Virtual Microphone and Beamformer-Level Multi-Task Loss”
    ・Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri, “How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts?”
    ・Keigo Wakayama, Tsubasa Ochiai, Marc Delcroix, Masahiro Yasuda, Shoichiro Saito, Shoko Araki, Akira Nakayama, “Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher”
    ・Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami, Yusuke Ijima, “What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis”
    ・Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima, “Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters”
    ・Shiqi Zhang, Daiki Takeuchi, Noboru Harada, Shoji Makino, “Unrestricted Global-Phase-Bias Aware Single-channel Speech Enhancement with Conformer-based Metric GAN”
    ・Yuto Kondo, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, “Selecting N-Lowest Scores for Training MOS Prediction Models”
    ・Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, “Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator”
    ・Masahiro Nakano, Ryohei Shibue, Kunio Kashino, “Sunflower Strategy for Bayesian Relational Data Analysis”

  • December 2023

    Our paper “Blind and spatially-regularized online joint optimization of source separation, dereverberation, and noise reduction” has been accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
    https://ieeexplore.ieee.org/document/10384838

  • December 2023

    Our paper “Variational autoencoder-based neural electrocardiogram synthesis trained by FEM-based heart simulator” has been accepted to Cardiovascular Digital Health Journal.
    https://www.cvdigitalhealthjournal.com/article/S2666-6936(23)00110-X/fulltext

  • December 2023

    Our paper “Gene correction and overexpression of TNNI3 improve impaired relaxation in engineered heart tissue model of pediatric restrictive cardiomyopathy” has been accepted to Development, Growth & Differentiation. This is joint work with Osaka University.
    https://onlinelibrary.wiley.com/doi/10.1111/dgd.12909

  • December 2023

    Our paper “Probabilistic state synthesis based on optimal convex approximation” has been accepted to npj Quantum Information.
    https://www.nature.com/articles/s41534-023-00793-7

  • December 2023

    Our paper “Fidelity-estimation method for graph states with depolarizing noise” has been accepted to Physical Review Research. This is joint work with Chuo University.
    https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.5.043260

Publications

2024

Journal Papers

  1. Seiseki Akibue, Go Kato & Seiichiro Tani (2024). Probabilistic state synthesis based on optimal convex approximation. npj Quantum Information, 10 (1).
  2. Junnosuke Koizumi & Hiroyasu Miyazaki (2024). A motivic construction of the de Rham-Witt complex. Journal of Pure and Applied Algebra, 228 (6), 107602.
  3. Ryo Nishikimi, Masahiro Nakano, Kunio Kashino & Shingo Tsukada (2024). Variational Autoencoder-Based Neural Electrocardiogram Synthesis Trained by FEM-Based Heart Simulator. Cardiovascular Digital Health Journal, 5 (1), 19-28.
  4. Moyu Hasegawa, Kenji Miki, Takuji Kawamura, Ikue Sasozaki, Yuki Hikashiyama, Masaru Tsuchida, Kunio Kashino, Masaki Taira, Emiko Ito, Maki Takeda, Hidekazu Ishida, Shuichiro Higo, Yasushi Sakata & Shigeru Miyagawa (2024). Gene correction and overexpression of TNNI3 improve impaired relaxation in engineered heart tissue model of pediatric restrictive cardiomyopathy. Development, Growth & Differentiation, 66 (2), 119-132.
  5. Yu Mitsuzumi, Go Irie, Akisato Kimura & Atsushi Nakazawa (2024). Phase Randomization: A Data Augmentation for Domain Adaptation in Human Action Recognition. Pattern Recognition, 146.
  6. Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki & Shoji Makino (2024). Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 32, 1157-1172.
  7. Rintaro Ikeshita & Tomohiro Nakatani (2024). Geometrically-Regularized Fast Independent Vector Extraction by Pure Majorization-Minimization. IEEE Transactions on Signal Processing, 72, 1560-1575.

Peer-reviewed Conference Papers

  1. Masahiro Nakano, Ryohei Shibue & Kunio Kashino (2024). Sunflower Strategy for Bayesian Relational Data Analysis. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  2. Masahiro Nakano, Hiroki Sakuma, Ryo Nishikimi, Ryohei Shibue, Takashi Sato & Kunio Kashino (2024). Warped Diffusion for Latent Differentiation Inference. International Conference on Artificial Intelligence and Statistics (AISTATS). Valencia, Spain.
  3. Yasuhiro Fujiwara, Atsutoshi Kumagai, Yasutoshi Ida, Masahiro Nakano, Makoto Nakatsuji & Akisato Kimura (2024). Efficient Algorithm for K-Multiple-Means. ACM SIGMOD International Conference on Management of Data. Santiago, Chile.
  4. Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada & Kunio Kashino (2024). Target Speech Spotting and Extraction Based on ConceptBeam. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  5. Yu Mitsuzumi, Akisato Kimura & Hisashi Kashima (2024). Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA.
  6. Takuhiro Kaneko (2024). Improving Physics Augmented Continuum Neural Radiance Fields-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA.
  7. Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix & Naohiro Tawara (2024). Discriminative Training of VBx Diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  8. Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Shoko Araki & Jan Cernocky (2024). Target Speech Extraction with Pre-Trained Self-Supervised Learning Models. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  9. William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2024). Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  10. Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada & Shoji Makino (2024). Neural Network-Based Virtual Microphone Estimation with Virtual Microphone and Beamformer-Level Multi-Task Loss. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  11. Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki & Shigeru Katagiri (2024). How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts? IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  12. Keigo Wakayama, Tsubasa Ochiai, Marc Delcroix, Masahiro Yasuda, Shoichiro Saito, Shoko Araki & Akira Nakayama (2024). Online Target Sound Extraction with Knowledge Distillation from Partially Non-Causal Teacher. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  13. Takanori Ashihara, Marc Delcroix, Takafumi Moriya, Kohei Matsuura, Taichi Asami & Yusuke Ijima (2024). What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  14. Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya & Yusuke Ijima (2024). Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  15. Naohiro Tawara, Marc Delcroix, Atsushi Ando & Atsunori Ogawa (2024). NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  16. Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki & Jan Cernocky (2024). Probing Self-supervised Learning Models with Target Speech Extraction. ICASSP2024 Satellite Workshop on Self-supervision in Audio, Speech, and Beyond (SASB). Seoul, Korea.
  17. Thilo von Neumann, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2024). MeetEval, Show Me the Errors! Interactive Visualization of Transcript Alignments for the Analysis of Conversational ASR. Show & Tell Demo, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Seoul, Korea.
  18. Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix & Reinhold Haeb-Umbach (2024). Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA). Seoul, Korea.
  19. Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda & Shoji Makino (2024). Diffusion model-based MIMO speech denoising and dereverberation. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA). Seoul, Korea.
  20. Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2024). Ensemble Inference for Diffusion Model-Based Speech Enhancement. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA). Seoul, Korea.
  21. Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada & Shoji Makino (2024). Unrestricted Global-Phase-Bias Aware Single-channel Speech Enhancement with Conformer-based Metric GAN. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  22. Yuto Kondo, Hirokazu Kameoka, Kou Tanaka & Takuhiro Kaneko (2024). Selecting N-Lowest Scores for Training MOS Prediction Models. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  23. Takuhiro Kaneko, Hirokazu Kameoka & Kou Tanaka (2024). Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea.
  24. Bo He, Shiqi Zhang, Xianrui Wang, Zheng Qiu, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada & Shoji Makino (2024). Light Gated Multi Mini-patch Extractor for Audio Classification. ICASSP2024 Satellite Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA). Seoul, Korea.

2023

Journal Papers

  1. Hiroaki Matsunaga, Tomohiro Yendo, Wataru Kihara, Yoshifumi Shiraki, Takashi G. Sato & Takehiro Moriya (2023). I/Q Demodulator based Optical Camera Communication. IEEE Photonics Journal, 153, 1138-1146.
  2. Akihiro Mizutani, Yuki Takeuchi & Kiyoshi Tamaki (2023). Finite-key Security Analysis of Differential-Phase-Shift Quantum Key Distribution. Physical Review Research, 5 (2).
  3. Cid Reyes-Bustos & Masato Wakayama (2023). Covering families of the asymmetric quantum Rabi model: η-shifted non-commutative harmonic oscillators. Communications in Mathematical Physics, 403, 1429-1476.
  4. Cid Reyes-Bustos (2023). The heat kernel of the asymmetric quantum Rabi model. Journal of Physics A: Mathematical and Theoretical, 56 (42).
  5. Shane Kelly & Hiroyasu Miyazaki (2023). Hodge cohomology with a ramification filtration, I. Mathematische Zeitschrift, 305 (70).
  6. Shuji Horinaga & Hiroaki Narita (2023). Cuspidal components of Siegel modular forms for large discrete series representations of Sp_4(R). Manuscripta Mathematica, (13).
  7. Kazuma Takeda, Yasutomo Kawanishi, Takatsugu Hirayama, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase & Kunio Kashino (2023). Estimation of Targets' Locations and Attention Degrees by Spatio-temporal Integration of Audiences' Facial Orientations. IEICE Transactions on Information and Systems, J106-A (3), 58-69.
  8. Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana & Seiichi Uchida (2023). Deep attentive time warping. Pattern Recognition, 136.
  9. Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura & Naonori Ueda (2023). Efficient Network Representation Learning via Cluster Similarity. Data Science and Engineering, 8, 279-291.
  10. Naoki Chihara, Tadafumi Takata, Yasuhiro Fujiwara, Koki Noda, Keisuke Toyoda, Kaito Higuchi & Makoto Onizuka (2023). Effective Detection of Variable Celestial Objects Using Machine Learning-based Periodic Analysis. Astronomy and Computing, 45.
  11. Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Cernocky & Dong Yu (2023). Neural Target Speech Extraction: An Overview. IEEE Signal Processing Magazine, 40 (3), 8-29.
  12. Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2023). Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 31, 835-848.
  13. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2023). Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection. IEEE Access, 11, 13906-13917.
  14. Phuc Duc Nguyen, Yoshifumi Shiraki, Kenji Ishikawa, Jun Muramatsu, Noboru Harada & Takehiro Moriya (2023). Distribution Matching for Dimming Control in Visible-Light Region-of-Interest Signaling. IEEE Photonics Journal, 15 (1), 1-14.
  15. Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe & Yasuhiro Oikawa (2023). Determination of Microphone Acoustic Center from Sound Field Projection Measured by Optical Interferometry. The Journal of the Acoustical Society of America.
  16. Shogo Seki, Hirokazu Kameoka, Takuhiro Kaneko & Kou Tanaka (2023). Non-parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder. IEEE Access, 11, 44590-44599.
  17. Samuel A. Verburg, Kenji Ishikawa, Efren Fernandez-Grande & Yasuhiro Oikawa (2023). A Century of Acousto-Optics: From Early Discoveries to Modern Sensing of Sound with Light. Acoustics Today, 19 (3), 54-62.
  18. Ryosuke Sugiura, Yutaka Kamamoto & Takehiro Moriya (2023). General form of almost instantaneous fixed-to-variable-length codes and optimal code tree construction. IEEE Transactions on Information Theory, 69 (12).
  19. Kenji Ishikawa, Yoshifumi Shiraki, Takehiro Moriya, Atsushi Ishizawa, Kenichi Hitachi & Katsuya Oguri (2023). Comprehensive Noise Analysis for Acousto-optic Measurement of Airborne Sound. IEEE Transactions on Instrumentation and Measurement, 73 (7000309).

Peer-reviewed Conference Papers

  1. Shuji Horinaga (2023). Cuspidal Components of Siegel Modular Forms for Large Discrete Series Representations. π∞. Sendai, Japan.
  2. Ryo Hiromasa, Akihiro Mizutani, Yuki Takeuchi & Seiichiro Tani (2023). Rewindable Quantum Computation and Its Equivalence to Cloning and Adaptive Postselection. Proc. Theory of Quantum Computation, Communication and Cryptography (TQC). Aveiro, Portugal.
  3. Yuki Takeuchi, Yasuhiro Takahashi, Tomoyuki Morimae & Seiichiro Tani (2023). Divide-and-Conquer Verification Method for Noisy Intermediate-Scale Quantum Computation. Proc. Asian Quantum Information Science Conference (AQIS). Seoul, Korea.
  4. Hiroto Kasai, Yuki Takeuchi, Hideaki Hakoshima, Yuichiro Matsuzaki & Yasuhiro Tokura (2023). Anonymous Quantum Sensing. Proc. The Seventeenth International Conference on Quantum, Nano/Bio, and Micro Technologies (ICQNM 2023). Porto, Portugal.
  5. Ryosuke Nakahama (2023). Holographic and symmetry breaking operators of holomorphic discrete series representations for (SU(3,3), SO*(6)). Proc. Geometric and Harmonic Analysis on Homogeneous Spaces and Applications. Monastir, Tunisia.
  6. Seiseki Akibue, Go Kato & Seiichiro Tani (2023). Optimal convex approximation of quantum superposition and its application in reshaping compilation errors. Proc. Quantum Innovation. Tokyo, Japan.
  7. Yuki Takeuchi (2023). Quantum Computation and Sensing on Network. Proc. The International Symposium on Wireless Personal Multimedia Communications (WPMC2023). Tampa, USA.
  8. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2023). Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  9. Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura & Naonori Ueda (2023). Efficient Network Representation Learning via Cluster Similarity. Proc. International Conference on Database Systems for Advanced Applications (DASFAA). Tianjin, China.
  10. Xiaomeng Wu, Yongqing Sun & Akisato Kimura (2023). Deep Quantigraphic Image Enhancement via Comparametric Equations. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  11. Yuto Shibata, Yutaka Kawashima, Mariko Isogawa, Go Irie, Akisato Kimura & Yoshimitsu Aoki (2023). Listening Human Behavior: 3D Human Pose Estimation with Acoustic Signals. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada.
  12. Shogo Sato, Yasuhiro Yao, Taiga Yoshida, Takuhiro Kaneko, Shingo Ando & Jun Shimamura (2023). Unsupervised Intrinsic Image Decomposition with LiDAR Intensity. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada.
  13. Shohei Matsugu, Yasuhiro Fujiwara & Hiroaki Shiokawa (2023). Uncovering the Largest Community in Social Networks at Scale. Proc. International Joint Conference on Artificial Intelligence (IJCAI). Macao, China.
  14. Takuhiro Kaneko (2023). MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields. Proc. IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France.
  15. Ayaka Ideno, Takuhiro Kaneko & Tatsuya Harada (2023). Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion. Proc. ACM International Conference on Multimodal Interaction (ICMI). Paris, France.
  16. Rentaro Kataoka, Akisato Kimura & Seiichi Uchida (2023). Towards defensive letter design. Proc. Asian Conference on Pattern Recognition (ACPR). Kitakyushu, Japan.
  17. Hayato Mitani, Akisato Kimura & Seiichi Uchida (2023). Selective scene text removal. Proc. British Machine Vision Conference (BMVC). Aberdeen, UK.
  18. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura & Shinji Watanabe (2023). Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  19. Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara & Marc Delcroix (2023). Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  20. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix & Ryo Masumura (2023). Leveraging Large Text Corpora for End-to-End Speech Summarization. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  21. Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix & Reinhold Haeb-Umbach (2023). On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  22. Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki & Tomohiro Nakatani (2023). Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  23. Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget & Shoko Araki (2023). Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. Proc. Interspeech. Dublin, Ireland.
  24. Naoyuki Kamo, Marc Delcroix & Tomohiro Nakatani (2023). Target Speaker Extraction with Conditional Diffusion Model. Proc. Interspeech. Dublin, Ireland.
  25. Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani & Toshio Irino (2023). Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. Proc. Interspeech. Dublin, Ireland.
  26. Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka & Nobukatsu Hojo (2023). Downstream Task Agnostic Speech Enhancement Conditioned on Self-Supervised Representation Loss. Proc. Interspeech. Dublin, Ireland.
  27. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa & Taichi Asami (2023). Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. Proc. Interspeech. Dublin, Ireland.
  28. Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix & Yukinori Honma (2023). SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? Proc. Interspeech. Dublin, Ireland.
  29. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa & Marc Delcroix (2023). Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. Proc. Interspeech. Dublin, Ireland.
  30. Hikaru Yanagida, Yusuke Ijima & Naohiro Tawara (2023). Influence of Personal Traits on Impressions of One's Own Voice. Proc. Interspeech. Dublin, Ireland.
  31. Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura & Taichi Asami (2023). What are differences? Comparing DNN and human by their performance and characteristics in speaker age estimation. Proc. Interspeech. Dublin, Ireland.
  32. Yuki Kitagishi, Hosana Kamiyama, Naohiro Tawara, Atsunori Ogawa, Noboru Miyazaki & Taichi Asami (2023). Coarse-age loss: A new training method using coarse-age labeled data for speaker age estimation. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  33. Koharu Horii, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2023). Language modeling for spontaneous speech recognition based on disfluency labeling and generation of disfluent text. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  34. Keigo Hojo, Daiki Mori, Yukoh Wakabayashi, Kengo Ohta, Atsunori Ogawa & Norihide Kitaoka (2023). Combining multiple end-to-end speech recognition models based on density ratio approach. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  35. Tatsunari Takagi, Atsunori Ogawa, Norihide Kitaoka & Yukoh Wakabayashi (2023). Streaming end-to-end speech recognition using a CTC decoder with substituted linguistic information. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  36. Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko & Shogo Seki (2023). Distilling sequence-to-sequence voice conversion models for streaming conversion applications. Proc. IEEE Spoken Language Technology Workshop (SLT). Doha, Qatar.
  37. Shogo Seki, Hirokazu Kameoka, Kou Tanaka & Takuhiro Kaneko (2023). JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  38. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2023). Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  39. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2023). iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN. Proc. Interspeech. Dublin, Ireland.
  40. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2023). Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation. Proc. Interspeech. Dublin, Ireland.
  41. Kou Tanaka, Takuhiro Kaneko, Hirokazu Kameoka & Shogo Seki (2023). CFVC: Conditional Filtering for Controllable Voice Conversion. Proc. Interspeech. Dublin, Ireland.
  42. Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi & Masahiro Yasuda (2023). First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline. Proc. European Signal Processing Conference (EUSIPCO). Helsinki, Finland.
  43. Shogo Seki, Kanami Imamura, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka & Noboru Harada (2023). W2N-AVSC: Audiovisual Extension for Whisper-to-Normal Speech Conversion. Proc. European Signal Processing Conference (EUSIPCO). Helsinki, Finland.
  44. Kou Tanaka, Hirokazu Kameoka & Takuhiro Kaneko (2023). PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. Proc. ISCA Speech Synthesis Workshop (SSW). Grenoble, France.
  45. Boxin Liu, Shiqi Zhang, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada & Shoji Makino (2023). Masked modeling duo vision transformer with multi-layer feature fusion on respiratory sound classification. Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop. Tampere, Finland.
  46. Chihiro Watanabe & Hirokazu Kameoka (2023). DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  47. Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo & Yohei Kawaguchi (2023). Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring. Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop. Tampere, Finland.
  48. Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada & Kunio Kashino (2023). Similarity-discrepancy disentanglement for audio difference captioning. Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop. Tampere, Finland.
  49. Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi & Masahiro Yasuda (2023). ToyADMOS2+: New Toyadmos Data and Benchmark Results of the First-Shot Anomalous Sound Event Detection Baseline. Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop. Tampere, Finland.
  50. Keisuke Takazawa, Hirokazu Kameoka & Masahiro Yukawa (2023). Multiple Sound Source Tracking Based on Generative Modeling and Recursive Bayesian Filtering of Spatial Gradient Spectra. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Taipei, Taiwan.
  51. Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi & Masahiro Yasuda (2023). First-shot anomaly sound detection for machine condition monitoring: A Domain Generalization baseline. Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC). Helsinki, Finland.
  52. Haruka Nozawa, Mayuko Imanishi, Yasuhiro Oikawa & Kenji Ishikawa (2023). Physical-model-based reconstruction of three-dimensional sound field from multi-directional measurement by parallel phase-shift interferometry. Proc. The Australian Acoustical Society (Acoustics2023). Sydney, Australia.

2022

Journal Papers

  1. Ken Mano, Hideki Sakurada & Yasuyuki Tsukada (2022). Quality and quantity pair as trust metric. IEICE Transactions on Information and Systems.
  2. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
  3. Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe & Yanmin Qian (2022). End-to-end dereverberation, beamforming, and speech recognition in a cocktail party. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 30, 3173-3188.
  4. Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi & Shoko Araki (2022). Soundbeam: target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
  5. Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa, Yoshifumi Shiraki & Takehiro Moriya (2022). Speckle holographic imaging of sound field using Fresnel lens. Optics Letters.
  6. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). BYOL for audio: Exploring pre-trained general-purpose audio representations. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
  7. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations. Proceedings of Machine Learning Research (PMLR).
  8. Li Li, Kohei Yatabe, Hirokazu Kameoka & Shoji Makino (2022). FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
  9. X. Wu, Y. Sun, A. Kimura & K. Kashino (2022). Contrast enhancement based on reflectance-oriented probabilistic equalization. Signal Processing, 194.
  10. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1032-1047.
  11. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 576-589.
  12. Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Hiroto Ashihara, Tetsunori Kobayashi & Tetsuji Ogawa (2022). Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models. IEICE Transactions on Information and Systems, E105.D (1), 150-160.
  13. Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj & Sanjeev Khudanpur (2022). Joint speaker diarization and speech recognition based on region proposal networks. Computer Speech & Language, 72, 101316.

Peer-reviewed Conference Papers

  1. Masato Wakayama (2022). Quantum Interaction and number theory, representation theory - modular forms a bit beyond, infinite symmetric group, Fuchsian ODE. Painlevé Seminar.
  2. Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada & Kunio Kashino (2022). ConceptBeam: Concept driven target speech extraction. Proc. ACM International Conference on Multimedia (ACMMM). Lisbon, Portugal.
  3. Seiya Matsuda, Akisato Kimura & Seiichi Uchida (2022). Font generation with missing impression labels. Proc. International Conference on Pattern Recognition (ICPR). Montreal, Quebec, Canada.
  4. Kana Goto, Tetsuya Ueda, Li Li, Takeshi Yamada & Shoji Makino (2022). Geometrically constrained independent vector analysis with auxiliary function approach and iterative source steering. Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  5. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Composing general audio representation by fusing multi-layer features of pre-trained model. Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  6. Natsuki Ueno & Hirokazu Kameoka (2022). Multiple sound source localization based on stochastic modeling of spatial gradient spectra. Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  7. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2022). MISRNet: Lightweight neural vocoder using multi-input single shared residual blocks. Proc. Interspeech. Incheon, Korea.
  8. Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki & Kou Tanaka (2022). CAUSE: Crossmodal action unit sequence estimation from speech. Proc. Interspeech. Incheon, Korea.
  9. Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada & Kunio Kashino (2022). Introducing auxiliary text query-modifier to content-based audio retrieval. Proc. Interspeech. Incheon, Korea.
  10. Takashi Shibata, Masatoshi Okutomi & Masayuki Tanaka (2022). Robustizing object detection networks using augmented feature pooling. Proc. Asian Conference on Computer Vision (ACCV). Macau SAR, China.
  11. Yu Moriyasu, Takashi Shibata, Masayuki Tanaka & Masatoshi Okutomi (2022). Top-K ensemble for semantic segmentation robust against unexpected degradation. Proc. IEEE International Conference on Consumer Electronics (ICCE). Bordeaux, France.
  12. Yasuhiro Fujiwara, Masahiro Nakano, Atsutoshi Kumagai, Yasutoshi Ida, Akisato Kimura & Naonori Ueda (2022). Fast binary network hashing via graph clustering. Proc. IEEE BigData. Osaka, Japan.
  13. Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe & Yasuhiro Oikawa (2022). Visualization of microphone's acoustic center using phase-shifting interferometry. Proc. International Congress on Acoustics (ICA). Gyeongju, Korea.
  14. M. Nakano, R. Nishikimi, Y. Fujiwara, A. Kimura, T. Yamada & N. Ueda (2022). Nonparametric relational models with superrectangulation. Proc. International Conference on Artificial Intelligence and Statistics (AISTATS).
  15. G. Irie, T. Shibata & A. Kimura (2022). Co-attention-guided bilinear model for echo-based depth estimation. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  16. T. Kaneko, K. Tanaka, H. Kameoka & S. Seki (2022). Fastening and lightening convolutional mel-spectrogram vocoder using inverse short-time Fourier transform. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  17. S. Seki, H. Kameoka & L. Li (2022). Exploring and improving multichannel variational autoencoder for underdetermined source separation. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  18. L. Li, H. Kameoka & S. Seki (2022). HBP: An efficient block permutation solver using Hungarian algorithm and spectrogram inpainting for multichannel audio source separation. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  19. H. Kameoka, S. Seki, L. Li & C. Watanabe (2022). AttentionPIT: Soft permutation invariant training for audio source separation with attention mechanism. Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  20. T. Kaneko (2022). AR-NeRF: Unsupervised learning of depth and defocus effects from natural images with aperture rendering neural radiance fields. Proc. Conference on Computer Vision and Pattern Recognition (CVPR).
  21. S. Yoneda, G. Irie, T. Shibata, M. Nishiyama & I. Yoshio (2022). Deep segmentation network without mask image supervision for 2D image registration. Proc. International Workshop on Frontiers of Computer Vision (IW-FCV).
  22. M. Ueda, A. Kimura & S. Uchida (2022). Font shape-to-impression translation. Proc. International Workshop on Document Analysis Systems (DAS).
  23. C. Kabore, M. Tsuchida, I. Suzuki, S. Sugaya, A. Kimura & N. Harada (2022). Prototyping of low-cost color enhancement lighting using multicolor LEDs. Proc. International Symposium on Electronic Imaging (EI).
  24. Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  25. Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Importance of Switch Optimization Criterion in Switching WPE Dereverberation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  26. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo & Takafumi Moriya (2022). Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  27. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2022). Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  28. Atsunori Ogawa, Naohiro Tawara, Marc Delcroix & Shoko Araki (2022). Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  29. Keisuke Kinoshita, Marc Delcroix & Tomoharu Iwata (2022). Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  30. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). SA-SDR: A Novel Loss Function for Separation of Meeting Style Data. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  31. Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix & Takahiro Shinozaki (2022). Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  32. Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki & Shigeru Katagiri (2022). How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR. Proc. Interspeech 2022.
  33. Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato & Tomohiro Nakatani (2022). Listen only to me! How well can target speech extraction handle false alarms?. Proc. Interspeech 2022.
  34. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka & Ryo Masumura (2022). Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. Proc. Interspeech 2022.
  35. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2022). Streaming Target-Speaker ASR with Neural Transducer. Proc. Interspeech 2022.
  36. Martin Kocour, Katerina Zmolikova, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukas Burget & Jan Cernocky (2022). Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. Proc. Interspeech 2022.
  37. Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2022). End-to-End Spontaneous Speech Recognition Using Disfluency Labeling. Proc. Interspeech 2022.
  38. Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker & Reinhold Haeb-Umbach (2022). Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. Proc. Interspeech 2022.
  39. Rintaro Ikeshita & Tomohiro Nakatani (2022). ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis. 2022 30th European Signal Processing Conference (EUSIPCO).
  40. Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner & Jan Honza Černocký (2022). Analysis of Impact of Emotions on Target Speech Extraction and Speech Separation. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).
  41. Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada & Shoji Makino (2022). Neural Virtual Microphone Estimator: Application to Multi-Talker Reverberant Mixtures. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  42. Naoyuki Kamo, Kenichi Arai, Atsunori Ogawa, Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix, Tsubasa Ochiai & Toshio Irino (2022). Speech Intelligibility Prediction through Direct Estimation of Word Accuracy Using Conformer. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  43. Kenichi Arai, Atsunori Ogawa, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Naoyuki Kamo & Toshio Irino (2022). Intelligibility prediction of enhanced speech using recognition accuracy of end-to-end ASR systems. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  44. Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita & Tomohiro Nakatani (2022). Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  45. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Extraction and Its Joint Optimization with Weighted Prediction Error Dereverberation. Proc. of 24th International Congress on Acoustics (ICA2022).
  46. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2021). Attention-Based Multi-Hypothesis Fusion for Speech Summarization. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
  47. Naohiro Tawara, Atsunori Ogawa, Yuki Kitagishi, Hosana Kamiyama & Yusuke Ijima (2021). Robust speech-age estimation using local maximum mean discrepancy under mismatched recording condition. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

Members

Executive Manager

Fellow

Senior Distinguished Researchers

Recognition Research Group

Signal Processing Research Group

Computing Theory Research Group

Computational Modeling Research Group

Biomedical Informatics Research Group

Access

Last Update: 11/14/2024