Media Information Laboratory

Message

Dr. Akisato Kimura, Executive Manager

The Media Information Laboratory is organized into five research groups: media recognition, signal processing, computational modeling, biomedical informatics, and computing theory. We promote basic research on information processing technologies and fundamental principles related to "media", the means by which information is transmitted in communication.

"Media" is a medium for transmitting information in communication among people or between people and computers. It can also be regarded as data obtained by observing various kinds of information in the real and virtual worlds. Based on this idea, media information processing can cover not only sounds and images, which are observed through hearing and sight, but also various other observable data from the real and cyber worlds.

In this way, we take a broad view of media information processing. We aim to approach the fundamental principles of communication and to develop technologies that enrich our lives in the real and virtual worlds by bringing together the experience and knowledge of experts in a wide range of fields, such as real-world measurement, modeling, signal processing, media recognition and understanding, media generation, and the basic mathematical theories and algorithms that support them.

News

  • July 2023

    Our paper “MIMO-NeRF: Fast neural rendering with multi-input multi-output neural radiance fields” has been accepted to IEEE/CVF International Conference on Computer Vision (ICCV2023).
    Takuhiro Kaneko, “MIMO-NeRF: Fast Neural Rendering with Multi-Input Multi-Output Neural Radiance Fields,” IEEE/CVF International Conference on Computer Vision (ICCV2023), 2023.
    https://openaccess.thecvf.com/content/ICCV2023/html/Kaneko_MIMO-NeRF_Fast_Neural_Rendering_with_Multi-input_Multi-output_Neural_Radiance_Fields_ICCV_2023_paper.html

  • July 2023

    Our paper “Frame-level event representation learning for semantic-level generation and editing of avatar motion” has been accepted to ACM International Conference on Multimodal Interaction (ICMI2023).
    Ayaka Ideno, Takuhiro Kaneko, Tatsuya Harada, “Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion,” ACM International Conference on Multimodal Interaction (ICMI), 2023.
    https://dl.acm.org/doi/abs/10.1145/3577190.3614175

  • July 2023

    Our paper “Divide-and-conquer verification method for noisy intermediate-scale quantum computation” has been accepted to Asian Quantum Information Science Conference (AQIS2023).
    Yuki Takeuchi, Yasuhiro Takahashi, Tomoyuki Morimae, and Seiichiro Tani, “Divide-and-conquer verification method for noisy intermediate-scale quantum computation,” Asian Quantum Information Science Conference (AQIS), 2023.
    https://doi.org/10.22331/q-2022-07-07-758

  • June 2023

    Our paper “First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline” has been accepted to European Signal Processing Conference (EUSIPCO2023).
    Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda, “First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline,” European Signal Processing Conference (EUSIPCO), 2023.
    DOI:10.23919/EUSIPCO58844.2023.10289721
    https://ieeexplore.ieee.org/document/10289721

  • June 2023

    Our paper “W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion” has been accepted to European Signal Processing Conference (EUSIPCO2023).
    Shogo Seki, Kanami Imamura, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Noboru Harada, “W2N-AVSC: Audiovisual Extension For Whisper-To-Normal Speech Conversion,” European Signal Processing Conference (EUSIPCO), 2023.
    DOI:10.23919/EUSIPCO58844.2023.10289823
    https://ieeexplore.ieee.org/document/10289823

  • June 2023

    Our paper “PRVAE-VC: Non-parallel many-to-many voice conversion with perturbation-resistant variational autoencoder” has been accepted to ISCA Speech Synthesis Workshop (SSW2023).
    Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, “PRVAE-VC: Non-parallel many-to-many voice conversion with perturbation-resistant variational autoencoder,” ISCA Speech Synthesis Workshop (SSW), 2023.
    https://www.isca-archive.org/ssw_2023/tanaka23_ssw.html
    DOI:10.21437/SSW.2023-14

  • May 2023

    The following 10 papers have been accepted to Interspeech 2023.
    ・Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget, Shoko Araki, “Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization”
    ・Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, “Target Speaker Extraction with Conditional Diffusion Model”
    ・Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino, “Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine”
    ・Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo, “Downstream Task Agnostic Speech Enhancement Conditioned on Self-Supervised Representation Loss”
    ・Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami, “Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data”
    ・Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma, “SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?”
    ・Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, “Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization”
    ・Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki, “iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN”
    ・Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino, “Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation”
    ・Kou Tanaka, Takuhiro Kaneko, Hirokazu Kameoka, Shogo Seki, “CFVC: Conditional Filtering for Controllable Voice Conversion”

  • May 2023

    Our paper “Finite-key security analysis of differential-phase-shift quantum key distribution” has been accepted to Physical Review Research.
    Akihiro Mizutani, Yuki Takeuchi, Kiyoshi Tamaki, “Finite-key security analysis of differential-phase-shift quantum key distribution,” Physical Review Research, 5, 023132, published 30 May 2023.
    Phys. Rev. Research 5, 023132 (2023)

  • April 2023

    Our paper “Uncovering the largest community in social networks at scale” has been accepted to International Joint Conference on Artificial Intelligence (IJCAI2023).
    Shohei Matsugu, Yasuhiro Fujiwara, Hiroaki Shiokawa, “Uncovering the Largest Community in Social Networks at Scale,” International Joint Conference on Artificial Intelligence (IJCAI2023), 2023.
    https://www.ijcai.org/proceedings/2023/0250

  • April 2023

    Our paper “Rewindable Quantum Computation and Its Equivalence to Cloning and Adaptive Postselection” has been accepted to Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2023).
    Ryo Hiromasa, Akihiro Mizutani, Yuki Takeuchi, Seiichiro Tani, “Rewindable Quantum Computation and Its Equivalence to Cloning and Adaptive Postselection”
    https://doi.org/10.48550/arXiv.2206.05434

  • March 2023

    Our paper “Listening human behavior: 3D human pose estimation with acoustic signals” has been accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2023).
    Yuto Shibata, Yutaka Kawashima, Mariko Isogawa, Go Irie, Akisato Kimura, Yoshimitsu Aoki, “Listening human behavior: 3D human pose estimation with acoustic signals,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    https://openaccess.thecvf.com/content/CVPR2023/html/Shibata_Listening_Human_Behavior_3D_Human_Pose_Estimation_With_Acoustic_Signals_CVPR_2023_paper.html

  • March 2023

    Our paper “Unsupervised intrinsic image decomposition with LiDAR intensity” has been accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2023).
    Shogo Sato, Yasuhiro Yao, Taiga Yoshida, Takuhiro Kaneko, Shingo Ando, Jun Shimamura, “Unsupervised intrinsic image decomposition with LiDAR intensity,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
    https://openaccess.thecvf.com/content/CVPR2023/html/Sato_Unsupervised_Intrinsic_Image_Decomposition_With_LiDAR_Intensity_CVPR_2023_paper.html

  • February 2023

    The following 9 papers have been accepted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2023).
    ・Xiaomeng Wu, Yongqing Sun, Akisato Kimura, “Deep quantigraphic image enhancement via comparametric equations”
    ・Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix, “Iterative shallow fusion of backward language model for end-to-end speech recognition”
    ・Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura, Shinji Watanabe, “Speech summarization of long spoken document: Improving memory efficiency of speech/text encoders”
    ・Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura, “Leveraging Large Text Corpora for End-to-End Speech Summarization”
    ・Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach, “On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems”
    ・Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki, Tomohiro Nakatani, “Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis”
    ・Shogo Seki, Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, “JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models”
    ・Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino, “Masked modeling duo: Learning Representations by Encouraging Both Networks to Model the Input”
    ・Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki, “Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis”

  • February 2023

    Our paper “Deep attentive time warping” has been accepted to Pattern Recognition.
    Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida, “Deep attentive time warping,” Pattern Recognition, 2023.
    https://doi.org/10.1016/j.patcog.2022.109201

  • February 2023

    Our paper “Streaming end-to-end target speaker automatic speech recognition and activity detection” has been accepted to IEEE Access.
    T. Moriya, H. Sato, T. Ochiai, M. Delcroix and T. Shinozaki, "Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection," in IEEE Access, 2023. doi: 10.1109/ACCESS.2023.3243690.
    https://ieeexplore.ieee.org/document/10041133

  • February 2023

    Our paper “Determination of microphone acoustic center from sound field projection measured by optical interferometry” has been accepted to The Journal of the Acoustical Society of America (JASA).
    Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa, “Determination of microphone acoustic center from sound field projection measured by optical interferometry,” The Journal of the Acoustical Society of America, 2023.
    https://doi.org/10.1121/10.0017246 J. Acoust. Soc. Am. 153, 1138–1146 (2023)

  • February 2023

    Our paper “I/Q demodulator based optical camera communications” has been accepted to IEEE Photonics Journal.
    Hiroaki Matsunaga, Tomohiro Yendo, Wataru Kihara, Yoshifumi Shiraki, Takashi G. Sato, Takehiro Moriya, “I/Q Demodulator Based Optical Camera Communications,” IEEE Photonics Journal, 2023.
    June 2022 IEEE Photonics Journal 14(3):1-1
    DOI:10.1109/JPHOT.2022.3166283

  • February 2023

    Our paper “Decoding selective attention from EEG during simultaneous presentation of two melodies” has been accepted to Neuroscience2021.

  • January 2023

    Our paper “Efficient network representation learning via cluster similarity” has been accepted to International Conference on Database Systems for Advanced Applications (DASFAA).
    Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura, Naonori Ueda, “Efficient Network Representation Learning via Cluster Similarity,” in Proc. International Conference on Database Systems for Advanced Applications (DASFAA), 2023.

  • January 2023

    Our paper “Segment-less continuous speech separation of meetings: Training and evaluation criteria” has been accepted to IEEE/ACM Transactions on Audio, Speech and Language Processing.
    T. von Neumann, K. Kinoshita, C. Boeddeker, M. Delcroix and R. Haeb-Umbach, "Segment-less Continuous Speech Separation of Meetings: Training and Evaluation Criteria," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, doi: 10.1109/TASLP.2022.3228629.
    https://ieeexplore.ieee.org/abstract/document/9982413

  • January 2023

    Our paper “Neural target speech extraction: An overview” has been accepted to IEEE Signal Processing Magazine.
    Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Cernocky, Dong Yu, "Neural target speech extraction: An overview," IEEE Signal Processing Magazine, 2023. DOI: 10.1109/MSP.2023.3240008.
    https://ieeexplore.ieee.org/abstract/document/10113382

  • January 2023

    Our paper “Mask-based neural beamforming for moving speakers with self-attention-based tracking” has been accepted to IEEE/ACM Transactions on Audio, Speech and Language Processing.
    Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, “Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, DOI: 10.1109/TASLP.2023.3237172.
    https://ieeexplore.ieee.org/document/10017367

  • January 2023

    Our paper “Distribution matching for dimming control in visible-light region-of-interest signaling” has been accepted to IEEE Photonics Journal.
    Phuc Duc Nguyen, Yoshifumi Shiraki, Kenji Ishikawa, Jun Muramatsu, Noboru Harada, Takehiro Moriya, “Distribution matching for dimming control in visible-light region-of-interest signaling,” IEEE Photonics Journal, 2023. DOI: 10.1109/JPHOT.2022.3233092

  • January 2023

    Naohiro Tawara has received the Best Reviewer Award in IEEE Spoken Language Technology Workshop (SLT 2022). https://www.slt2022.org/best-papers.php

Research groups

Research Index

Publications

2023

Journal Papers

  1. Hiroaki Matsunaga, Tomohiro Yendo, Wataru Kihara, Yoshifumi Shiraki, Takashi G. Sato & Takehiro Moriya (2023). I/Q Demodulator Based Optical Camera Communications. IEEE Photonics Journal, 14 (3).
  2. Akihiro Mizutani, Yuki Takeuchi & Kiyoshi Tamaki (2023). Finite-key Security Analysis of Differential-Phase-Shift Quantum Key Distribution. Physical Review Research, 5 (2).
  3. Kazuma Takeda, Yasutomo Kawanishi, Takatsugu Hirayama, Daisuke Deguchi, Ichiro Ide, Hiroshi Murase & Kunio Kashino (2023). Estimation of Targets' Locations and Attention Degrees by Spatio-temporal Integration of Audiences' Facial Orientations. IEICE Transactions on Information and Systems, J106-A (3), 58-69.
  4. Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana & Seiichi Uchida (2023). Deep attentive time warping. Pattern Recognition, 136.
  5. Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Cernocky & Dong Yu (2023). Neural Target Speech Extraction: An Overview. IEEE Signal Processing Magazine, 40 (3), 8-29.
  6. Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani & Shoko Araki (2023). Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 31, 835-848.
  7. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2023). Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection. IEEE Access, 11, 13906-13917.
  8. Phuc Duc Nguyen, Yoshifumi Shiraki, Kenji Ishikawa, Jun Muramatsu, Noboru Harada & Takehiro Moriya (2023). Distribution Matching for Dimming Control in Visible-Light Region-of-Interest Signaling. IEEE Photonics Journal, 15 (1), 1-14.
  9. Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe & Yasuhiro Oikawa (2023). Determination of Microphone Acoustic Center from Sound Field Projection Measured by Optical Interferometry. The Journal of the Acoustical Society of America, 153, 1138-1146.

Peer-reviewed Conference Papers

  1. Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko & Shogo Seki (2023). Distilling sequence-to-sequence voice conversion models for streaming conversion applications. Proc. IEEE Spoken Language Technology Workshop (SLT). Doha, Qatar.
  2. Shuji Horinaga (2023). Cuspidal Components of Siegel Modular Forms for Large Discrete Series Representations. π∞. Sendai, Japan.
  3. Ryo Hiromasa, Akihiro Mizutani, Yuki Takeuchi & Seiichiro Tani (2023). Rewindable Quantum Computation and Its Equivalence to Cloning and Adaptive Postselection. Proc. Theory of Quantum Computation, Communication and Cryptography (TQC). Aveiro, Portugal.
  4. Yuki Takeuchi, Yasuhiro Takahashi, Tomoyuki Morimae & Seiichiro Tani (2023). Divide-and-Conquer Verification Method for Noisy Intermediate-Scale Quantum Computation. Proc. Asian Quantum Information Science Conference (AQIS). Seoul, Korea.
  5. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2023). Masked modeling duo: Learning Representations by Encouraging Both Networks to Model the Input. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  6. Yasuhiro Fujiwara, Yasutoshi Ida, Atsutoshi Kumagai, Masahiro Nakano, Akisato Kimura & Naonori Ueda (2023). Efficient Network Representation Learning via Cluster Similarity. Proc. International Conference on Database Systems for Advanced Applications (DASFAA). Tianjin, China.
  7. Xiaomeng Wu, Yongqing Sun & Akisato Kimura (2023). Deep Quantigraphic Image Enhancement via Comparametric Equations. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  8. Yuto Shibata, Yutaka Kawashima, Mariko Isogawa, Go Irie, Akisato Kimura & Yoshimitsu Aoki (2023). Listening Human Behavior: 3D Human Pose Estimation with Acoustic Signals. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada.
  9. Shogo Sato, Yasuhiro Yao, Taiga Yoshida, Takuhiro Kaneko, Shingo Ando & Jun Shimamura (2023). Unsupervised Intrinsic Image Decomposition with LiDAR Intensity. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada.
  10. Shohei Matsugu, Yasuhiro Fujiwara & Hiroaki Shiokawa (2023). Uncovering the Largest Community in Social Networks at Scale. Proc. International Joint Conference on Artificial Intelligence (IJCAI). Cape Town, South Africa.
  11. Takuhiro Kaneko (2023). MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields. Proc. IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France.
  12. Ayaka Ideno, Takuhiro Kaneko & Tatsuya Harada (2023). Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion. Proc. ACM International Conference on Multimodal Interaction (ICMI). Paris, France.
  13. Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan Sharma, Kohei Matsuura & Shinji Watanabe (2023). Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  14. Atsunori Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara & Marc Delcroix (2023). Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  15. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix & Ryo Masumura (2023). Leveraging Large Text Corpora for End-to-End Speech Summarization. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  16. Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix & Reinhold Haeb-Umbach (2023). On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  17. Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki & Tomohiro Nakatani (2023). Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  18. Marc Delcroix, Naohiro Tawara, Mireia Diez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukas Burget & Shoko Araki (2023). Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. Proc. Interspeech. Dublin, Ireland.
  19. Naoyuki Kamo, Marc Delcroix & Tomohiro Nakatani (2023). Target Speaker Extraction with Conditional Diffusion Model. Proc. Interspeech. Dublin, Ireland.
  20. Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani & Toshio Irino (2023). Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. Proc. Interspeech. Dublin, Ireland.
  21. Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka & Nobukatsu Hojo (2023). Downstream Task Agnostic Speech Enhancement Conditioned on Self-Supervised Representation Loss. Proc. Interspeech. Dublin, Ireland.
  22. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa & Taichi Asami (2023). Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. Proc. Interspeech. Dublin, Ireland.
  23. Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix & Yukinori Honma (2023). SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?. Proc. Interspeech. Dublin, Ireland.
  24. Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa & Marc Delcroix (2023). Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. Proc. Interspeech. Dublin, Ireland.
  25. Shogo Seki, Hirokazu Kameoka, Kou Tanaka & Takuhiro Kaneko (2023). JSV-VC: Jointly Trained Speaker Verification and Voice Conversion Models. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  26. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2023). Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Island of Rhodes, Greece.
  27. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2023). iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN. Proc. Interspeech. Dublin, Ireland.
  28. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2023). Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation. Proc. Interspeech. Dublin, Ireland.
  29. Kou Tanaka, Takuhiro Kaneko, Hirokazu Kameoka & Shogo Seki (2023). CFVC: Conditional Filtering for Controllable Voice Conversion. Proc. Interspeech. Dublin, Ireland.
  30. Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi & Masahiro Yasuda (2023). First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline. Proc. European Signal Processing Conference (EUSIPCO). Helsinki, Finland.
  31. Shogo Seki, Kanami Imamura, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka & Noboru Harada (2023). W2N-AVSC: Audiovisual Extension for Whisper-to-Normal Speech Conversion. Proc. European Signal Processing Conference (EUSIPCO). Helsinki, Finland.
  32. Kou Tanaka, Hirokazu Kameoka & Takuhiro Kaneko (2023). PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. Proc. ISCA Speech Synthesis Workshop (SSW). Grenoble, France.

2022

Journal Papers

  1. Ken Mano, Hideki Sakurada & Yasuyuki Tsukada (2022). Quality and quantity pair as trust metric. IEICE Transactions on Information and Systems.
  2. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  3. Wangyou Zhang, Xuankai Chang, Christoph Boeddeker, Tomohiro Nakatani, Shinji Watanabe & Yanmin Qian (2022). End-to-end dereverberation, beamforming, and speech recognition in a cocktail party. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 30, 3173-3188.
  4. Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi & Shoko Araki (2022). Soundbeam: target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  5. Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa, Yoshifumi Shiraki & Takehiro Moriya (2022). Speckle holographic imaging of sound field using fresnel lens. Optics Letters.
  6. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). BYOL for audio: Exploring pre-trained general-purpose audio representations. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP).
  7. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations. Proceedings of Machine Learning Research (PMLR).
  8. Li Li, Kohei Yatabe, Hirokazu Kameoka & Shoji Makino (2022). FastMVAE2: On improving and accelerating the fast variational autoencoder-based source separation algorithm for determined mixtures. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP).
  9. Xiaomeng Wu, Yongqing Sun, Akisato Kimura & Kunio Kashino (2022). Contrast enhancement based on reflectance-oriented probabilistic equalization. Signal Processing, 194.
  10. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 1032-1047.
  11. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 576-589.
  12. Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Hiroto Ashihara, Tetsunori Kobayashi & Tetsuji Ogawa (2022). Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models. IEICE Transactions on Information and Systems, E105.D (1), 150-160.
  13. Zili Huang, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj & Sanjeev Khudanpur (2022). Joint speaker diarization and speech recognition based on region proposal networks. Computer Speech & Language, 72, 101316.

Peer-reviewed Conference Papers

  1. Masato Wakayama (2022). Quantum Interaction and number theory, representation theory - modular forms a bit beyond, infinite symmetric group, Fuchsian ODE. Painlevé Seminar.
  2. Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada & Kunio Kashino (2022). ConceptBeam: Concept driven target speech extraction. Proc. ACM International Conference on Multimedia (ACMMM). Lisbon, Portugal.
  3. Seiya Matsuda, Akisato Kimura & Seiichi Uchida (2022). Font generation with missing impression labels. in Proc. International Conference on Pattern Recognition (ICPR). Montreal Quebec, Canada.
  4. Kana Goto, Tetsuya Ueda, Li Li, Takeshi Yamada & Shoji Makino (2022). Geometrically constrained independent vector analysis with auxiliary function approach and iterative source steering. in Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  5. Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada & Kunio Kashino (2022). Composing general audio representation by fusing multi-layer features of pre-trained model. in Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  6. Natsuki Ueno & Hirokazu Kameoka (2022). Multiple sound source localization based on stochastic modeling of spatial gradient spectra. in Proc. European Signal Processing Conference (EUSIPCO). Belgrade, Serbia.
  7. Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka & Shogo Seki (2022). MISRNet: Lightweight neural vocoder using multi-input single shared residual blocks. in Proc. Interspeech. Incheon, Korea.
  8. Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki & Kou Tanaka (2022). CAUSE: Crossmodal action unit sequence estimation from speech. in Proc. Interspeech. Incheon, Korea.
  9. Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada & Kunio Kashino (2022). Introducing auxiliary text query-modifier to content-based audio retrieval. in Proc. Interspeech. Incheon, Korea.
  10. Takashi Shibata, Masatoshi Okutomi & Masayuki Tanaka (2022). Robustizing object detection networks using augmented feature pooling. in Proc. Asian Conference on Computer Vision (ACCV). Macau SAR, China.
  11. Yu Moriyasu, Takashi Shibata, Masayuki Tanaka & Masatoshi Okutomi (2022). Top-K ensemble for semantic segmentation robust against unexpected degradation. Proc. IEEE International Conference on Consumer Electronics (ICCE). Bordeaux, France.
  12. Yasuhiro Fujiwara, Masahiro Nakano, Atsutoshi Kumagai, Yasutoshi Ida, Akisato Kimura & Naonori Ueda (2022). Fast binary network hashing via graph clustering. Proc. IEEE BigData. Osaka, Japan.
  13. Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe & Yasuhiro Oikawa (2022). Visualization of microphone's acoustic center using phase-shifting interferometry. Proc. International Congress on Acoustics (ICA). Gyeongju, Korea.
  14. M. Nakano, R. Nishikimi, Y. Fujiwara, A. Kimura, T. Yamada, and N. Ueda, "Nonparametric relational models with superrectangulation," in Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
  15. G. Irie, T. Shibata, and A. Kimura, "Co-attention-guided bilinear model for echo-based depth estimation," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
  16. T. Kaneko, K. Tanaka, H. Kameoka, and S. Seki, "Fastening and lightening convolutional mel-spectrogram vocoder using inverse short-time Fourier transform," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
  17. S. Seki, H. Kameoka, and L. Li, "Exploring and improving multichannel variational autoencoder for underdetermined source separation," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
  18. L. Li, H. Kameoka, and S. Seki, "HBP: An efficient block permutation solver using Hungarian algorithm and spectrogram inpainting for multichannel audio source separation," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
  19. H. Kameoka, S. Seki, L. Li, and C. Watanabe, "AttentionPIT: Soft permutation invariant training for audio source separation with attention mechanism," in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
  20. T. Kaneko, "AR-NeRF: Unsupervised learning of depth and defocus effects from natural images with aperture rendering neural radiance fields," in Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  21. S. Yoneda, G. Irie, T. Shibata, M. Nishiyama, and I. Yoshio, "Deep segmentation network without mask image supervision for 2D image registration," in Proc. International Workshop on Frontiers of Computer Vision (IW-FCV), 2022.
  22. M. Ueda, A. Kimura, and S. Uchida, "Font shape-to-impression translation," in Proc. International Workshop on Document Analysis Systems (DAS), 2022.
  23. C. Kabore, M. Tsuchida, I. Suzuki, S. Sugaya, A. Kimura, and N. Harada, "Prototyping of low-cost color enhancement lighting using multicolor LEDs," in Proc. International Symposium on Electronic Imaging (EI), 2022.
  24. Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  25. Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita & Tomohiro Nakatani (2022). Importance of Switch Optimization Criterion in Switching WPE Dereverberation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  26. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo & Takafumi Moriya (2022). Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  27. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2022). Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  28. Atsunori Ogawa, Naohiro Tawara, Marc Delcroix & Shoko Araki (2022). Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  29. Keisuke Kinoshita, Marc Delcroix & Tomoharu Iwata (2022). Tight Integration Of Neural- And Clustering-Based Diarization Through Deep Unfolding Of Infinite Gaussian Mixture Model. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  30. Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix & Reinhold Haeb-Umbach (2022). SA-SDR: A Novel Loss Function for Separation of Meeting Style Data. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  31. Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix & Takahiro Shinozaki (2022). Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
  32. Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki & Shigeru Katagiri (2022). How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR. Proc. Interspeech 2022.
  33. Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolikova, Hiroshi Sato & Tomohiro Nakatani (2022). Listen only to me! How well can target speech extraction handle false alarms? Proc. Interspeech 2022.
  34. Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka & Ryo Masumura (2022). Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. Proc. Interspeech 2022.
  35. Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix & Takahiro Shinozaki (2022). Streaming Target-Speaker ASR with Neural Transducer. Proc. Interspeech 2022.
  36. Martin Kocour, Katerina Zmolikova, Lucas Ondel, Jan Svec, Marc Delcroix, Tsubasa Ochiai, Lukas Burget & Jan Cernocky (2022). Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. Proc. Interspeech 2022.
  37. Koharu Horii, Meiko Fukuda, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa & Norihide Kitaoka (2022). End-to-End Spontaneous Speech Recognition Using Disfluency Labeling. Proc. Interspeech 2022.
  38. Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker & Reinhold Haeb-Umbach (2022). Utterance-by-utterance overlap-aware neural diarization with Graph-PIT. Proc. Interspeech 2022.
  39. Rintaro Ikeshita & Tomohiro Nakatani (2022). ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis. 2022 30th European Signal Processing Conference (EUSIPCO).
  40. Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner & Jan Honza Černocký (2022). Analysis of Impact of Emotions on Target Speech Extraction and Speech Separation. 2022 International Workshop on Acoustic Signal Enhancement (IWAENC).
  41. Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada & Shoji Makino (2022). Neural Virtual Microphone Estimator: Application to Multi-Talker Reverberant Mixtures. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  42. Naoyuki Kamo, Kenichi Arai, Atsunori Ogawa, Shoko Araki, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix, Tsubasa Ochiai & Toshio Irino (2022). Speech Intelligibility Prediction through Direct Estimation of Word Accuracy Using Conformer. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  43. Kenichi Arai, Atsunori Ogawa, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Naoyuki Kamo & Toshio Irino (2022). Intelligibility prediction of enhanced speech using recognition accuracy of end-to-end ASR systems. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  44. Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita & Tomohiro Nakatani (2022). Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
  45. Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo & Shoko Araki (2022). Switching Independent Vector Extraction and Its Joint Optimization with Weighted Prediction Error Dereverberation. Proc. of 24th International Congress on Acoustics (ICA2022).
  46. Takatomo Kano, Atsunori Ogawa, Marc Delcroix & Shinji Watanabe (2021). Attention-Based Multi-Hypothesis Fusion for Speech Summarization. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
  47. Naohiro Tawara, Atsunori Ogawa, Yuki Kitagishi, Hosana Kamiyama & Yusuke Ijima (2021). Robust speech-age estimation using local maximum mean discrepancy under mismatched recording condition. 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

Members

Executive Manager

Fellow

Senior Distinguished Researchers

Recognition Research Group

Signal Processing Research Group

Computing Theory Research Group

Computational Modeling Research Group

Biomedical Informatics Research Group

Access

Last Update: 7/19/2024