
A Vision-Assisted Hearing Aid System Based on Deep Learning

Authors

Michelsanti D., Tan Z.-H., Rotger-Griful S., Jensen J.

Workshop

ICASSP 2023 Workshop - AMHAT 2023: Advances in Multimodal Hearing Assistive Technologies

Abstract

Audio-visual speech enhancement (SE) is the task of reducing the acoustic background noise in a degraded speech signal using both acoustic and visual information. In this work, we study how to incorporate visual information to enhance a speech signal using acoustic beamformers in hearing aids (HAs). Specifically, we first train a deep learning model to estimate a time-frequency mask from audio-visual data. Then, we apply this mask to estimate the inter-microphone power spectral densities (PSDs) of the clean speech and the noise signals. Finally, we use the estimated PSDs to build acoustic beamformers. Assuming that a HA user wears an add-on device comprising a camera pointing at the target speaker, we show that our method can be beneficial for HA systems, especially at low signal-to-noise ratios (SNRs).
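The pipeline described in the abstract (mask estimation, mask-weighted PSD estimation, beamforming) can be sketched numerically. Note that the abstract does not specify which beamformer is used; the sketch below assumes an MVDR beamformer in the Souden formulation, a common choice in mask-based multichannel enhancement, and replaces the deep-learning mask with a random stand-in. All array shapes and helper names (`masked_psd`, `mvdr_weights`) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: M microphones, F frequency bins, T time frames.
M, F, T = 4, 5, 50

# Multichannel STFT of the noisy mixture, shape (M, F, T). Synthetic here;
# in practice this comes from the HA microphone array.
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))

# Time-frequency mask in [0, 1], as the audio-visual deep model would
# predict. Random stand-in for illustration only.
mask = rng.uniform(0.0, 1.0, size=(F, T))

def masked_psd(Y, mask):
    """Mask-weighted inter-microphone PSD matrices, shape (F, M, M)."""
    # Normalize mask weights per frequency bin, then average the
    # rank-1 outer products Y[:, f, t] Y[:, f, t]^H over frames.
    w = mask / (mask.sum(axis=1, keepdims=True) + 1e-12)
    return np.einsum('ft,mft,nft->fmn', w, Y, Y.conj())

Phi_s = masked_psd(Y, mask)        # clean-speech PSD estimate
Phi_n = masked_psd(Y, 1.0 - mask)  # noise PSD estimate

def mvdr_weights(Phi_s, Phi_n, ref=0):
    """MVDR beamformer per frequency bin (Souden formulation, ref mic)."""
    F, M, _ = Phi_s.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        G = np.linalg.solve(Phi_n[f], Phi_s[f])   # Phi_n^{-1} Phi_s
        W[f] = G[:, ref] / (np.trace(G) + 1e-12)
    return W

W = mvdr_weights(Phi_s, Phi_n)

# Apply the beamformer to obtain the enhanced single-channel STFT (F, T).
S_hat = np.einsum('fm,mft->ft', W.conj(), Y)
print(S_hat.shape)
```

The enhanced STFT `S_hat` would then be converted back to the time domain for playback through the hearing aid.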

Tags: audio-visual, hearing aids, beamforming, deep learning