Authors
Michelsanti D., Tan Z.-H., Rotger-Griful S., Jensen J.
Workshop
ICASSP 2023 Workshop - AMHAT 2023: Advances in Multimodal Hearing Assistive Technologies
Abstract
Audio-visual speech enhancement (SE) is the task of reducing the acoustic background noise in a degraded speech signal using both acoustic and visual information. In this work, we study how to incorporate visual information to enhance a speech signal using acoustic beamformers in hearing aids (HAs). Specifically, we first train a deep learning model to estimate a time-frequency mask from audio-visual data. Then, we apply this mask to estimate the inter-microphone power spectral densities (PSDs) of the clean speech and the noise signals. Finally, we use the estimated PSDs to build acoustic beamformers. Assuming that an HA user wears an add-on device comprising a camera pointing at the target speaker, we show that our method can benefit HA systems, especially at low signal-to-noise ratios (SNRs).
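
To make the pipeline in the abstract concrete, below is a minimal NumPy sketch of mask-based PSD estimation followed by beamforming. The abstract does not specify which beamformer is built, so an MVDR beamformer with a rank-1 steering-vector estimate is assumed here purely for illustration; all function names are hypothetical, and the mask is taken as given (i.e., the output of the audio-visual deep learning model).

```python
import numpy as np

def estimate_psds(Y, mask):
    """Mask-based PSD estimation.

    Y:    complex multichannel STFT of the noisy signal, shape (C, F, T).
    mask: estimated time-frequency mask in [0, 1], shape (F, T),
          assumed to come from the audio-visual model.
    Returns speech and noise PSD matrices, each of shape (F, C, C).
    """
    C, F, T = Y.shape
    phi_s = np.zeros((F, C, C), dtype=complex)
    phi_n = np.zeros((F, C, C), dtype=complex)
    for f in range(F):
        Yf = Y[:, f, :]                              # (C, T) snapshots
        m = mask[f]                                  # (T,) mask values
        # Mask-weighted outer products of the microphone snapshots.
        phi_s[f] = (m * Yf) @ Yf.conj().T / max(m.sum(), 1e-8)
        phi_n[f] = ((1.0 - m) * Yf) @ Yf.conj().T / max((1.0 - m).sum(), 1e-8)
    return phi_s, phi_n

def mvdr_weights(phi_s, phi_n, ref_mic=0):
    """MVDR beamformer (an assumption; the paper only says 'acoustic
    beamformers'). The steering vector is the principal eigenvector of
    the speech PSD matrix, i.e., a rank-1 speech model."""
    F, C, _ = phi_s.shape
    W = np.zeros((F, C), dtype=complex)
    for f in range(F):
        _, vecs = np.linalg.eigh(phi_s[f])
        d = vecs[:, -1]                              # principal eigenvector
        d = d / d[ref_mic]                           # normalize to reference mic
        num = np.linalg.solve(phi_n[f] + 1e-8 * np.eye(C), d)
        W[f] = num / (d.conj() @ num)
    return W

def apply_beamformer(W, Y):
    """Per-frequency filtering: X_hat[f, t] = w[f]^H y[f, t]."""
    return np.einsum('fc,cft->ft', W.conj(), Y)
```

The normalization of the steering vector to a reference microphone is a common design choice that makes the beamformer output an estimate of the clean speech as received at that microphone; other beamformers (e.g., multichannel Wiener filters) could be built from the same estimated PSDs.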