Showing 1–1 of 1 results for author: Pilati, L

  1. arXiv:2106.03821  [pdf, other]

    cs.SD cs.CL cs.CV eess.AS

    Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion

    Authors: Baptiste Pouthier, Laurent Pilati, Leela K. Gudupudi, Charles Bouveyron, Frederic Precioso

    Abstract: It is now well established from a variety of studies that there is a significant benefit from combining video and audio data in detecting active speakers. However, either of the modalities can potentially mislead audiovisual fusion by inducing unreliable or deceptive information. This paper outlines active speaker detection as a multi-objective learning problem to leverage the best of each modality…

    Submitted 15 September, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: In INTERSPEECH 2021

    Journal ref: Proc. Interspeech 2021, 2381-2385