Showing 1–11 of 11 results for author: Haro, G

Searching in archive eess.
  1. arXiv:2306.04796  [pdf]

    eess.IV

    JDLL: A library to run Deep Learning models on Java bioimage informatics platforms

    Authors: Carlos Garcia Lopez de Haro, Stephane Dallongeville, Thomas Musset, Estibaliz Gomez de Mariscal, Daniel Sage, Wei Ouyang, Arrate Munoz-Barrutia, Jean-Yves Tinevez, Jean-Christophe Olivo-Marin

    Abstract: We present JDLL, an agile Java library that offers a comprehensive toolset/API to unify the development of high-end applications of DL for bioimage analysis and to streamline their installation and maintenance. JDLL provides all the functions required to consume DL models seamlessly, without being burdened by the configuration of the Python-based DL frameworks, within Java bioimage informatics pla…

    Submitted 25 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: New version with new figure and updated links

  2. arXiv:2306.00489  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech inpainting: Context-based speech synthesis guided by video

    Authors: Juan F. Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen

    Abstract: Audio and visual modalities are inherently connected in speech signals: lip movements and facial expressions are correlated with speech sounds. This motivates studies that incorporate the visual modality to enhance an acoustic speech signal or even restore missing audio information. Specifically, this paper focuses on the problem of audio-visual speech inpainting, which is the task of synthesizing…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted in Interspeech23

  3. arXiv:2204.02090  [pdf, other]

    cs.CV cs.IR cs.SD eess.AS

    VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

    Authors: Venkatesh S. Kadandale, Juan F. Montesinos, Gloria Haro

    Abstract: In this paper, we address the problem of lip-voice synchronisation in videos containing human face and voice. Our approach is based on determining if the lips motion and the voice in a video are synchronised or not, depending on their audio-visual correspondence score. We propose an audio-visual cross-modal transformer-based model that outperforms several baseline models in the audio-visual synchr…

    Submitted 30 June, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Paper accepted to Interspeech 2022; Project Page: https://ipcv.github.io/VocaLiST/

  4. arXiv:2203.04099  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer

    Authors: Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro

    Abstract: This paper presents an audio-visual approach for voice separation which produces state-of-the-art results at a low latency in two scenarios: speech and singing voice. The model is based on a two-stage network. Motion cues are obtained with a lightweight graph convolutional network that processes face landmarks. Then, both audio and motion features are fed to an audio-visual transformer which produ…

    Submitted 19 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022

  5. arXiv:2106.00359  [pdf, other]

    cs.LG cs.CV eess.IV

    Learning Football Body-Orientation as a Matter of Classification

    Authors: Adrià Arbués-Sangüesa, Adrián Martín, Paulino Granero, Coloma Ballester, Gloria Haro

    Abstract: Orientation is a crucial skill for football players that becomes a differential factor in a large set of events, especially the ones involving passes. However, existing orientation estimation methods, which are based on computer-vision techniques, still have a lot of room for improvement. To the best of our knowledge, this article presents the first deep learning model for estimating orientation d…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted in the AI for Sports Analytics Workshop at IJCAI 2021

  6. arXiv:2104.09946  [pdf, other]

    cs.SD cs.LG eess.AS

    A cappella: Audio-visual Singing Voice Separation

    Authors: Juan F. Montesinos, Venkatesh S. Kadandale, Gloria Haro

    Abstract: The task of isolating a target singing voice in music videos has useful applications. In this work, we explore the single-channel singing voice separation problem from a multimodal perspective, by jointly learning from audio and visual modalities. To do so, we present Acappella, a dataset spanning around 46 hours of a cappella solo singing videos sourced from YouTube. We also propose an audio-visu…

    Submitted 18 October, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Paper accepted at The 32nd British Machine Vision Conference, BMVC 2021

  7. arXiv:2006.07931  [pdf, other]

    eess.AS cs.DB cs.SD

    Solos: A Dataset for Audio-Visual Music Analysis

    Authors: Juan F. Montesinos, Olga Slizovskaia, Gloria Haro

    Abstract: In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 differ…

    Submitted 6 August, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Rephrased some sentences. Explanation about OpenPose. Minor grammatical errors

  8. Conditioned Source Separation for Music Instrument Performances

    Authors: Olga Slizovskaia, Gloria Haro, Emilia Gómez

    Abstract: In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem. This paper proposes a source separation method for multiple musical instruments sounding simultaneously and e…

    Submitted 7 July, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: 14 pages, 5 figures, under review

  9. arXiv:2004.02541  [pdf, other]

    eess.AS cs.CV cs.LG

    Vocoder-Based Speech Synthesis from Silent Videos

    Authors: Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen

    Abstract: Both acoustic and visual information influence human perception of speech. For this reason, the lack of audio in a video sequence determines an extremely low speech intelligibility for untrained lip readers. In this paper, we present a way to synthesise speech from the silent video of a talker using deep learning. The system learns a mapping function from raw video frames to acoustic features and…

    Submitted 15 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to Interspeech 2020

  10. arXiv:1907.01813  [pdf, other]

    cs.SD cs.LG eess.AS

    A Case Study of Deep-Learned Activations via Hand-Crafted Audio Features

    Authors: Olga Slizovskaia, Emilia Gómez, Gloria Haro

    Abstract: The explainability of Convolutional Neural Networks (CNNs) is a particularly challenging task in all areas of application, and it is notably under-researched in music and audio domain. In this paper, we approach explainability by exploiting the knowledge we have on hand-crafted audio features. Our study focuses on a well-defined MIR task, the recognition of musical instruments from user-generated…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: The 2018 Joint Workshop on Machine Learning for Music, The Federated Artificial Intelligence Meeting (FAIM), Joint workshop program of ICML, IJCAI/ECAI, and AAMAS, Stockholm, Sweden, Saturday, July 14th, 2018

  11. arXiv:1811.01850  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Sound Source Separation Conditioned On Instrument Labels

    Authors: Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gomez

    Abstract: Can we perform an end-to-end music source separation with a variable number of sources using a deep learning model? We present an extension of the Wave-U-Net model which allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation…

    Submitted 9 May, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: 5 pages, 2 figures, 2 tables, ICASSP 2019