Showing 1–11 of 11 results for author: Pinquier, J

  1. arXiv:2406.17732  [pdf, other]

    cs.SD cs.AI physics.class-ph

    EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal

    Authors: Modan Tailleur, Julien Pinquier, Laurent Millot, Corsin Vogel, Mathieu Lagrange

    Abstract: In this paper, we introduce the Extreme Metal Vocals Dataset, which comprises a collection of recordings of extreme vocal techniques performed within the realm of heavy metal music. The dataset consists of 760 audio excerpts, each 1 to 30 seconds long, totaling about 100 minutes of audio material, roughly composed of 60 minutes of distorted voices and 40 minutes of clear voice recordings. These vo…

    Submitted 24 June, 2024; originally announced June 2024.

    Journal ref: 21st International Conference on Content-based Multimedia Indexing (CBMI), Gylfi Þór Guðmundsson; Laurent Amsaleg; Omar Shahbaz Khan; Ralph Gasser; Shin'ichi Satoh; Maria Pegia; Aladine Chetouani; Björn Þór Jónsson; Claudio Gennaro; Ewa Kijak; Ilias Gialampoukidis; Liting Zhou; Jenny Benois-Pineau; Stevan Rudinac, Sep 2024, Reykjavik, Iceland

  2. arXiv:2309.00454  [pdf, other]

    cs.SD eess.AS

    CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

    Authors: Étienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content, using encoder-decoder architectures. An audio encoder produces audio embeddings fed to a decoder, usually a Transformer decoder, for caption generation. In this work, we describe our model, whose novelty, compared to existing models, lies in the use of a ConvNeXt architecture as audio encoder, adap…

    Submitted 1 September, 2023; originally announced September 2023.

  3. arXiv:2308.15090  [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

    Authors: Etienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to find the best matching audio recording(s) for a given textual query (Text-to-Audio) or vice versa (Audio-to-Text). These tasks require different types of systems: AAC employs a sequence-to-sequence model, while ATR utili…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: camera-ready version (14/08/23)

    Journal ref: DCASE2023, Sep 2023, Tampere, Finland

  4. arXiv:2305.01482  [pdf, other]

    cs.SD eess.AS

    Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer

    Authors: Etienne Labbé, Julien Pinquier, Thomas Pellegrini

    Abstract: In this work, we propose to study the performance of a model trained with a sentence embedding regression loss component for the Automated Audio Captioning task. This task aims to build systems that can describe audio content with a single sentence written in natural language. Most systems are trained with the standard Cross-Entropy loss, which does not take into account the semantic closeness of…

    Submitted 2 May, 2023; originally announced May 2023.

  5. arXiv:2211.08983  [pdf, other]

    cs.SD cs.LG eess.AS

    Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

    Authors: Etienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automatic Audio Captioning (AAC) is the task that aims to describe an audio signal using natural language. AAC systems take as input an audio signal and output a free-form text sentence, called a caption. Evaluating such systems is not trivial, since there are many ways to express the same idea. For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to comp…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022), Nov 2022, Nancy, France

  6. arXiv:2206.10411  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Audio-video fusion strategies for active speaker detection in meetings

    Authors: Lionel Pibre, Francisco Madrigal, Cyrille Equoy, Frédéric Lerasle, Thomas Pellegrini, Julien Pinquier, Isabelle Ferrané

    Abstract: Meetings are a common activity in professional contexts, and it remains challenging to endow vocal assistants with advanced functionalities to facilitate meeting management. In this context, a task like active speaker detection can provide useful insights to model interaction between meeting participants. Motivated by our application context related to an advanced meeting assistant, we want to combin…

    Submitted 9 June, 2022; originally announced June 2022.

  7. arXiv:2103.02899  [pdf, other]

    eess.AS cs.SD

    End-to-end acoustic modelling for phone recognition of young readers

    Authors: Lucile Gelin, Morgane Daniel, Julien Pinquier, Thomas Pellegrini

    Abstract: Automatic recognition systems for child speech are lagging behind those dedicated to adult speech in the race for performance. This phenomenon is due to the high acoustic and linguistic variability present in child speech caused by their body development, as well as the lack of available child speech data. Young readers' speech additionally displays peculiarities, such as slow reading rate and prese…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 16 pages, 8 figures

  8. arXiv:2003.04241  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Deep Neural Networks for Automatic Speech Processing: A Survey from Large Corpora to Limited Data

    Authors: Vincent Roger, Jérôme Farinas, Julien Pinquier

    Abstract: Most state-of-the-art speech systems are using Deep Neural Networks (DNNs). Those systems require a large amount of data to be learned. Hence, learning state-of-the-art frameworks on under-resourced speech languages/problems is a difficult task. Problems could be the limited amount of data for impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In th…

    Submitted 9 March, 2020; originally announced March 2020.

  9. arXiv:1910.09458  [pdf, other]

    cs.CV cs.LG

    Improving Vehicle Re-Identification using CNN Latent Spaces: Metrics Comparison and Track-to-track Extension

    Authors: Geoffrey Roman-Jimenez, Patrice Guyot, Thierry Malon, Sylvie Chambon, Vincent Charvillat, Alain Crouzil, André Péninou, Julien Pinquier, Florence Sedes, Christine Sénac

    Abstract: This paper addresses the problem of vehicle re-identification using distance comparison of images in CNN latent spaces. Firstly, we study the impact of the distance metrics, comparing performances obtained with different metrics: the minimal Euclidean distance (MED), the minimal cosine distance (MCD), and the residue of the sparse coding reconstruction (RSCR). These metrics are applied using fea…

    Submitted 26 September, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: This paper is a postprint of a paper submitted to and accepted for publication in the journal IET Computer Vision and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at the IET Digital Library

  10. Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia

    Authors: Svebor Karaman, Jenny Benois-Pineau, Vladislavs Dovgalecs, Rémi Mégret, Julien Pinquier, Régine André-Obrecht, Yann Gaëstel, Jean-François Dartigues

    Abstract: This paper presents a method for indexing activities of daily living in videos obtained from wearable cameras. In the context of dementia diagnosis by doctors, the videos are recorded at patients' houses and later visualized by the medical practitioners. The videos may last up to two hours, therefore a tool for an efficient navigation in terms of activities of interest is crucial for the doctors.…

    Submitted 14 May, 2014; v1 submitted 8 November, 2011; originally announced November 2011.

    Journal ref: Multimedia Tools and Applications, Volume 69, Issue 3, pp 743-771, June 2012

  11. Activities of Daily Living Indexing by Hierarchical HMM for Dementia Diagnostics

    Authors: Svebor Karaman, Jenny Benois-Pineau, Jean-François Dartigues, Yann Gaëstel, Rémi Mégret, Julien Pinquier

    Abstract: This paper presents a method for indexing human activities in videos captured from a wearable camera being worn by patients, for studies of progression of the dementia diseases. Our method aims to produce indexes to facilitate the navigation throughout the individual video recordings, which could help doctors search for early signs of the disease in the activities of daily living. The recorded…

    Submitted 22 June, 2011; originally announced June 2011.

    Comments: 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), Madrid, Spain (2011)