Showing 1–15 of 15 results for author: Marxer, R

Searching in archive cs.
  1. arXiv:2404.01737

    eess.AS cs.CL cs.SD

    Transfer Learning from Whisper for Microscopic Intelligibility Prediction

    Authors: Paul Best, Santiago Cuervo, Ricard Marxer

    Abstract: Macroscopic intelligibility models predict the expected human word-error-rate for a given speech-in-noise stimulus. In contrast, microscopic intelligibility models aim to make fine-grained predictions about listeners' perception, e.g. predicting phonetic or lexical responses. State-of-the-art macroscopic models use transfer learning from large scale deep learning models for speech processing, wher…

    Submitted 2 April, 2024; originally announced April 2024.

  2. arXiv:2404.00685

    eess.AS cs.AI cs.CL cs.NE

    Scaling Properties of Speech Language Models

    Authors: Santiago Cuervo, Ricard Marxer

    Abstract: Speech Language Models (SLMs) aim to learn language from raw audio, without textual resources. Despite significant advances, our current models exhibit weak syntax and semantic abilities. However, if the scaling properties of neural language models hold for the speech modality, these abilities will improve as the amount of compute used for training increases. In this paper, we use models of this s…

    Submitted 16 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  3. arXiv:2401.14289

    cs.SD cs.LG eess.AS

    Speech foundation models on intelligibility prediction for hearing-impaired listeners

    Authors: Santiago Cuervo, Ricard Marxer

    Abstract: Speech foundation models (SFMs) have been benchmarked on many speech processing tasks, often achieving state-of-the-art performance with minimal adaptation. However, the SFM paradigm has been significantly less explored for applications of interest to the speech perception community. In this paper we present a systematic evaluation of 10 SFMs on one such application: Speech intelligibility predict…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: To be presented in ICASSP 2024

  4. Eiffel Tower: A Deep-Sea Underwater Dataset for Long-Term Visual Localization

    Authors: Clémentin Boittiaux, Claire Dune, Maxime Ferrera, Aurélien Arnaubec, Ricard Marxer, Marjolaine Matabos, Loïc Van Audenhaege, Vincent Hugel

    Abstract: Visual localization plays an important role in the positioning and navigation of robotics systems within previously visited environments. When visits occur over long periods of time, changes in the environment related to seasons or day-night cycles present a major challenge. Under water, the sources of variability are due to other factors such as water conditions or growth of marine organisms. Yet…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: The International Journal of Robotics Research, In press

  5. arXiv:2212.09129

    cs.CV

    SUCRe: Leveraging Scene Structure for Underwater Color Restoration

    Authors: Clémentin Boittiaux, Ricard Marxer, Claire Dune, Aurélien Arnaubec, Maxime Ferrera, Vincent Hugel

    Abstract: Underwater images are altered by the physical characteristics of the medium through which light rays pass before reaching the optical sensor. Scattering and wavelength-dependent absorption significantly modify the captured colors depending on the distance of observed elements to the image plane. In this paper, we aim to recover an image of the scene as if the water had no effect on light propagati…

    Submitted 18 January, 2024; v1 submitted 18 December, 2022; originally announced December 2022.

  6. arXiv:2206.02211

    cs.SD cs.AI cs.CL cs.LG cs.NE eess.AS

    Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

    Authors: Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski

    Abstract: The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones. In this paper we explore self-supervised learning of hierarchical representations of speech by applying multiple levels of Contrastive Predictive Coding (CPC). We observe that simply stacking two CPC models does not yield signi…

    Submitted 4 December, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

    Journal ref: Advances in Neural Information Processing Systems, 2022

  7. Homography-Based Loss Function for Camera Pose Regression

    Authors: Clémentin Boittiaux, Ricard Marxer, Claire Dune, Aurélien Arnaubec, Vincent Hugel

    Abstract: Some recent visual-based relocalization algorithms rely on deep learning methods to perform camera pose regression from image data. This paper focuses on the loss functions that embed the error between two poses to perform deep learning based camera pose regression. Existing loss functions are either difficult-to-tune multi-objective functions or present unstable reprojection errors that rely on g…

    Submitted 4 May, 2022; originally announced May 2022.

    Journal ref: IEEE Robotics and Automation Letters 7 (3), pp.6242-6249 (2022)

  8. arXiv:2110.15909

    cs.LG cs.SD eess.AS

    Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

    Authors: Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

    Abstract: We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing algorithms there is a trade off between categorization and segmentation performance. We investigate the source of this conflict and conclude that the use of context buil…

    Submitted 25 February, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

  9. arXiv:2106.11603

    cs.LG cs.SD eess.AS

    Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw

    Authors: Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski

    Abstract: We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Published in Interspeech 2021

  10. arXiv:2104.11946

    cs.LG cs.SD eess.AS

    Aligned Contrastive Predictive Coding

    Authors: Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski

    Abstract: We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned. In this way, the prediction netw…

    Submitted 22 June, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

    Comments: Published in Interspeech 2021

  11. arXiv:2006.02547

    eess.AS cs.CL cs.LG cs.SD

    A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

    Authors: Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

    Abstract: Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs…

    Submitted 8 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: Proceedings of Interspeech, 2020

  12. arXiv:2005.08520

    cs.LG cs.CL stat.ML

    Robust Training of Vector Quantized Bottleneck Models

    Authors: Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent

    Abstract: In this paper we demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representat…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: Published at IJCNN 2020

  13. arXiv:2004.11116

    cs.LG stat.ML

    Deep Learning Classification With Noisy Labels

    Authors: Guillaume Sanchez, Vincente Guis, Ricard Marxer, Frédéric Bouchara

    Abstract: Deep Learning systems have shown tremendous accuracy in image classification, at the cost of big image datasets. Collecting such amounts of data can lead to labelling errors in the training set. Indexing multimedia content for retrieval, classification or recommendation can involve tagging or classification based on multiple criteria. In our case, we train face recognition systems for actors identi…

    Submitted 23 April, 2020; originally announced April 2020.

  14. arXiv:1808.00060

    cs.SD cs.CV cs.LG eess.AS

    DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation

    Authors: Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker, Amir Hussain

    Abstract: Human auditory cortex excels at selectively suppressing background noise to focus on a target speaker. The process of selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on target speaker while filtering out other noises. In this study, we propose a novel deep neural network (DNN) based audiovisual (AV) mask estimation model. The pr…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: Accepted for Interspeech 2018, 5 pages, 4 figures

    ACM Class: I.5; I.4; I.2

  15. arXiv:1502.00524

    cs.SD cs.IR cs.LG stat.ML

    Unsupervised Incremental Learning and Prediction of Music Signals

    Authors: Ricard Marxer, Hendrik Purwins

    Abstract: A system is presented that segments, clusters and predicts musical audio in an unsupervised manner, adjusting the number of (timbre) clusters instantaneously to the audio input. A sequence learning algorithm adapts its structure to a dynamically changing clustering tree. The flow of the system is as follows: 1) segmentation by onset detection, 2) timbre representation of each segment by Mel freque…

    Submitted 23 October, 2015; v1 submitted 2 February, 2015; originally announced February 2015.

    Comments: 13 pages, 10 figures

    MSC Class: 68T05 ACM Class: I.2.6; H.5.5