Showing 1–26 of 26 results for author: Roebel, A

  1. arXiv:2406.04467

    eess.AS cs.CL cs.SD

    Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis

    Authors: Théodor Lemerle, Nicolas Obin, Axel Roebel

    Abstract: Recent advancements in text-to-speech (TTS) powered by language models have showcased remarkable capabilities in achieving naturalness and zero-shot voice cloning. Notably, the decoder-only transformer is the prominent architecture in this domain. However, transformers face challenges stemming from their quadratic complexity in sequence length, impeding training on lengthy sequences and resource-c…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Interspeech

  2. arXiv:2310.18320

    cs.CY cs.AI

    AI (r)evolution -- where are we heading? Thoughts about the future of music and sound technologies in the era of deep learning

    Authors: Giovanni Bindi, Nils Demerlé, Rodrigo Diaz, David Genova, Aliénor Golvet, Ben Hayes, Jiawen Huang, Lele Liu, Vincent Martos, Sarah Nabi, Teresa Pelinski, Lenny Renault, Saurjya Sarkar, Pedro Sarmento, Cyrus Vahidi, Lewis Wolstanholme, Yixiao Zhang, Axel Roebel, Nick Bryan-Kinns, Jean-Louis Giavitto, Mathieu Barthet

    Abstract: Artificial Intelligence (AI) technologies such as deep learning are evolving very quickly, bringing many changes to our everyday lives. To explore the future impact and potential of AI in the field of music and sound technologies, a doctoral day was held between Queen Mary University of London (QMUL, UK) and Sciences et Technologies de la Musique et du Son (STMS, France). Prompt questions about curr…

    Submitted 20 September, 2023; originally announced October 2023.

  3. arXiv:2310.03444

    eess.AS cs.SD

    VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice

    Authors: Frederik Bous, Axel Roebel

    Abstract: The information bottleneck auto-encoder is a tool for disentanglement commonly used for voice transformation. The successful disentanglement relies on the right choice of bottleneck size. Previous bottleneck auto-encoders created the bottleneck by the dimension of the latent space or through vector quantization and had no means to change the bottleneck size of a specific model. As the bottleneck r…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  4. arXiv:2210.02647

    math.DS math.NA math.PR physics.ao-ph

    Ensemble Kalman Filtering for Glacier Modeling

    Authors: Emily Corcoran, Logan Knudsen, Talea Mayo, Hannah Park-Kaufmann, Alexander Robel

    Abstract: Working with a two-stage ice sheet model, we explore how statistical data assimilation methods can be used to improve predictions of glacier melt and, relatedly, sea level rise. We find that the EnKF improves model runs initialized using incorrect initial conditions or parameters, providing us with better models of future glacier melt. We explore the necessary number of observations needed to produ…

    Submitted 20 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

  5. Analysis and transformations of voice level in singing voice

    Authors: Frederik Bous, Axel Roebel

    Abstract: We introduce a neural auto-encoder that transforms the musical dynamic in recordings of singing voice via changes in voice level. Since most recordings of singing voice are not annotated with voice level, we propose a means to estimate the voice level from the signal's timbre using a neural voice level estimator. We introduce the recording factor that relates the voice level to the recorded signal…

    Submitted 22 November, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2023

  6. StyleWaveGAN: Style-based synthesis of drum sounds with extensive controls using generative adversarial networks

    Authors: Antoine Lavault, Axel Roebel, Matthieu Voiry

    Abstract: In this paper we introduce StyleWaveGAN, a style-based drum sound generator that is a variation of StyleGAN, a state-of-the-art image generator. By conditioning StyleWaveGAN on both the type of drum and several audio descriptors, we are able to synthesize waveforms faster than real-time on a GPU directly in CD quality up to a duration of 1.5s while retaining a considerable amount of control over t…

    Submitted 2 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in Sound and Music Computing 2022

  7. arXiv:2202.05718

    cs.SD cs.LG eess.AS

    Audio Defect Detection in Music with Deep Networks

    Authors: Daniel Wolff, Rémi Mignot, Axel Roebel

    Abstract: With increasing amounts of music being digitally transferred from production to distribution, automatic means of determining media quality are needed. Protection mechanisms in digital audio processing tools have not eliminated the need for production entities located downstream in the distribution chain to assess audio quality and detect defects inserted further upstream. Such analysis often relies on…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: 6 pages

    Journal ref: Proceedings of the 22nd International Society for Music Information Retrieval Conference, Online, 2021

  8. arXiv:2110.03744

    cs.SD eess.AS

    Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

    Authors: Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel

    Abstract: This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original neural VC architecture is proposed based on sequence-to-sequence voice conversion (S2S-VC) in which the speech prosody of the source speaker is preserved during conv…

    Submitted 31 May, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:2107.12346

  9. arXiv:2110.03329

    eess.AS cs.SD

    Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

    Authors: Axel Roebel, Frederik Bous

    Abstract: This paper introduces the Multi-Band Excited WaveNet, a neural vocoder for speaking and singing voices. It aims to advance the state of the art towards a universal neural vocoder, which is a model that can generate voice signals from arbitrary mel spectrograms extracted from voice signals. Following the success of the DDSP model and following the development of the recently proposed excitation voc…

    Submitted 7 October, 2021; originally announced October 2021.

  10. arXiv:2107.12346

    cs.SD cs.LG eess.AS

    Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

    Authors: Laurent Benaroya, Nicolas Obin, Axel Roebel

    Abstract: Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while maintaining the rest unchanged. Research in neural VC has accomplished considerable breakthroughs with the capacity to falsify a voice identity using a small amount of data with a highly realistic rendering. This paper goes beyond voice identity and prese…

    Submitted 27 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  11. arXiv:2104.07288

    eess.AS cs.LG cs.SD

    Speaker Attentive Speech Emotion Recognition

    Authors: Clément Le Moine, Nicolas Obin, Axel Roebel

    Abstract: The Speech Emotion Recognition (SER) task has seen significant improvements in recent years with the advent of Deep Neural Networks (DNNs). However, even the most successful methods still tend to fail when adaptation to specific speakers and scenarios is needed, inevitably leading to poorer performance when compared to humans. In this paper, we present novel work based on the idea of teach…

    Submitted 15 April, 2021; originally announced April 2021.

  12. arXiv:2104.07283

    eess.AS cs.LG cs.SD

    Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels

    Authors: Clément Le Moine Veillon, Nicolas Obin, Axel Roebel

    Abstract: This paper presents an end-to-end framework for the F0 transformation in the context of expressive voice conversion. A single neural network is proposed, in which a first module is used to learn F0 representation over different temporal scales and a second adversarial module is used to learn the transformation from one emotion to another. The first module is composed of a convolution layer with wav…

    Submitted 15 April, 2021; originally announced April 2021.

  13. arXiv:2006.08723

    cs.CR cs.IT eess.SP

    Threats and Countermeasures of Cyber Security in Direct and Remote Vehicle Communication Systems

    Authors: Subrato Bharati, Prajoy Podder, M. Rubaiyat Hossain Mondal, Md. Robiul Alam Robel

    Abstract: Traffic management, road safety, and environmental impact are important issues in the modern world. These challenges are addressed by the application of sensing, control and communication methods of intelligent transportation systems (ITS). A part of ITS is a vehicular ad-hoc network (VANET) which means a wireless network of vehicles. However, communication among vehicles in a VANET exposes severa…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 12 pages, 7 figures

    Journal ref: Journal of Information Assurance and Security (ISSN 1554-1010), Volume 15 (2020), pp. 153-164, MIR Labs, www.mirlabs.net/jias/index.html

  14. arXiv:2003.01220

    eess.AS cs.SD

    Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework

    Authors: Frederik Bous, Luc Ardaillon, Axel Roebel

    Abstract: This article investigates recently emerging approaches that use deep neural networks for the estimation of glottal closure instants (GCI). We build upon our previous approach that used synthetic speech exclusively to create perfectly annotated training data and that had been shown to compare favourably with other training approaches using electroglottograph (EGG) signals. Here we introduce a…

    Submitted 2 March, 2020; originally announced March 2020.

  15. arXiv:1910.12614

    eess.AS cs.LG cs.SD

    CycleGAN Voice Conversion of Spectral Envelopes using Adversarial Weights

    Authors: Rafael Ferro, Nicolas Obin, Axel Roebel

    Abstract: This paper tackles GAN optimization and stability issues in the context of voice conversion. First, to simplify the conversion task, we propose to use spectral envelopes as inputs. Second, we propose two adversarial weight training paradigms, the generalized weighted GAN and the generator impact GAN, both of which aim at reducing the impact of the generator on the discriminator, so both can learn more gradu…

    Submitted 11 July, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 1 figure

  16. arXiv:1910.10235

    eess.AS

    GCI detection from raw speech using a fully-convolutional network

    Authors: Luc Ardaillon, Axel Roebel

    Abstract: Glottal Closure Instants (GCI) detection consists in automatically detecting temporal locations of most significant excitation of the vocal tract from the speech signal. It is used in many speech analysis and processing applications, and various algorithms have been proposed for this purpose. Recently, new approaches using convolutional neural networks have emerged, with encouraging results. Follo…

    Submitted 20 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Minor corrections after reviews of ICASSP 2020 (accepted paper). (Corrected typos, added funding acknowledgments, added some references, cleaned bibliography, added a few details)

  17. arXiv:1910.09497

    cs.SD eess.AS

    Sound texture synthesis using RI spectrograms

    Authors: Hugo Caracalla, Axel Roebel

    Abstract: This article introduces a new parametric synthesis method for sound textures based on existing works in visual and sound texture synthesis. Starting from a base sound signal, an optimization process is performed until the cross-correlations between the feature-maps of several untrained 2D Convolutional Neural Networks (CNN) resemble those of an original sound texture. We use compressed RI spectrog…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

  18. arXiv:1905.03637

    cs.SD eess.AS

    Sound texture synthesis using convolutional neural networks

    Authors: Hugo Caracalla, Axel Roebel

    Abstract: The following article introduces a new parametric synthesis algorithm for sound textures inspired by existing methods used for visual textures. Using a 2D Convolutional Neural Network (CNN), a sound signal is modified until the temporal cross-correlations of the feature maps of its log-spectrogram resemble those of a target texture. We show that the resulting synthesized sound signal is both diffe…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: submitted to Digital Audio Effects Conference (DAFx 2019)

  19. arXiv:1903.01416

    cs.SD eess.AS

    Data Augmentation for Drum Transcription with Convolutional Neural Networks

    Authors: Celine Jacques, Axel Roebel

    Abstract: A recurrent issue in deep learning is the scarcity of data, in particular precisely annotated data. Few publicly available databases are correctly annotated and generating correct labels is very time consuming. The present article investigates data augmentation strategies for neural network training, particularly for tasks related to drum transcription. These tasks need very precise annotati…

    Submitted 4 March, 2019; originally announced March 2019.

    Journal ref: Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019

  20. arXiv:1903.01415

    cs.SD eess.AS

    Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation

    Authors: Alice Cohen-Hadria, Axel Roebel, Geoffroy Peeters

    Abstract: State-of-the-art singing voice separation is based on deep learning, making use of CNN structures with skip connections (such as the U-Net, Wave-U-Net, or MSDENSELSTM models). A key to the success of these models is the availability of a large amount of training data. In the following study, we are interested in singing voice separation for mono signals and will investigate comparing the U-Net a…

    Submitted 4 March, 2019; originally announced March 2019.

    Journal ref: Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019

  21. arXiv:1903.01161

    eess.AS cs.SD

    Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

    Authors: Frederik Bous, Axel Roebel

    Abstract: We conduct an investigation on various hyper-parameters regarding neural networks used to generate spectral envelopes for singing synthesis. Two perceptive tests, where the first compares two models directly and the other ranks models with a mean opinion score, are performed. With these tests we show that when learning to predict spectral envelopes, 2d-convolutions are superior to previously pro…

    Submitted 4 March, 2019; originally announced March 2019.

    Journal ref: Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019

  22. arXiv:1502.00141

    stat.ML cs.SD

    An evaluation framework for event detection using a morphological model of acoustic scenes

    Authors: Mathieu Lagrange, Grégoire Lafay, Mathias Rossignol, Emmanouil Benetos, Axel Roebel

    Abstract: This paper introduces a model of environmental acoustic scenes which adopts a morphological approach by abstracting temporal structures of acoustic scenes. To demonstrate its potential, this model is employed to evaluate the performance of a large set of acoustic event detection systems. This model allows us to explicitly control key morphological aspects of the acoustic scene and isolate their…

    Submitted 31 January, 2015; originally announced February 2015.

  23. arXiv:1109.6651

    cs.SD

    Sound Analysis and Synthesis Adaptive in Time and Two Frequency Bands

    Authors: Marco Liuni, Peter Balazs, Axel Röbel

    Abstract: We present an algorithm for sound analysis and resynthesis with local automatic adaptation of time-frequency resolution. There exist several algorithms that allow adapting the analysis window depending on its time or frequency location; in what follows we propose a method which selects the optimal resolution depending on both time and frequency. We consider an approach that we denote as analysis-wei…

    Submitted 29 September, 2011; originally announced September 2011.

    Journal ref: Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, France, September 19-23, 2011

  24. arXiv:1109.6314

    cs.SD

    An Entropy Based Method for Local Time-Adaptation of the Spectrogram

    Authors: M. Liuni, A. Röbel, M. Romito, X. Rodet

    Abstract: We propose a method for automatic local time-adaptation of the spectrogram of audio signals: it is based on the decomposition of a signal within a Gabor multi-frame through the STFT operator. The sparsity of the analysis in every individual frame of the multi-frame is evaluated through the Rényi entropy measures: the best local resolution is determined by minimizing the entropy values. The overall sp…

    Submitted 27 September, 2011; originally announced September 2011.

    Journal ref: CMMR 2010, LNCS 6684, pp. 60-75, 2011

  25. arXiv:1109.6313

    cs.SD

    A Reduced Multiple Gabor Frame for Local Time Adaptation of the Spectrogram

    Authors: M. Liuni, A. Röbel, M. Romito, X. Rodet

    Abstract: In this paper we propose a method for automatic local time adaptation of the spectrogram of an audio signal, based on its decomposition within a Gabor multi-frame. The sparsity of the analyses within each individual frame is evaluated through the Rényi entropy measures. According to the sparsity of the decompositions, an optimal resolution and a reduced multi-frame are determined, defining an…

    Submitted 27 September, 2011; originally announced September 2011.

    Journal ref: Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 6-10, 2010

  26. arXiv:1109.5876

    cs.SD

    Rényi Information Measures for Spectral Change Detection

    Authors: Marco Liuni, Axel Röbel, Marco Romito, Xavier Rodet

    Abstract: Change detection within an audio stream is an important task in several domains, such as classification and segmentation of a sound or of a music piece, as well as indexing of broadcast news or surveillance applications. In this paper we propose two novel methods for spectral change detection without any assumption about the input sound: they are both based on the evaluation of information measure…

    Submitted 27 September, 2011; originally announced September 2011.

    Comments: 2011 IEEE Conference on Acoustics, Speech and Signal Processing