Showing 1–6 of 6 results for author: Georges, M

Searching in archive eess.
  1. arXiv:2312.00174

    eess.AS cs.AI cs.CL cs.CV eess.IV

    Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices

    Authors: Gokul Srinivasagan, Michael Deisher, Munir Georges

    Abstract: People with visual impairments have difficulty accessing touchscreen-enabled personal computing devices like mobile phones and laptops. Image-to-speech (ITS) systems can help them mitigate this problem, but their large model size makes them extremely hard to deploy on low-resourced embedded devices. In this paper, we aim to overcome this challenge by developing an efficient end-to-end…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 5 pages, 2 figures, 2 tables, presented at the 15th ITG Conference on Speech Communications, September 2023, Aachen

  2. Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes

    Authors: Kevin Glocker, Aaricia Herygers, Munir Georges

    Abstract: This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, allowing for low-resource recognition. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task architecture. We also introduce Allophoible, an extension of the P…

    Submitted 16 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 2 tables, accepted to INTERSPEECH 2023; published version

    ACM Class: I.2.7

    Journal ref: Proc. INTERSPEECH 2023, 2258-2262

  3. arXiv:2303.06078

    eess.AS cs.AI cs.NE

    An End-to-End Neural Network for Image-to-Audio Transformation

    Authors: Liu Chen, Michael Deisher, Munir Georges

    Abstract: This paper describes an end-to-end (E2E) neural architecture for the audio rendering of small portions of display content on low-resource personal computing devices. It is intended to address the problem of accessibility for vision-impaired or vision-distracted users at the hardware level. Neural image-to-text (ITT) and text-to-speech (TTS) approaches are reviewed and a new technique is introduced…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 5 pages, 3 figures, 2023 IEEE Conference on Acoustics, Speech, and Signal Processing

  4. arXiv:2204.02269

    cs.SD cs.CL eess.AS

    Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

    Authors: Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

    Abstract: We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory c…

    Submitted 5 April, 2022; originally announced April 2022.

  5. arXiv:2104.03204

    cs.SD cs.CL eess.AS

    Learning robust speech representation with an articulatory-regularized variational autoencoder

    Authors: Marc-Antoine Georges, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

    Abstract: It is increasingly considered that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation could improve the performance of a deep generative model (here a variational autoencoder) trained to encode and decode acoustic speech features. First, we develop an articulatory model able to associate articulatory p…

    Submitted 7 April, 2021; originally announced April 2021.

  6. arXiv:2008.05011

    eess.AS cs.CL cs.LG cs.SD

    Compact Speaker Embedding: lrx-vector

    Authors: Munir Georges, Jonathan Huang, Tobias Bocklet

    Abstract: Deep neural networks (DNN) have recently been widely used in speaker recognition systems, achieving state-of-the-art performance on various benchmarks. The x-vector architecture is especially popular in this research community, due to its excellent performance and manageable computational complexity. In this paper, we present the lrx-vector system, which is the low-rank factorized version of the x…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: Accepted to INTERSPEECH 2020

    Journal ref: Proc. Interspeech 2020
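The lrx-vector abstract in result 6 describes a low-rank factorized version of the x-vector. As a generic illustration only, and not the paper's method or configuration, the sketch below shows the arithmetic behind low-rank factorization of a single dense layer: a weight matrix W of size m × n is replaced by U (m × r) and V (r × n), so the layer stores r(m + n) weights instead of mn, and the forward pass computes U(Vx) instead of Wx. The layer sizes 512 × 1500, the rank 64, and the helper names `matmul`, `dense_params`, and `low_rank_params` are hypothetical choices for this example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (no external libraries)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def dense_params(m: int, n: int) -> int:
    # Full-rank layer: one weight per (output, input) pair.
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    # W (m x n) replaced by U (m x r) times V (r x n).
    return r * (m + n)


if __name__ == "__main__":
    # Hypothetical layer sizes and rank, chosen only for illustration.
    m, n, r = 512, 1500, 64
    full = dense_params(m, n)
    low = low_rank_params(m, n, r)
    print(full, low, round(low / full, 3))

    # Tiny check that the factored forward pass U(Vx) equals (UV)x.
    U = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]            # 3 x 2
    V = [[1.0, 0.0, 2.0, -1.0], [0.0, 1.0, 1.0, 1.0]]    # 2 x 4
    x = [[1.0], [2.0], [3.0], [4.0]]                     # 4 x 1 column
    y_factored = matmul(U, matmul(V, x))
    y_full = matmul(matmul(U, V), x)
    assert y_factored == y_full
    print(y_factored)
```

Running it prints `768000 128768 0.168`, which is roughly a 6× reduction for this one hypothetical layer. Factorization only saves weights when r < mn/(m + n), about 381 for these sizes; how lrx-vector chooses its ranks, and the effect on accuracy, are reported in the paper, not here.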