Showing 1–3 of 3 results for author: Deisher, M

  1. arXiv:2312.00174  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV eess.IV

    Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices

    Authors: Gokul Srinivasagan, Michael Deisher, Munir Georges

    Abstract: People with visual impairments have difficulty accessing touchscreen-enabled personal computing devices like mobile phones and laptops. The image-to-speech (ITS) systems can assist them in mitigating this problem, but their huge model size makes it extremely hard to be deployed on low-resourced embedded devices. In this paper, we aim to overcome this challenge by developing an efficient end-to-end…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: 5 pages, 2 figures, 2 tables, presented at the 15th ITG Conference on Speech Communications, September 2023, Aachen

  2. arXiv:2303.06078  [pdf, other]

    eess.AS cs.AI cs.NE

    An End-to-End Neural Network for Image-to-Audio Transformation

    Authors: Liu Chen, Michael Deisher, Munir Georges

    Abstract: This paper describes an end-to-end (E2E) neural architecture for the audio rendering of small portions of display content on low resource personal computing devices. It is intended to address the problem of accessibility for vision-impaired or vision-distracted users at the hardware level. Neural image-to-text (ITT) and text-to-speech (TTS) approaches are reviewed and a new technique is introduced…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 5 pages, 3 figures, 2023 IEEE Conference on Acoustics, Speech, and Signal Processing

  3. arXiv:1910.11488  [pdf, other]

    eess.AS cs.LG

    Structural sparsification for Far-field Speaker Recognition with GNA

    Authors: Jingchi Zhang, Jonathan Huang, Michael Deisher, Hai Li, Yiran Chen

    Abstract: Recently, deep neural networks (DNN) have been widely used in speaker recognition area. In order to achieve fast response time and high accuracy, the requirements for hardware resources increase rapidly. However, as the speaker recognition application is often implemented on mobile devices, it is necessary to maintain a low computational cost while keeping high accuracy in far-field condition. In…

    Submitted 14 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: submitted to icassp2020