Search | arXiv e-print repository

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

Authors: Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, Hesham M. Eraqi

Abstract: In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive progress in the domain of speech recognition has been exhibited by audio and audio-visual systems. Nevertheless, there is still much to be explored with regards to vi… ▽ More In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training. Impressive progress in the domain of speech recognition has been exhibited by audio and audio-visual systems. Nevertheless, there is still much to be explored with regards to visual speech recognition systems due to the visual ambiguity of some phonemes. To this end, the development of visual speech recognition models is crucial given the instability of audio models. The main contributions of this work are i) building on recent state-of-the-art word-based lipreading models by integrating sequence-level and frame-level Knowledge Distillation (KD) to their systems; ii) leveraging audio data during training visual models, a feat which has not been utilized in prior word-based work; iii) proposing the Gaussian-shaped averaging in frame-level KD, as an efficient technique that aids the model in distilling knowledge at the sequence model encoder. This work proposes a novel and competitive architecture for lip-reading, as we demonstrate a noticeable improvement in performance, setting a new benchmark equals to 88.64% on the LRW dataset. △ Less

Submitted 5 June, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2108.03543

arXiv:2108.03543 [pdf, other]

Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading

Authors: Shahd Elashmawy, Marian Ramsis, Hesham M. Eraqi, Farah Eldeshnawy, Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr

Abstract: Despite the advancement in the domain of audio and audio-visual speech recognition, visual speech recognition systems are still quite under-explored due to the visual ambiguity of some phonemes. In this work, we propose a new lip-reading model that combines three contributions. First, the model front-end adopts a spatio-temporal attention mechanism to help extract the informative data from the inp… ▽ More Despite the advancement in the domain of audio and audio-visual speech recognition, visual speech recognition systems are still quite under-explored due to the visual ambiguity of some phonemes. In this work, we propose a new lip-reading model that combines three contributions. First, the model front-end adopts a spatio-temporal attention mechanism to help extract the informative data from the input visual frames. Second, the model back-end utilizes a sequence-level and frame-level Knowledge Distillation (KD) techniques that allow leveraging audio data during the visual model training. Third, a data preprocessing pipeline is adopted that includes facial landmarks detection-based lip-alignment. On LRW lip-reading dataset benchmark, a noticeable accuracy improvement is demonstrated; the spatio-temporal attention, Knowledge Distillation, and lip-alignment contributions achieved 88.43%, 88.64%, and 88.37% respectively. △ Less

Submitted 7 August, 2021; originally announced August 2021.

arXiv:1203.1021 [pdf]

Development of an Ontology to Assist the Modeling of Accident Scenarii "Application on Railroad Transport "

Authors: Ahmed Maalel, Habib Hadj mabrouk, Lassad Mejri, Henda Hajjami Ben Ghezela

Abstract: In a world where communication and information sharing are at the heart of our business, the terminology needs are most pressing. It has become imperative to identify the terms used and defined in a consensual and coherent way while preserving linguistic diversity. To streamline and strengthen the process of acquisition, representation and exploitation of scenarii of train accidents, it is necessa… ▽ More In a world where communication and information sharing are at the heart of our business, the terminology needs are most pressing. It has become imperative to identify the terms used and defined in a consensual and coherent way while preserving linguistic diversity. To streamline and strengthen the process of acquisition, representation and exploitation of scenarii of train accidents, it is necessary to harmonize and standardize the terminology used by players in the security field. The research aims to significantly improve analytical activities and operations of the various safety studies, by tracking the error in system, hardware, software and human. This paper presents the contribution of ontology to modeling scenarii for rail accidents through a knowledge model based on a generic ontology and domain ontology. After a detailed presentation of the state of the art material, this article presents the first results of the developed model. △ Less

Submitted 5 March, 2012; originally announced March 2012.

Comments: 7 pages, 9 figures, Journal of Computing (ISSN 2151-9617); Journal of Computing, Volume 3, Issue 7, July 2011

Journal ref: J. of Computing. 3. 7. (2011) 1-8

Showing 1–3 of 3 results for author: Mabrouk, H