Search | arXiv e-print repository

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

Authors: Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr, Hesham M. Eraqi

Abstract: In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, with the goal of utilizing audio data during lipreading model training. Audio and audio-visual systems have made impressive progress in speech recognition. Nevertheless, much remains to be explored in visual speech recognition because some phonemes are visually ambiguous. At the same time, developing visual speech recognition models is crucial given the instability of audio models. The main contributions of this work are: i) building on recent state-of-the-art word-based lipreading models by integrating sequence-level and frame-level Knowledge Distillation (KD) into their systems; ii) leveraging audio data while training visual models, which has not been done in prior word-based work; iii) proposing Gaussian-shaped averaging in frame-level KD as an efficient technique that helps the model distill knowledge at the sequence model encoder. This work proposes a novel and competitive lip-reading architecture, demonstrating a noticeable improvement in performance and setting a new benchmark of 88.64% accuracy on the LRW dataset.

Submitted 5 June, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2108.03543
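The abstract names Gaussian-shaped averaging for frame-level KD without defining it. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the audio teacher emits more frame features than the video student, so teacher frames must be mapped onto the student's time axis; that each student frame's target is a Gaussian-weighted mean of all teacher frames centered on the matching time position; and that the frame-level loss is a mean squared error. The names `gaussian_average_teacher` and `frame_level_kd_loss`, and the width `sigma`, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def gaussian_weights(center: float, length: int, sigma: float) -> List[float]:
    """Normalized Gaussian weights over indices 0..length-1, centered at `center`."""
    raw = [math.exp(-0.5 * ((j - center) / sigma) ** 2) for j in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_average_teacher(teacher: List[Vector], num_student_frames: int,
                             sigma: float = 1.0) -> List[Vector]:
    """Map teacher (audio) frames onto the student (video) time axis.

    Hypothetical alignment: student frame i is aligned with the midpoint of the
    i-th equal-length span of teacher frames, and its target is the
    Gaussian-weighted mean of all teacher frames.
    """
    ratio = len(teacher) / num_student_frames
    dim = len(teacher[0])
    targets = []
    for i in range(num_student_frames):
        center = (i + 0.5) * ratio - 0.5  # midpoint of the i-th teacher span
        weights = gaussian_weights(center, len(teacher), sigma)
        targets.append([sum(w * frame[k] for w, frame in zip(weights, teacher))
                        for k in range(dim)])
    return targets


def frame_level_kd_loss(student: List[Vector], teacher: List[Vector],
                        sigma: float = 1.0) -> float:
    """Mean squared error between student frames and Gaussian-averaged teacher targets."""
    targets = gaussian_average_teacher(teacher, len(student), sigma)
    sq = [(s - t) ** 2 for sf, tf in zip(student, targets) for s, t in zip(sf, tf)]
    return sum(sq) / len(sq)


audio_feats = [[float(j), 1.0] for j in range(8)]  # 8 hypothetical teacher frames
video_feats = [[1.5, 1.0], [5.5, 1.0]]             # 2 hypothetical student frames
loss = frame_level_kd_loss(video_feats, audio_feats, sigma=1.0)
```

Under these assumptions, the Gaussian weights let neighboring teacher frames contribute smoothly rather than averaging each student frame's block of teacher frames with a hard cut-off.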

arXiv:2207.01010 [pdf, other]

Government Intervention in Catastrophe Insurance Markets: A Reinforcement Learning Approach

Authors: Menna Hassan, Nourhan Sakr, Arthur Charpentier

Abstract: This paper designs a sequential repeated game of a micro-founded society with three types of agents: individuals, insurers, and a government. In an approach that is still nascent in the economics literature, we use Reinforcement Learning (RL), which is closely related to multi-armed bandit problems, to learn the welfare impact of a set of proposed policy interventions per $1 spent on them. The paper rigorously discusses the desirability of the proposed interventions by comparing them against each other on a case-by-case basis. The paper provides a framework for algorithmic policy evaluation using calibrated theoretical models, which can assist in feasibility studies.

Submitted 3 July, 2022; originally announced July 2022.
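The abstract relates its RL approach to multi-armed bandits. As a minimal sketch of that connection only (the paper's game, agents, and calibrated model are not reproduced), the block below runs the standard UCB1 bandit algorithm over three hypothetical interventions whose reward is a simulated welfare gain per $1 spent. The intervention names and their reward probabilities are invented for illustration.

```python
import math
import random
from typing import Callable, Dict, Tuple

Arm = Callable[[random.Random], float]


def ucb1(arms: Dict[str, Arm], rounds: int,
         seed: int = 0) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Run UCB1; return each arm's empirical mean reward and pull count.

    Rewards are assumed to lie in [0, 1], as UCB1's confidence bonus requires,
    and `rounds` must be at least the number of arms.
    """
    rng = random.Random(seed)
    names = list(arms)
    counts = {n: 0 for n in names}
    sums = {n: 0.0 for n in names}
    for t in range(1, rounds + 1):
        untried = [n for n in names if counts[n] == 0]
        if untried:
            choice = untried[0]  # play every arm once before using the bound
        else:
            choice = max(names, key=lambda n: sums[n] / counts[n]
                         + math.sqrt(2 * math.log(t) / counts[n]))
        reward = arms[choice](rng)
        counts[choice] += 1
        sums[choice] += reward
    means = {n: sums[n] / counts[n] for n in names}
    return means, counts


def bernoulli_arm(p: float) -> Arm:
    """Hypothetical intervention: welfare gain per $1 is 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


interventions = {
    "premium_subsidy": bernoulli_arm(0.3),
    "reinsurance_backstop": bernoulli_arm(0.5),
    "mitigation_grant": bernoulli_arm(0.7),
}
means, counts = ucb1(interventions, rounds=5000)
```

Each round spends one notional dollar on one intervention; UCB1 balances trying under-sampled interventions against exploiting the one with the best average return, so the pull counts concentrate on the intervention with the highest simulated welfare gain.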

arXiv:2108.03543 [pdf, other]

Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading

Authors: Shahd Elashmawy, Marian Ramsis, Hesham M. Eraqi, Farah Eldeshnawy, Hadeel Mabrouk, Omar Abugabal, Nourhan Sakr

Abstract: Despite advances in audio and audio-visual speech recognition, visual speech recognition systems remain under-explored because some phonemes are visually ambiguous. In this work, we propose a new lip-reading model that combines three contributions. First, the model front-end adopts a spatio-temporal attention mechanism to help extract informative data from the input visual frames. Second, the model back-end uses sequence-level and frame-level Knowledge Distillation (KD) techniques that allow audio data to be leveraged during visual model training. Third, a data preprocessing pipeline is adopted that includes lip alignment based on facial landmark detection. On the LRW lip-reading benchmark, a noticeable accuracy improvement is demonstrated: the spatio-temporal attention, Knowledge Distillation, and lip-alignment contributions achieved 88.43%, 88.64%, and 88.37%, respectively.

Submitted 7 August, 2021; originally announced August 2021.
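For the sequence-level KD in the second contribution, a common formulation (Hinton-style soft-target distillation) matches the student's word posterior to a temperature-softened teacher posterior. The sketch below assumes that formulation; it is not taken from the paper. It further assumes the teacher is an audio word classifier and the student is the visual model, both scoring the same word vocabulary. The `temperature` and mixing weight `alpha` are hypothetical hyperparameters, not values from the paper.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def sequence_kd_loss(student_logits: List[float], teacher_logits: List[float],
                     label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) on softened outputs + (1 - alpha) * CE on the label."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    ce = -log_softmax(student_logits)[label]  # hard-label cross-entropy at T = 1
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


video_logits = [2.0, 1.0, 0.1]  # hypothetical student scores over a 3-word vocabulary
audio_logits = [2.5, 0.5, 0.0]  # hypothetical teacher scores for the same clip
loss = sequence_kd_loss(video_logits, audio_logits, label=0)
```

Setting `alpha` to 1 trains purely on the teacher's soft targets, which is what lets unlabeled audio-visual pairs contribute; setting it to 0 recovers ordinary cross-entropy training.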

arXiv:2104.12558 [pdf, other]

EduPal leaves no professor behind: Supporting faculty via a peer-powered recommender system

Authors: Nourhan Sakr, Aya Salama, Nadeen Tameesh, Gihan Osman

Abstract: The swift transitions in higher education after the COVID-19 outbreak revealed a gap in the pedagogical support available to faculty. We propose a smart, knowledge-based chatbot that addresses issues of knowledge distillation and provides faculty with personalized recommendations. Our collaborative system crowdsources useful pedagogical practices and continuously filters recommendations based on theory and user feedback, thus enhancing the experiences of subsequent peers. We build a prototype for our local STEM faculty as a proof of concept and receive favorable feedback that encourages us to extend our development and outreach, especially to under-resourced faculty.

Submitted 20 April, 2021; originally announced April 2021.
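The abstract does not say how user feedback filters recommendations. Purely as an illustration, assume each recommended pedagogical practice collects "helpful" votes out of total ratings from peers. The sketch below ranks practices by the lower bound of the Wilson score interval and drops those below a threshold. The practice names, vote counts, and `threshold` are hypothetical, and this is not presented as EduPal's method.

```python
import math
from typing import Dict, List, Tuple


def wilson_lower_bound(helpful: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the 'helpful' proportion."""
    if total == 0:
        return 0.0
    p = helpful / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def filter_recommendations(feedback: Dict[str, Tuple[int, int]],
                           threshold: float = 0.3) -> List[str]:
    """Keep practices whose helpfulness lower bound clears `threshold`, best first."""
    scored = {name: wilson_lower_bound(h, n) for name, (h, n) in feedback.items()}
    kept = [name for name, s in scored.items() if s >= threshold]
    return sorted(kept, key=lambda name: scored[name], reverse=True)


feedback = {  # hypothetical (helpful votes, total ratings) per practice
    "flipped_classroom": (45, 50),
    "peer_observation": (20, 30),
    "weekly_quiz": (3, 5),
    "breakout_rooms": (10, 40),
}
print(filter_recommendations(feedback))  # ['flipped_classroom', 'peer_observation']
```

The lower bound penalizes small samples: a practice rated helpful by 3 of 5 peers is filtered out even though its raw ratio (60%) is close to that of one rated helpful by 20 of 30.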

arXiv:2103.08850 [pdf, other]

Few Shot System Identification for Reinforcement Learning

Authors: Karim Farid, Nourhan Sakr

Abstract: Learning by interaction, formally called Reinforcement Learning (RL), is the key to skill acquisition for most living organisms. RL is efficient at finding optimal policies that endow complex systems with sophisticated behavior. All paradigms of RL utilize a system model for finding the optimal policy. Dynamics can be modeled either by formulating a mathematical model or through system identification. Dynamic models are usually exposed to aleatoric and epistemic uncertainties that can make the real system deviate from the acquired model and cause the RL algorithm to behave erroneously. Accordingly, the RL process becomes sensitive to operating conditions and to changes in model parameters, and loses its generality. To address these problems, intensive system identification is needed for each system even if the structure of the model dynamics is the same, as a slight deviation in the model parameters can render the model useless in RL. An oracle that can adaptively predict the rest of a trajectory regardless of the uncertainties could help resolve this issue. The target of this work is to present a framework that facilitates the system identification of different instances of the same dynamics class by learning a probability distribution over the dynamics conditioned on observed data with variational inference, and to show its reliability in robustly solving different instances of control problems with the same model in model-based RL with maximum sample efficiency.

Submitted 15 July, 2021; v1 submitted 16 March, 2021; originally announced March 2021.
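As a much-simplified illustration of learning a distribution over dynamics from a few observed transitions, the sketch below assumes a hypothetical scalar linear system x[t+1] = a*x[t] + b*u[t] + Gaussian noise with a Gaussian prior on (a, b). It substitutes the exact conjugate Gaussian posterior for the paper's variational inference. In this conjugate case the best Gaussian variational approximation coincides with the exact posterior, but the paper's models are more general. The noise and prior variances are assumed values.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Vec2, Vec2]


def posterior_dynamics(xs: Sequence[float], us: Sequence[float],
                       noise_var: float = 0.01,
                       prior_var: float = 10.0) -> Tuple[Vec2, Mat2]:
    """Exact Gaussian posterior over (a, b) for x[t+1] = a*x[t] + b*u[t] + noise.

    Assumes noise ~ N(0, noise_var) and an isotropic prior (a, b) ~ N(0, prior_var * I).
    """
    assert len(us) == len(xs) - 1, "need one control per observed transition"
    transitions = list(zip(xs[:-1], us, xs[1:]))  # (x_t, u_t, x_{t+1})
    # Posterior precision = Phi^T Phi / noise_var + I / prior_var, Phi rows = (x_t, u_t).
    p11 = sum(x * x for x, _, _ in transitions) / noise_var + 1 / prior_var
    p12 = sum(x * u for x, u, _ in transitions) / noise_var
    p22 = sum(u * u for _, u, _ in transitions) / noise_var + 1 / prior_var
    # Phi^T y / noise_var, where y holds the next states.
    r1 = sum(x * y for x, _, y in transitions) / noise_var
    r2 = sum(u * y for _, u, y in transitions) / noise_var
    det = p11 * p22 - p12 * p12
    cov: Mat2 = ((p22 / det, -p12 / det), (-p12 / det, p11 / det))
    mean: Vec2 = (cov[0][0] * r1 + cov[0][1] * r2,
                  cov[1][0] * r1 + cov[1][1] * r2)
    return mean, cov


def predict_rollout(x0: float, controls: Sequence[float], params: Vec2) -> List[float]:
    """Roll the dynamics forward from x0 under the given (a, b)."""
    a, b = params
    traj = [x0]
    for u in controls:
        traj.append(a * traj[-1] + b * u)
    return traj


# Few-shot data from a hypothetical true system (a=0.9, b=0.5), noise-free for clarity.
controls = [1.0, 0.0, -1.0, 0.5]
observed = predict_rollout(1.0, controls, (0.9, 0.5))
mean, cov = posterior_dynamics(observed, controls)
forecast = predict_rollout(observed[-1], [0.0, 1.0, 0.0], mean)
```

The posterior mean drives the rollout, while the posterior covariance quantifies how uncertain the dynamics remain after only a few observed transitions.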

arXiv:1701.05854 [pdf, ps, other]

doi 10.1103/PhysRevLett.104.137203

Asymmetric Spin-wave Dispersion on Fe(110): Direct Evidence of Dzyaloshinskii-Moriya Interaction

Authors: Kh. Zakeri, Y. Zhang, J. Prokop, T. -H. Chuang, N. Sakr, W. X. Tang, J. Kirschner

Abstract: The influence of the Dzyaloshinskii-Moriya interaction on the spin-wave dispersion in an Fe double layer grown on W(110) is measured for the first time. It is demonstrated that the Dzyaloshinskii-Moriya interaction breaks the degeneracy of spin waves and leads to an asymmetric spin-wave dispersion relation. An extended Heisenberg spin Hamiltonian is employed to obtain the longitudinal component of the Dzyaloshinskii-Moriya vectors from the experimentally measured energy asymmetry.

Submitted 20 January, 2017; originally announced January 2017.

Journal ref: Phys. Rev. Lett. 104, 137203 (2010)
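For intuition about why the energy asymmetry isolates the DM term, consider a simplified one-dimensional nearest-neighbor ferromagnetic chain. This is an assumption for illustration; the paper treats an Fe double layer on W(110) with an extended Heisenberg Hamiltonian. With exchange constant J, spin S, lattice constant a, wave vector k, and D the component of the DM vector along the magnetization, linear spin-wave theory gives, up to the overall sign of the D term (which depends on how the DM interaction is written),

$$E(k) = 2JS\,(1 - \cos ka) + 2DS\,\sin ka,$$

$$\Delta E(k) = E(k) - E(-k) = 4DS\,\sin ka.$$

The exchange term is even in k and cancels in $\Delta E(k)$, so in this toy model a measured asymmetry yields $D = \Delta E(k) / (4S \sin ka)$.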

arXiv:1610.06209 [pdf, other]

Structured adaptive and random spinners for fast machine learning computations

Authors: Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Nourhan Sakr, Tamas Sarlos, Jamal Atif

Abstract: We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms, new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the HD3HD2HD1 structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications.

Submitted 26 November, 2016; v1 submitted 19 October, 2016; originally announced October 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1605.09046
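The HD3HD2HD1 matrix mentioned above is a product of three blocks, each a Hadamard matrix times a diagonal matrix. A minimal sketch follows, assuming H is the normalized Walsh-Hadamard matrix and D1, D2, D3 are random diagonal matrices with independent ±1 entries. This covers only the fully randomized case; the learned variants the paper also allows are not shown. The product is applied with the fast Walsh-Hadamard transform in O(n log n) time instead of the O(n²) of a dense projection. The helper names are hypothetical.

```python
import math
import random
from typing import List


def fwht(vec: List[float]) -> List[float]:
    """Fast Walsh-Hadamard transform scaled by 1/sqrt(n), so it is orthogonal."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = out[j], out[j + h]
                out[j], out[j + h] = x + y, x - y  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def random_signs(n: int, rng: random.Random) -> List[float]:
    """Diagonal of a random +-1 (Rademacher) matrix."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def hd3hd2hd1(x: List[float], diagonals: List[List[float]]) -> List[float]:
    """Compute H D3 H D2 H D1 x; `diagonals` is [D1, D2, D3], applied right to left."""
    out = list(x)
    for d in diagonals:
        out = fwht([di * xi for di, xi in zip(d, out)])
    return out


rng = random.Random(0)
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
ds = [random_signs(len(x), rng) for _ in range(3)]
y = hd3hd2hd1(x, ds)
print(math.isclose(sum(v * v for v in x), sum(v * v for v in y)))  # True
```

Because each block is orthogonal, the projection preserves Euclidean norms exactly (up to floating-point error), which the final line checks.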

arXiv:1603.07947 [pdf, other]

An Empirical Study of Online Packet Scheduling Algorithms

Authors: Nourhan Sakr, Cliff Stein

Abstract: This work studies online scheduling algorithms for buffer management, develops new algorithms, and analyzes their performance. Packets arrive at a release time r, with a non-negative weight w and an integer deadline d. At each time step, at most one packet is scheduled. The modified greedy (MG) algorithm is 1.618-competitive for the objective of maximizing the sum of weights of packets sent, assuming agreeable deadlines. We analyze the empirical behavior of MG in a setting with arbitrary deadlines and demonstrate that it is at a disadvantage when it frequently prefers maximum-weight packets over early-deadline ones. We develop the MLP algorithm, which remedies this problem while mimicking the behavior of the offline algorithm. Our comparative analysis shows that, although the competitive ratio of MLP is not as good as that of MG, it performs better in practice. We validate this by simulating the behavior of both algorithms across a spectrum of simulated parameter settings. Finally, we propose the design of three additional algorithms, which may help improve performance in practice.

Submitted 25 March, 2016; originally announced March 2016.
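The abstract does not spell out the MG or MLP rules, so the sketch below implements neither. It assumes the packet model stated above: release time r, weight w, integer deadline d, at most one packet per step, and a packet sendable at integer time t when r ≤ t ≤ d. It then compares a plain greedy policy that always sends the heaviest pending packet against the offline optimum. The optimum uses a standard construction, named here plainly: feasible packet sets form a matroid, so adding packets heaviest-first whenever the set remains schedulable under earliest-deadline-first (EDF) is optimal.

```python
from typing import List, Tuple

Packet = Tuple[int, float, int]  # (release time r, weight w, deadline d), with r <= d


def greedy_online(packets: List[Packet]) -> float:
    """Each step, send the heaviest pending packet (ties: earliest deadline)."""
    horizon = max(d for _, _, d in packets) + 1
    pending: List[Packet] = []
    total = 0.0
    for t in range(horizon):
        pending += [p for p in packets if p[0] == t]  # newly released packets
        pending = [p for p in pending if p[2] >= t]    # drop expired packets
        if pending:
            best = max(pending, key=lambda p: (p[1], -p[2]))
            pending.remove(best)
            total += best[1]
    return total


def edf_feasible(packets: List[Packet]) -> bool:
    """True if all packets can be sent on time; EDF decides this for unit-size packets."""
    if not packets:
        return True
    horizon = max(d for _, _, d in packets) + 1
    pending: List[Packet] = []
    for t in range(horizon):
        pending += [p for p in packets if p[0] == t]
        if any(p[2] < t for p in pending):
            return False  # a packet missed its deadline
        if pending:
            pending.remove(min(pending, key=lambda p: p[2]))
    return not pending


def offline_optimum(packets: List[Packet]) -> float:
    """Max total weight: add packets heaviest-first while the set stays EDF-feasible."""
    chosen: List[Packet] = []
    for p in sorted(packets, key=lambda p: p[1], reverse=True):
        if edf_feasible(chosen + [p]):
            chosen.append(p)
    return sum(p[1] for p in chosen)


packets = [(0, 1.0, 0), (0, 2.0, 1)]
print(greedy_online(packets), offline_optimum(packets))  # 2.0 3.0
```

In this example, greedy sends the weight-2 packet first and lets the weight-1 packet expire, while the offline schedule sends both, illustrating the disadvantage of preferring maximum-weight packets over early-deadline ones.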

Showing 1–8 of 8 results for author: Sakr, N