-
Towards a World-English Language Model for On-Device Virtual Assistants
Authors:
Rricha Jalota,
Lyan Verwimp,
Markus Nussbaum-Thom,
Amr Mousa,
Arturo Argueta,
Youssef Oualil
Abstract:
Neural Network Language Models (NNLMs) for Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs across one or more of these categories is one way to improve scalability. In this work, we combine regional variants of English to build a "World English" NNLM for on-device VAs. In particular, we investigate the application of adapter bottlenecks to model dialect-specific characteristics in our existing production NNLMs and to enhance the multi-dialect baselines. We find that adapter modules are more effective in modeling dialects than specializing entire sub-networks. Based on this insight and leveraging the design of our production models, we introduce a new architecture for a World English NNLM that meets the accuracy, latency, and memory constraints of our single-dialect models.
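The adapter idea in this abstract can be illustrated with a minimal sketch. It assumes the common residual bottleneck form (down-projection, ReLU, up-projection, skip connection) with one adapter per dialect placed on top of shared layers; the class names, dialect codes, dimensions and zero initialisation are hypothetical choices for illustration, not the production architecture described in the paper.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Hypothetical residual adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection hidden_dim -> bottleneck_dim with small random weights.
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
            for _ in range(bottleneck_dim)
        ]
        # Up-projection bottleneck_dim -> hidden_dim, zero-initialised so the
        # adapter starts as the identity map on the shared hidden state.
        self.w_up: Matrix = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]

    def num_params(self) -> int:
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(self.w_down, h)]  # ReLU in the bottleneck
        delta = matvec(self.w_up, z)
        return [a + b for a, b in zip(h, delta)]  # residual (skip) connection


class DialectAdapters:
    """Keeps one adapter per dialect; the layers feeding it are shared."""

    def __init__(self, dialects: List[str], hidden_dim: int, bottleneck_dim: int) -> None:
        self.adapters: Dict[str, BottleneckAdapter] = {
            d: BottleneckAdapter(hidden_dim, bottleneck_dim, seed=i)
            for i, d in enumerate(dialects)
        }

    def __call__(self, h: Vector, dialect: str) -> Vector:
        return self.adapters[dialect](h)


layer = DialectAdapters(["en_US", "en_GB", "en_IN"], hidden_dim=8, bottleneck_dim=2)
h = [0.1 * i for i in range(8)]
print(layer(h, "en_GB") == h)                # True: zero-initialised up-projection
print(layer.adapters["en_US"].num_params())  # 32 = 2 * 8 * 2
```

Only the two small projection matrices are dialect-specific (2 × hidden_dim × bottleneck_dim weights per adapter), so adding a dialect costs far less than duplicating a whole sub-network, which is the contrast the abstract draws.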
Submitted 27 March, 2024;
originally announced March 2024.
-
Arabic Character Segmentation Using Projection Based Approach with Profile's Amplitude Filter
Authors:
Mahmoud A. A. Mousa,
Mohammed S. Sayed,
Mahmoud I. Abdalla
Abstract:
Arabic is one of the languages that pose special challenges to optical character recognition (OCR). The main challenge is that Arabic script is mostly cursive, so a segmentation step must determine where each character begins and where it ends; this step is essential for character recognition. This paper presents an Arabic character segmentation algorithm. The proposed algorithm applies projection-based concepts to separate lines, words, and characters, using a profile's amplitude filter and a simple edge tool to find the separations between characters. Our algorithm shows promising performance when applied to various machine-printed documents set in different Arabic fonts.
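The projection-based idea (row sums to find text lines, column sums to find words and characters) can be sketched as follows. This is a minimal illustration under stated assumptions: the "amplitude filter" is modelled as a fixed threshold that treats low-ink columns as thin connecting strokes, the edge tool mentioned in the abstract is not modelled, and all function names and the toy image are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]   # binary image: 1 = ink pixel, 0 = background
Span = Tuple[int, int]    # half-open [start, end) index range


def runs_above(profile: List[int], threshold: int = 0) -> List[Span]:
    """Return the maximal runs of indices where profile[i] > threshold."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def horizontal_profile(img: Image) -> List[int]:
    """Ink count per row; blank rows separate text lines."""
    return [sum(row) for row in img]


def vertical_profile(img: Image, rows: Span) -> List[int]:
    """Ink count per column, restricted to one text line."""
    top, bottom = rows
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(len(img[0]))]


def segment(img: Image, char_threshold: int = 1) -> List[Tuple[Span, List[List[Span]]]]:
    """Split an image into lines, then words, then character column spans."""
    result = []
    for line in runs_above(horizontal_profile(img)):
        vprof = vertical_profile(img, line)
        words = []
        # Blank columns (amplitude 0) separate words.
        for w_start, w_end in runs_above(vprof):
            # Assumed amplitude filter: columns at or below char_threshold are
            # treated as thin connecting strokes and become cut points.
            chars = [(w_start + s, w_start + e)
                     for s, e in runs_above(vprof[w_start:w_end], char_threshold)]
            # Nothing above the threshold: keep the word as one unit.
            words.append(chars or [(w_start, w_end)])
        result.append((line, words))
    return result


img = [
    [1, 1, 0, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
]
print(segment(img))
# [((0, 2), [[(0, 2), (4, 6)], [(8, 9)]]), ((3, 4), [[(1, 3)]])]
```

Spans are reported in left-to-right column order; for Arabic's right-to-left reading order the word and character lists would be reversed. A single fixed threshold is a crude stand-in and would also cut at genuinely thin strokes.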
Submitted 3 July, 2017;
originally announced July 2017.
-
Optimal Control for Multi-Mode Systems with Discrete Costs
Authors:
Mahmoud A. A. Mousa,
Sven Schewe,
Dominik Wojtczak
Abstract:
This paper studies optimal time-bounded control in multi-mode systems with discrete costs. Multi-mode systems are an important subclass of linear hybrid systems, in which there are no guards on transitions and all invariants are global. Each state has a continuous cost attached to it, which is linear in the sojourn time, while a discrete cost is attached to each transition taken. We show that an optimal control for this model can be computed in NEXPTIME and approximated in PSPACE. We also show that the one-dimensional case is simpler: although the problem is NP-complete (and in LOGSPACE for an infinite time horizon), we develop an FPTAS for finding an approximate solution.
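The cost model in this abstract (a continuous cost linear in the sojourn time in each mode, a discrete cost per transition taken, no guards, and a global invariant) can be made concrete by evaluating the cost of one given schedule. This is a minimal sketch only, not the paper's NEXPTIME/PSPACE procedure for finding an optimal control; the box-shaped invariant, the constant per-mode rates, the (from, to) keying of switch costs and all names are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


def inside(x: Sequence[float], lower: Vec, upper: Vec) -> bool:
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


def schedule_cost(
    schedule: Sequence[Tuple[str, float]],      # (mode, sojourn time) pairs, in order
    rates: Dict[str, Vec],                      # assumed constant rate of each variable per mode
    cost_rate: Dict[str, float],                # continuous cost per unit of sojourn time
    switch_cost: Dict[Tuple[str, str], float],  # discrete cost of each transition taken
    x0: Vec,
    lower: Vec,
    upper: Vec,
    horizon: float,
) -> float:
    """Cost of one given schedule; raises ValueError if it is infeasible."""
    if abs(sum(t for _, t in schedule) - horizon) > 1e-9:
        raise ValueError("schedule must cover the time bound exactly")
    if not inside(x0, lower, upper):
        raise ValueError("initial state violates the invariant")
    x = list(x0)
    cost = 0.0
    prev: Optional[str] = None
    for mode, t in schedule:
        if t < 0:
            raise ValueError("sojourn times must be non-negative")
        if prev is not None and mode != prev:
            cost += switch_cost[(prev, mode)]  # no guards: any switch is allowed
        cost += cost_rate[mode] * t            # continuous cost, linear in sojourn time
        x = [xi + ri * t for xi, ri in zip(x, rates[mode])]
        # The invariant is a (convex) box and each mode moves x along a straight
        # line; the segment's start was already checked, so checking its end suffices.
        if not inside(x, lower, upper):
            raise ValueError(f"invariant violated after mode {mode!r}")
        prev = mode
    return cost


params = dict(
    rates={"heat": (2.0,), "cool": (-1.0,)},
    cost_rate={"heat": 3.0, "cool": 1.0},
    switch_cost={("heat", "cool"): 5.0, ("cool", "heat"): 5.0},
    x0=(0.0,), lower=(-1.0,), upper=(2.0,), horizon=2.0,
)
cost = schedule_cost([("heat", 0.5), ("cool", 1.5)], **params)
print(cost)  # 8.0 = 3.0 * 0.5 + 5.0 + 1.0 * 1.5
```

Optimal control then amounts to searching over mode sequences and sojourn times for the cheapest feasible schedule, which is where the complexity results quoted in the abstract apply.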
Submitted 29 June, 2017;
originally announced June 2017.
-
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Authors:
Zixing Zhang,
Jürgen Geiger,
Jouni Pohjalainen,
Amr El-Desoky Mousa,
Wenyu **,
Björn Schuller
Abstract:
Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic in automatic speech recognition that remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those developing environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
Submitted 21 September, 2018; v1 submitted 30 May, 2017;
originally announced May 2017.
-
The ICSTM+TUM+UP Approach to the 3rd CHIME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models
Authors:
Amr El-Desoky Mousa,
Erik Marchi,
Björn Schuller
Abstract:
This paper presents our contribution to the 3rd CHiME Speech Separation and Recognition Challenge. Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE). The networks are trained to predict clean speech as well as noise features from noisy speech features. In addition, the system applies two dereverberation methods to the 6-channel recordings of the challenge. The first, Phase-Error based Filtering (PEF), uses time-varying phase-error filters based on the estimated time difference of arrival of the speech source and the phases of the microphone signals. The second, Correlation Shaping (CS), reduces the long-term correlation energy in reverberant speech: the Linear Prediction (LP) residual is processed to suppress the long-term correlation. Furthermore, the system employs an LSTM Language Model (LM) to perform N-best rescoring of recognition hypotheses. Using the proposed methods, a Word Error Rate (WER) of 24.38% is achieved on the real eval test set, a relative improvement of around 25% over the challenge baseline.
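N-best rescoring, as mentioned in this abstract, re-ranks first-pass recognition hypotheses after adding a language-model score. The sketch below assumes the usual log-linear combination (acoustic log-likelihood plus a weighted LM log-probability plus a per-word term); the LSTM LM is replaced by a toy unigram function, and the weights, scores and example sentences are hypothetical, not the system's tuned values.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Hypothesis:
    words: List[str]
    acoustic_logprob: float  # first-pass acoustic log-likelihood


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    lm_logprob: Callable[[List[str]], float],  # stand-in for the LSTM LM
    lm_weight: float = 1.0,
    word_bonus: float = 0.0,  # per-word term; negative values penalise long hypotheses
) -> List[Hypothesis]:
    """Re-rank hypotheses by acoustic score + weighted LM score + per-word term."""

    def combined(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + lm_weight * lm_logprob(h.words)
                + word_bonus * len(h.words))

    return sorted(nbest, key=combined, reverse=True)


# Toy unigram probabilities standing in for a trained LSTM LM (hypothetical numbers).
UNIGRAM: Dict[str, float] = {
    "recognize": 0.01, "speech": 0.01,
    "wreck": 0.001, "a": 0.05, "nice": 0.005, "beach": 0.002,
}


def toy_lm(words: List[str]) -> float:
    # Unseen words get a small floor probability.
    return sum(math.log(UNIGRAM.get(w, 1e-6)) for w in words)


nbest = [
    Hypothesis(["wreck", "a", "nice", "beach"], acoustic_logprob=-100.0),
    Hypothesis(["recognize", "speech"], acoustic_logprob=-101.0),
]
print(rescore_nbest(nbest, toy_lm)[0].words)  # ['recognize', 'speech']
```

The acoustically preferred hypothesis loses once the LM score is added. In a real system lm_weight and the per-word term would be tuned on development data, and toy_lm would be replaced by the LSTM's summed log-probabilities over each hypothesis.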
Submitted 1 October, 2015;
originally announced October 2015.