Search | arXiv e-print repository

PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability to correct Automatic Speech Recognition (ASR) mistranscriptions of autoregressive deliberation systems. We further show that the design of the denoising training allows PRoDeliberation to overcome the limitations of small ASR devices, and we provide analysis on the necessity of each component of the system.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2404.16710 [pdf, other]

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Authors: Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

Abstract: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Code open sourcing is in progress

arXiv:2402.18113 [pdf, other]

Small But Funny: A Feedback-Driven Approach to Humor Distillation

Authors: Sahithya Ravi, Patrick Huber, Akshat Shrivastava, Aditya Sagar, Ahmed Aly, Vered Shwartz, Arash Einolghozati

Abstract: The emergence of Large Language Models (LLMs) has brought to light promising language generation capabilities, particularly in performing tasks like complex reasoning and creative writing. Consequently, distillation through imitation of teacher responses has emerged as a popular technique to transfer knowledge from LLMs to more accessible, Small Language Models (SLMs). While this works well for simpler tasks, there is a substantial performance gap on tasks requiring intricate language comprehension and creativity, such as humor generation. We hypothesize that this gap may stem from the fact that creative tasks might be hard to learn by imitation alone and explore whether an approach, involving supplementary guidance from the teacher, could yield higher performance. To address this, we study the effect of assigning a dual role to the LLM - as a "teacher" generating data, as well as a "critic" evaluating the student's performance. Our experiments on humor generation reveal that the incorporation of feedback significantly narrows the performance gap between SLMs and their larger counterparts compared to merely relying on imitation. As a result, our research highlights the potential of using feedback as an additional dimension to data when transferring complex language abilities via distillation.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2305.08094 [pdf, other]

Accelerating genetic optimization of nonlinear model predictive control by learning optimal search space size

Authors: Eslam Mostafa, Hussein A. Aly, Ahmed Elliethy

Abstract: Nonlinear model predictive control (NMPC) solves a multivariate optimization problem to estimate the system's optimal control inputs in each control cycle. Such optimization is made more difficult by several factors, such as nonlinearities inherited in the system, highly coupled inputs, and various constraints related to the system's physical limitations. These factors make the optimization to be non-convex and hard to solve traditionally. Genetic algorithm (GA) is typically used extensively to tackle such optimization in several application domains because it does not involve differential calculation or gradient evaluation in its solution estimation. However, the size of the search space in which the GA searches for the optimal control inputs is crucial for the applicability of the GA with systems that require fast response. This paper proposes an approach to accelerate the genetic optimization of NMPC by learning optimal search space size. The proposed approach trains a multivariate regression model to adaptively predict the best smallest search space in every control cycle. The estimated best smallest size of search space is fed to the GA to allow for searching the optimal control inputs within this search space. The proposed approach not only reduces the GA's computational time but also improves the chance of obtaining the optimal control inputs in each cycle.
The proposed approach was evaluated on two nonlinear systems and compared with two other genetic-based NMPC approaches implemented on the GPU of a Nvidia Jetson TX2 embedded platform in a processor-in-the-loop (PIL) fashion. The results show that the proposed approach provides a 39-53% reduction in computational time. Additionally, it increases the convergence percentage to the optimal control inputs within the cycle's time by 48-56%, resulting in a significant performance enhancement. The source code is available on GitHub.

Submitted 14 May, 2023; originally announced May 2023.

arXiv:2209.02448 [pdf, ps, other]

Fast Adaptive Regression-based Model Predictive Control

Authors: Eslam Mostafa, Hussein A. Aly, Ahmed Elliethy

Abstract: Model predictive control (MPC) is an optimal control method that predicts the future states of the system being controlled and estimates the optimal control inputs that drive the predicted states to the required reference. The computations of the MPC are performed at pre-determined sample instances over a finite time horizon. The number of sample instances and the horizon length determine the performance of the MPC and its computational cost. A long horizon with a large sample count allows the MPC to better estimate the inputs when the states have rapid changes over time, which results in better performance but at the expense of high computational cost. However, this long horizon is not always necessary, especially for slowly-varying states. In this case, a short horizon with less sample count is preferable as the same MPC performance can be obtained but at a fraction of the computational cost. In this paper, we propose an adaptive regression-based MPC that predicts the best minimum horizon length and the sample count from several features extracted from the time-varying changes of the states. The proposed technique builds a synthetic dataset using the system model and utilizes the dataset to train a support vector regressor that performs the prediction. The proposed technique is experimentally compared with several state-of-the-art techniques on both linear and non-linear models.
The proposed technique shows a superior reduction in computational time with a reduction of about 35-65% compared with the other techniques without introducing a noticeable loss in performance.

Submitted 4 May, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: Accepted for publication in Control Theory and Technology May. 2023

arXiv:2207.04521 [pdf, other]

Information-Theoretic Bounds for Steganography in Multimedia

Authors: Hassan Y. El Arsh, Amr Abdelaziz, Ahmed Elliethy, Hussein A. Aly, T. Aaron Gulliver

Abstract: Steganography in multimedia aims to embed secret data into an innocent looking multimedia cover object. This embedding introduces some distortion to the cover object and produces a corresponding stego object. The embedding distortion is measured by a cost function that determines the detection probability of the existence of the embedded secret data. A cost function related to the maximum embedding rate is typically employed to evaluate a steganographic system. In addition, the distribution of multimedia sources follows the Gibbs distribution which is a complex statistical model that restricts analysis. Thus, previous multimedia steganographic approaches either assume a relaxed distribution or presume a proposition on the maximum embedding rate and then try to prove it is correct. Conversely, this paper introduces an analytic approach to determining the maximum embedding rate in multimedia cover objects through a constrained optimization problem concerning the relationship between the maximum embedding rate and the probability of detection by any steganographic detector. The KL-divergence between the distributions for the cover and stego objects is used as the cost function as it upper bounds the performance of the optimal steganographic detector. An equivalence between the Gibbs and correlated-multivariate-quantized-Gaussian distributions is established to solve this optimization problem. The solution provides an analytic form for the maximum embedding rate in terms of the WrightOmega function.
Moreover, it is proven that the maximum embedding rate is in agreement with the commonly used Square Root Law (SRL) for steganography, but the solution presented here is more accurate. Finally, the theoretical results obtained are verified experimentally.

Submitted 15 July, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2111.04960

arXiv:2206.14330 [pdf, ps, other]

Model-Based Approaches to Channel Charting

Authors: Amr Aly, Ender Ayanoglu

Abstract: We present new ways of producing a channel chart [1] employing model-based approaches. We estimate the angle of arrival theta and the distance rho between the base station and the user equipment by employing our algorithms, inverse of the root sum squares of channel coefficients (ISQ) algorithm, linear regression (LR) algorithm, and MUSIC/MUSIC (MM) algorithm. We compare these methods with the channel charting algorithms principal component analysis (PCA), Sammon's method (SM), and autoencoder (AE) [1]. We show that ISQ, LR, and MM surpass PCA, SM, and AE in performance. We also compare our algorithm MM with an algorithm from the literature that uses the MUSIC algorithm jointly on theta and rho. We call this algorithm the JM algorithm. JM performs very slightly better than MM but at a substantial increase in complexity. Finally, we introduce the rotate-and-sum (RS) algorithm which has about the same performance as the MM and JM algorithms. Unlike MUSIC, RS does not employ eigenvalue and eigenvector analysis. Thus, it is more suitable for direct register transfer logic (RTL) implementation.

Submitted 30 October, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: 17 pages, 13 figures, 9 tables

arXiv:2206.09520 [pdf, other]

ILX: Intelligent "Location+X" Data Systems (Vision Paper)

Authors: Walid G. Aref, Ahmed M. Aly, Anas Daghistani, Yeasir Rayhan, Jianguo Wang, Libin Zhou

Abstract: Due to the ubiquity of mobile phones and location-detection devices, location data is being generated in very large volumes. Queries and operations that are performed on location data warrant the use of database systems. Despite that, location data is being supported in data systems as an afterthought. Typically, relational or NoSQL data systems that are mostly designed with non-location data in mind get extended with spatial or spatiotemporal indexes, some query operators, and higher level syntactic sugar in order to support location data. The ubiquity of location data and location data services call for systems that are solely designed and optimized for the efficient support of location data. This paper envisions designing intelligent location+X data systems, ILX for short, where location is treated as a first-class citizen type. ILX is tailored with location data as the main data type (location-first). Because location data is typically augmented with other data types X, e.g., graphs, text data, click streams, annotations, etc., ILX needs to be extensible to support other data types X along with location. This paper envisions the main features that ILX should support, and highlights research challenges in realizing and supporting ILX.

Submitted 1 August, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

arXiv:2202.00901 [pdf, other]

Retrieve-and-Fill for Scenario-based Task-Oriented Semantic Parsing

Authors: Akshat Shrivastava, Shrey Desai, Anchit Gupta, Ali Elkahky, Aleksandr Livshits, Alexander Zotov, Ahmed Aly

Abstract: Task-oriented semantic parsing models have achieved strong results in recent years, but unfortunately do not strike an appealing balance between model size, runtime latency, and cross-domain generalizability. We tackle this problem by introducing scenario-based semantic parsing: a variant of the original task which first requires disambiguating an utterance's "scenario" (an intent-slot template with variable leaf spans) before generating its frame, complete with ontology and utterance tokens. This formulation enables us to isolate coarse-grained and fine-grained aspects of the task, each of which we solve with off-the-shelf neural modules, also optimizing for the axes outlined above. Concretely, we create a Retrieve-and-Fill (RAF) architecture comprised of (1) a retrieval module which ranks the best scenario given an utterance and (2) a filling module which imputes spans into the scenario to create the frame. Our model is modular, differentiable, interpretable, and allows us to garner extra supervision from scenarios. RAF achieves strong results in high-resource, low-resource, and multilingual settings, outperforming recent approaches by wide margins despite, using base pre-trained encoders, small sequence lengths, and parallel decoding.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2112.13174 [pdf]

doi 10.14778/3494124.3494132

An Experimental Evaluation and Investigation of Waves of Misery in R-trees

Authors: Lu Xing, Eric Lee, Tong An, Bo-Cheng Chu, Ahmed Mahmood, Ahmed M. Aly, Jianguo Wang, Walid G. Aref

Abstract: Waves of misery is a phenomenon where spikes of many node splits occur over short periods of time in tree indexes. Waves of misery negatively affect the performance of tree indexes in insertion-heavy workloads. Waves of misery have been first observed in the context of the B-tree, where these waves cause unpredictable index performance. In particular, the performance of search and index-update operations deteriorate when a wave of misery takes place, but is more predictable between the waves. This paper investigates the presence or lack of waves of misery in several R-tree variants, and studies the extent of which these waves impact the performance of each variant. Interestingly, although having poorer query performance, the Linear and Quadratic R-trees are found to be more resilient to waves of misery than both the Hilbert and R*-trees. This paper presents several techniques to reduce the impact in performance of the waves of misery for the Hilbert and R*-trees. One way to eliminate waves of misery is to force node splits to take place at regular times before nodes become full to achieve deterministic performance. The other way is that upon splitting a node, do not split it evenly but rather at different node utilization factors. This allows leaf nodes not to fill at the same pace. We study the impact of two new techniques to mitigate waves of misery after the tree index has been constructed, namely Regular Elective Splits (RES, for short) and Unequal Random Splits (URS, for short).
Our experimental investigation highlights the trade-offs in performance of the introduced techniques and the pros and cons of each technique.

Submitted 24 December, 2021; originally announced December 2021.

Comments: To appear in VLDB 2022

arXiv:2111.06621 [pdf]

Radiative Pattern of Intralayer and Interlayer Excitons in Two-Dimensional WS2/WSe2 Heterostructure

Authors: Mohammed Adel Aly, Manan Shah, Lorenz Maximilian Schneider, Kyungnam Kang, Martin Koch, Eui-Hyeok Yang, Arash Rahimi-Iman

Abstract: Two-dimensional (2D) heterostructures (HS) formed by transition-metal dichalcogenide (TMDC) monolayers offer a unique platform for the study of intralayer and interlayer excitons as well as moiré-pattern-induced features. Particularly, the dipolar charge-transfer exciton comprising an electron and a hole, which are confined to separate layers of 2D semiconductors and Coulomb-bound across the heterojunction interface, has drawn considerable attention in the research community. On the one hand, it bears significance for optoelectronic devices, e.g. in terms of charge carrier extraction from photovoltaic devices. On the other hand, its spatially indirect nature and correspondingly high longevity among excitons as well as its out-of-plane dipole orientation render it attractive for excitonic Bose-Einstein condensation studies, which address collective coherence effects, and for photonic integration schemes with TMDCs. Here, we demonstrate the interlayer excitons' out-of-plane dipole orientation through angle-resolved spectroscopy of the HS photoluminescence at cryogenic temperatures, employing a tungsten-based TMDC HS. Within the measurable light cone, the directly-obtained radiation profile of this species clearly resembles that of an in-plane emitter which deviates from that of the intralayer bright excitons as well as the other excitonic HS features recently attributed to artificial superlattices formed by moiré patterns.

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2111.06331 [pdf, other]

Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset

Authors: Aly Moustafa, Salah A. Aly

Abstract: Current authentication and trusted systems depend on classical and biometric methods to recognize or authorize users. Such methods include audio speech recognitions, eye, and finger signatures. Recent tools utilize deep learning and transformers to achieve better results. In this paper, we develop a deep learning constructed model for Arabic speakers identification by using Wav2Vec2.0 and HuBERT audio representation learning tools. The end-to-end Wav2Vec2.0 paradigm acquires contextualized speech representations learnings by randomly masking a set of feature vectors, and then applies a transformer neural network. We employ an MLP classifier that is able to differentiate between invariant labeled classes. We show several experimental results that safeguard the high accuracy of the proposed model. The experiments ensure that an arbitrary wave signal for a certain speaker can be identified with 98% and 97.1% accuracies in the cases of Wav2Vec2.0 and HuBERT, respectively.

Submitted 11 November, 2021; originally announced November 2021.

Comments: 5 pages, 9 figures, 2 tables

arXiv:2111.04960 [pdf, ps, other]

Information-Theoretic Limits for Steganography in Multimedia

Authors: Hassan Y. El-Arsh, Amr Abdelaziz, Ahmed Elliethy, Hussein A. Aly

Abstract: Steganography is the art and science of hiding data within innocent-looking objects (cover objects). Multimedia objects such as images and videos are an attractive type of cover objects due to their high embedding rates. There exist many techniques for performing steganography in both the literature and the practical world. Meanwhile, the definition of the steganographic capacity for multimedia and how to be calculated has not taken full attention. In this paper, for multivariate quantized-Gaussian-distributed multimedia, we study the maximum achievable embedding rate with respect to the statistical properties of cover objects against the maximum achievable performance by any steganalytic detector. Toward this goal, we evaluate the maximum allowed entropy of the hidden message source subject to the maximum probability of error of the steganalytic detector which is bounded by the KL-divergence between the statistical distributions for the cover and the stego objects. We give the exact scaling constant that governs the relationship between the entropies of the hidden message and the cover object.

Submitted 9 November, 2021; originally announced November 2021.

Comments: Manuscript posted on 03.07.2021, 23:19 at "https://www.techrxiv.org/articles/preprint/Information-Theoretic_Limits_for_Steganography_in_Multimedia/14867241"

arXiv:2111.01136 [pdf]

ASMDD: Arabic Speech Mispronunciation Detection Dataset

Authors: Salah A. Aly, Abdelrahman Salah, Hesham M. Eraqi

Abstract: The largest dataset of Arabic speech mispronunciation detections in Egyptian dialogues is introduced. The dataset is composed of annotated audio files representing the top 100 words that are most frequently used in the Arabic language, pronounced by 100 Egyptian children (aged between 2 and 8 years old). The dataset is collected and annotated on segmental pronunciation error detections by expert listeners.

Submitted 1 November, 2021; originally announced November 2021.

Comments: 3 pages, 2 tables, 2 figures, dataset link: https://drive.google.com/drive/folders/1dhlp-L0n6_RAzoosVK4bRa7hxBnzebqs

arXiv:2110.06384 [pdf, other]

AutoNLU: Detecting, root-causing, and fixing NLU model errors

Authors: Pooja Sethi, Denis Savenkov, Forough Arabshahi, Jack Goetz, Micaela Tolliver, Nicolas Scheffer, Ilknur Kabul, Yue Liu, Ahmed Aly

Abstract: Improving the quality of Natural Language Understanding (NLU) models, and more specifically, task-oriented semantic parsing models, in production is a cumbersome task. In this work, we present a system called AutoNLU, which we designed to scale the NLU quality improvement process. It adds automation to three key steps: detection, attribution, and correction of model errors, i.e., bugs. We detected four times more failed tasks than with random sampling, finding that even a simple active learning sampling method on an uncalibrated model is surprisingly effective for this purpose. The AutoNLU tool empowered linguists to fix ten times more semantic parsing bugs than with prior manual processes, auto-correcting 65% of all identified bugs.

Submitted 12 October, 2021; originally announced October 2021.

Comments: 8 pages, 5 figures

ACM Class: I.2.7

arXiv:2110.04425 [pdf, other]

Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset

Authors: Omar Mohamed, Salah A. Aly

Abstract: Recently, there have been tremendous research outcomes in the fields of speech recognition and natural language processing. This is due to the well-developed multi-layers deep learning paradigms such as wav2vec2.0, Wav2vecU, WavBERT, and HuBERT that provide better representation learning and high information capturing. Such paradigms run on hundreds of unlabeled data, then fine-tuned on a small dataset for specific tasks. This paper introduces a deep learning constructed emotional recognition model for Arabic speech dialogues. The developed model employs the state of the art audio representations include wav2vec2.0 and HuBERT. The experiment and performance results of our model overcome the previous known outcomes.

Submitted 8 October, 2021; originally announced October 2021.

Comments: 6 pages, 6 figures

arXiv:2107.04736 [pdf, other]

Assessing Data Efficiency in Task-Oriented Semantic Parsing

Authors: Shrey Desai, Akshat Shrivastava, Justin Rill, Brian Moran, Safiyyah Saleem, Alexander Zotov, Ahmed Aly

Abstract: Data efficiency, despite being an attractive characteristic, is often challenging to measure and optimize for in task-oriented semantic parsing; unlike exact match, it can require both model- and domain-specific setups, which have, historically, varied widely across experiments. In our work, as a step towards providing a unified solution to data-efficiency-related questions, we introduce a four-stage protocol which gives an approximate measure of how much in-domain, "target" data a parser requires to achieve a certain quality bar. Specifically, our protocol consists of (1) sampling target subsets of different cardinalities, (2) fine-tuning parsers on each subset, (3) obtaining a smooth curve relating target subset (%) vs. exact match (%), and (4) referencing the curve to mine ad-hoc (target subset, exact match) points. We apply our protocol in two real-world case studies -- model generalizability and intent complexity -- illustrating its flexibility and applicability to practitioners in task-oriented semantic parsing.

Submitted 9 July, 2021; originally announced July 2021.

arXiv:2106.11890 [pdf, other]

Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization

Authors: David Eriksson, Pierce I-Jen Chuang, Samuel Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, Maximilian Balandat

Abstract: When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook.

Submitted 25 June, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: To Appear at the 8th ICML Workshop on Automated Machine Learning, ICML 2021

arXiv:2105.13496 [pdf, other]

Diagnosing Transformers in Task-Oriented Semantic Parsing

Authors: Shrey Desai, Ahmed Aly

Abstract: Modern task-oriented semantic parsing approaches typically use seq2seq transformers to map textual utterances to semantic frames comprised of intents and slots. While these models are empirically strong, their specific strengths and weaknesses have largely remained unexplored. In this work, we study BART and XLM-R, two state-of-the-art parsers, across both monolingual and multilingual settings. Our experiments yield several key results: transformer-based parsers struggle not only with disambiguating intents/slots, but surprisingly also with producing syntactically-valid frames. Though pre-training imbues transformers with syntactic inductive biases, we find the ambiguity of copying utterance spans into frames often leads to tree invalidity, indicating span extraction is a major bottleneck for current parsers. However, as a silver lining, we show transformer-based parsers give sufficient indicators for whether a frame is likely to be correct or incorrect, making them easier to deploy in production settings.

Submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted to Findings of ACL 2021

arXiv:2104.07275 [pdf, other]

Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing

Authors: Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, Ahmed Aly

Abstract: An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance $x$, predicting a frame's length $|y|$, and decoding a $|y|$-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottlenecked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift the decoding task from text generation to span prediction; that is, when imputing utterance spans into frame slots, our model produces endpoints (e.g., [i, j]) as opposed to text (e.g., "6pm"). This natural quantization of the output space reduces the variability of gold frames, therefore improving length prediction and, ultimately, exact match. Furthermore, length prediction is now responsible for frame syntax and the decoder is responsible for frame semantics, resulting in a coarse-to-fine model. We evaluate our approach on several task-oriented semantic parsing datasets. Notably, we bridge the quality gap between non-autoregressive and autoregressive parsers, achieving 87 EM on TOPv2 (Chen et al. 2020). Furthermore, due to our more consistent gold frames, we show strong improvements in model generalization in both cross-domain and cross-lingual transfer in low-resource settings.
Finally, due to our diminished output vocabulary, we observe 70% reduction in latency and 83% reduction in memory at beam size 5 compared to prior non-autoregressive parsers.

Submitted 14 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

arXiv:2104.07224 [pdf, other]

Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling

Authors: Shrey Desai, Akshat Shrivastava, Alexander Zotov, Ahmed Aly

Abstract: Task-oriented semantic parsing models typically have high resource requirements: to support new ontologies (i.e., intents and slots), practitioners crowdsource thousands of samples for supervised fine-tuning. Partly, this is due to the structure of de facto copy-generate parsers; these models treat ontology labels as discrete entities, relying on parallel data to extrinsically derive their meaning. In our work, we instead exploit what we intrinsically know about ontology labels; for example, the fact that SL:TIME_ZONE has the categorical type "slot" and language-based span "time zone". Using this motivation, we build our approach with offline and online stages. During preprocessing, for each ontology label, we extract its intrinsic properties into a component, and insert each component into an inventory as a cache of sorts. During training, we fine-tune a seq2seq, pre-trained transformer to map utterances and inventories to frames, parse trees comprised of utterance and ontology tokens. Our formulation encourages the model to consider ontology labels as a union of its intrinsic properties, therefore substantially bootstrapping learning in low-resource settings. Experiments show our model is highly sample efficient: using a low-resource benchmark derived from TOPv2, our inventory parser outperforms a copy-generate parser by +15 EM absolute (44% relative) when fine-tuning on 10 samples from an unseen domain.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2104.04923 [pdf, other]

Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog

Authors: Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad

Abstract: Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic parse trees with an efficient seq2seq model architecture. By combining non-autoregressive prediction with convolutional neural networks, we achieve significant latency gains and parameter size reduction compared to traditional RNN models. Our novel architecture achieves up to an 81% reduction in latency on TOP dataset and retains competitive performance to non-pretrained models on three different semantic parsing datasets. Our code is available at https://github.com/facebookresearch/pytext

Submitted 11 April, 2021; originally announced April 2021.

arXiv:2103.10471 [pdf, ps, other]

Stationary underdispersed INAR(1) models based on the backward approach

Authors: Emad-Eldin AA Aly, Nadjib Bouzar

Abstract: Most of the stationary first-order autoregressive integer-valued (INAR(1)) models were developed for a given thinning operator using either the forward approach or the backward approach. In the forward approach the marginal distribution of the time series is specified and an appropriate distribution for the innovation sequence is sought. Whereas in the backward setting, the roles are reversed. The common distribution of the innovation sequence is specified and the distributional properties of the marginal distribution of the time series are studied. In this article we focus on the backward approach in presence of the Binomial thinning operator. We establish a number of theoretical results which we proceed to use to develop stationary INAR(1) models with finite mean. We illustrate our results by presenting some new INAR(1) models that show underdispersion.

Submitted 18 March, 2021; originally announced March 2021.

MSC Class: 62M10 (Primary) 60E99 (Secondary)

arXiv:2003.03528 [pdf]

Exploratory Study: Children's with Autism Awareness of being Imitated by Nao Robot

Authors: Andreea Peca, Adriana Tapus, Amir Aly, Cristina Pop, Lavinia Jisa, Sebastian Pintea, Alina Rusu, Daniel David

Abstract: This paper presents an exploratory study designed for children with Autism Spectrum Disorders (ASD) that investigates children's awareness of being imitated by a robot in a play/game scenario. The Nao robot imitates all the arm movement behaviors of the child in real-time in dyadic and triadic interactions. Different behavioral criteria (i.e., eye gaze, gaze shifting, initiation and imitation of arm movements, smile/laughter) were analyzed based on the video data of the interaction. The results confirm only parts of the research hypothesis. However, these results are promising for the future directions of this work.

Submitted 7 March, 2020; originally announced March 2020.

Comments: Proceedings of the 1st International Conference on Innovative Technologies for Autism Spectrum Disorders. ASD: Tools, Trends and Testimonials (ITASD), Spain, 2012

arXiv:2002.12360 [pdf]

Social Engagement of Children with Autism during Interaction with a Robot

Authors: Adriana Tapus, Andreea Peca, Amir Aly, Cristina Pop, Lavinia Jisa, Sebastian Pintea, Alina Rusu, Daniel David

Abstract: Imitation plays an important role in development, being one of the precursors of social cognition. Even though some children with autism imitate spontaneously and other children with autism can learn to imitate, the dynamics of imitation is affected in the large majority of cases. Existing studies from the literature suggest that robots can be used to teach children with autism basic interaction skills like imitation. Based on these findings, in this study, we investigate if children with autism show more social engagement when interacting with an imitative robot (Fig 1) compared to a human partner in a motor imitation task.

Submitted 27 February, 2020; originally announced February 2020.

Comments: Proceedings of the 2nd International Conference on Innovative Research in Autism (IRIA), France, 2012

arXiv:2002.01779 [pdf]

Human Posture Recognition and Gesture Imitation with a Humanoid Robot

Authors: Amir Aly

Abstract: This study proposes different approaches for static and dynamic gesture analysis and imitation with the social robot Nao.

Submitted 21 March, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

Comments: University of Paris 6 (UPMC), University of Sorbonne, France

MSC Class: 14J60 (Robotics) ACM Class: F.2.2

arXiv:2002.01535 [pdf, ps, other]

Lightweight Convolutional Representations for On-Device Natural Language Processing

Authors: Shrey Desai, Geoffrey Goh, Arun Babu, Ahmed Aly

Abstract: The increasing computational and memory complexities of deep neural networks have made it difficult to deploy them on low-resource electronic devices (e.g., mobile phones, tablets, wearables). Practitioners have developed numerous model compression methods to address these concerns, but few have condensed input representations themselves. In this work, we propose a fast, accurate, and lightweight convolutional representation that can be swapped into any neural model and compressed significantly (up to 32x) with a negligible reduction in performance. In addition, we show gains over recurrent representations when considering resource-centric metrics (e.g., model file size, latency, memory usage) on a Samsung Galaxy S9.

Submitted 4 February, 2020; originally announced February 2020.

Comments: Accepted to MLSys 2020

arXiv:2001.02048 [pdf, other]

doi 10.1007/s11042-021-10575-y

Flexible Architecture for Real-time Processing of Multiple Video Signals

Authors: Mohamed Awad, Islam T. Abougindia, Ahmed Elliethy, Hussein A. Aly

Abstract: Simultaneous processing of multiple video sources requires each pixel in a frame from a video source to be processed synchronously with the pixels at the same spatial positions in corresponding frames from the other video sources. However, simultaneous processing is challenging as corresponding frames from different video signals provided by multiple sources have time-varying delay because of the electrical and mechanical restrictions inside the video sources' hardware that cause deviation in the corresponding frame rates. Researchers overcome the aforementioned challenges either by utilizing ready-made video processing systems or designing and implementing a custom system tailored to their specific application. These video processing systems lack flexibility in handling different application requirements such as the required number of video sources and outputs, video standards, or frame rates of the input/output videos. In this paper, we present a design for a flexible simultaneous video processing architecture that is suitable for various applications. The proposed architecture is upgradeable to deal with multiple video standards, scalable to process/produce a variable number of input/output videos, and compatible with most video processors. Moreover, we present in detail the analog/digital mixed-signal and power distribution considerations used in designing the proposed architecture.
As a case study application of the proposed flexible architecture, we utilized the architecture for a realization of a simultaneous video processing system that performs video fusion from visible and near-infrared video sources in real time. We make available the source files of the hardware design along with the bill of materials (BOM) of the case study to be a reference for researchers who intend to design and implement simultaneous multi-video processing systems.

Submitted 29 December, 2019; originally announced January 2020.

Comments: 13 pages, 16 figures, 3 tables

Journal ref: Springer Multimedia Tools and Applications (2021)

arXiv:1910.12708 [pdf, other]

Evaluating Lottery Tickets Under Distributional Shifts

Authors: Shrey Desai, Hongyuan Zhan, Ahmed Aly

Abstract: The Lottery Ticket Hypothesis suggests large, over-parameterized neural networks consist of small, sparse subnetworks that can be trained in isolation to reach a similar (or better) test accuracy. However, the initialization and generalizability of the obtained sparse subnetworks have been recently called into question. Our work focuses on evaluating the initialization of sparse subnetworks under distributional shifts. Specifically, we investigate the extent to which a sparse subnetwork obtained in a source domain can be re-trained in isolation in a dissimilar, target domain. In addition, we examine the effects of different initialization strategies at transfer-time. Our experiments show that sparse subnetworks obtained through lottery ticket training do not simply overfit to particular domains, but rather reflect an inductive bias of deep neural networks that can be exploited in multiple domains.

Submitted 28 October, 2019; originally announced October 2019.

Comments: Accepted to EMNLP 2019 Workshop on Deep Learning for Low-Resource NLP

arXiv:1902.02371 [pdf, other]

Diffeomorphic Medial Modeling

Authors: Paul A. Yushkevich, Ahmed Aly, Jiancong Wang, Long Xie, Robert C. Gorman, Laurent Younes, Alison Pouch

Abstract: Deformable shape modeling approaches that describe objects in terms of their medial axis geometry (e.g., m-reps [Pizer et al., 2003]) yield rich geometrical features that can be useful for analyzing the shape of sheet-like biological structures, such as the myocardium. We present a novel shape analysis approach that combines the benefits of medial shape modeling and diffeomorphometry. Our algorithm is formulated as a problem of matching shapes using diffeomorphic flows under constraints that approximately preserve medial axis geometry during deformation. As the result, correspondence between the medial axes of similar shapes is maintained. The approach is evaluated in the context of modeling the shape of the left ventricular wall from 3D echocardiography images.

Submitted 28 February, 2019; v1 submitted 6 February, 2019; originally announced February 2019.

Comments: Accepted to the 26th International Conference on Information Processing in Medical Imaging (IPMI 2019)

arXiv:1901.05988 [pdf, other]

Optimizing Deep Neural Networks with Multiple Search Neuroevolution

Authors: Ahmed Aly, David Weikersdorfer, Claire Delaunay

Abstract: This paper presents an evolutionary metaheuristic called Multiple Search Neuroevolution (MSN) to optimize deep neural networks. The algorithm attempts to search multiple promising regions in the search space simultaneously, maintaining sufficient distance between them. It is tested by training neural networks for two tasks, and compared with other optimization algorithms. The first task is to solve Global Optimization functions with challenging topographies. We found MSN to outperform classic optimization algorithms such as Evolution Strategies, reducing the number of optimization steps performed by at least 2X. The second task is to train a convolutional neural network (CNN) on the popular MNIST dataset. Using 3.33% of the training set, MSN reaches a validation accuracy of 90%. Stochastic Gradient Descent (SGD) was able to match the same accuracy figure, while taking 7X fewer optimization steps. Despite lagging, the fact that the MSN metaheuristic trains a 4.7M-parameter CNN suggests promise for future development. This is by far the largest network ever evolved using a pool of only 50 samples.

Submitted 17 January, 2019; originally announced January 2019.

Comments: Submitted to IEEE CEC2019

arXiv:1812.08729 [pdf, other]

PyText: A Seamless Path from NLP research to production

Authors: Ahmed Aly, Kushal Lakhotia, Shicong Zhao, Mrinal Mohit, Barlas Oguz, Abhinav Arora, Sonal Gupta, Christopher Dewan, Stef Nelson-Lindall, Rushin Shah

Abstract: We introduce PyText - a deep learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces for model components, and by using PyTorch's capabilities of exporting models for inference via the optimized Caffe2 execution engine. We report our own experience of migrating experimentation and production workflows to PyText, which enabled us to iterate faster on novel modeling ideas and then seamlessly ship them at industrial scale.

Submitted 12 December, 2018; originally announced December 2018.

arXiv:1812.04840 [pdf]

Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction

Authors: Amir Aly, Tadahiro Taniguchi

Abstract: Robots are widely collaborating with human users in different tasks that require high-level cognitive functions to make them able to discover the surrounding environment. A difficult challenge that we briefly highlight in this short paper is inferring the latent grammatical structure of language, which includes grounding parts of speech (e.g., verbs, nouns, adjectives, and prepositions) through visual perception, and induction of Combinatory Categorial Grammar (CCG) for phrases. This paves the way towards grounding phrases so as to make a robot able to understand human instructions appropriately during interaction.

Submitted 13 March, 2020; v1 submitted 12 December, 2018; originally announced December 2018.

Comments: Proceedings of the International Conference on Social Cognition in Humans and Robots (socSMCs), Germany, 2018

arXiv:1808.05730 [pdf, other]

Efficient Single-Shot Multibox Detector for Construction Site Monitoring

Authors: Viral Thakar, Himani Saini, Walid Ahmed, Mohammad M Soltani, Ahmed Aly, Jia Yuan Yu

Abstract: Asset monitoring in construction sites is an intricate, manually intensive task that can highly benefit from automated solutions engineered using deep neural networks. We use the Single-Shot Multibox Detector (SSD), for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on the construction sites and automate the monitoring tasks, hence enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of a greedy approach like non-maximum suppression. We do so using Affinity Propagation Clustering (APC) to cluster the predicted boxes based on the similarity index computed using the spatial features as well as location of predicted boxes. In our attempts, we have been able to improve the mean average precision of SSD by 3.77% on a custom dataset consisting of images from construction sites and by 1.67% on PASCAL VOC Challenge.

Submitted 19 August, 2018; v1 submitted 16 August, 2018; originally announced August 2018.

Comments: 6 pages, 4 figures, to appear in the Proceedings of the ISC2 2018, 16-19 September 2018, Kansas, USA

arXiv:1808.05525 [pdf, other]

Experiential Robot Learning with Accelerated Neuroevolution

Authors: Ahmed Aly, Joanne B. Dugan

Abstract: Derivative-based optimization techniques such as Stochastic Gradient Descent have been wildly successful in training deep neural networks. However, they have constraints such as end-to-end network differentiability. As an alternative, we present the Accelerated Neuroevolution algorithm. The new algorithm is aimed towards physical robotic learning tasks following the Experiential Robot Learning method. We test our algorithm first on a simulated task of playing the game Flappy Bird, then on a physical NAO robot in a static Object Centering task. The agents successfully navigate the given tasks, in a relatively low number of generations. Based on our results, we propose to use the algorithm in more complex tasks.

Submitted 16 August, 2018; originally announced August 2018.

arXiv:1801.08354 [pdf, other]

Secure and Privacy-Friendly Local Electricity Trading and Billing in Smart Grid

Authors: Aysajan Abidin, Abdelrahaman Aly, Sara Cleemput, Mustafa A. Mustafa

Abstract: This paper proposes two decentralised, secure and privacy-friendly protocols for local electricity trading and billing, respectively. The trading protocol employs a bidding algorithm based upon secure multiparty computations and allows users to trade their excess electricity among themselves. The bid selection and calculation of the trading price are performed in a decentralised and oblivious manner. The billing protocol is based on a simple privacy-friendly aggregation technique that allows suppliers to compute their customers' monthly bills without learning their fine-grained electricity consumption data. We also implemented and tested the performance of the trading protocol with realistic data. Our results show that it can be performed for 2500 bids in less than five minutes in the on-line phase, showing its feasibility for a typical electricity trading period of 30 minutes.

Submitted 25 January, 2018; originally announced January 2018.

arXiv:1801.08353 [pdf, other]

A Secure and Privacy-preserving Protocol for Smart Metering Operational Data Collection

Authors: Mustafa A. Mustafa, Sara Cleemput, Abelrahaman Aly, Aysajan Abidin

Abstract: In this paper we propose a novel protocol that allows suppliers and grid operators to collect users' aggregate metering data in a secure and privacy-preserving manner. We use secure multiparty computation to ensure privacy protection. In addition, we propose three different data aggregation algorithms that offer different balances between privacy-protection and performance. Our protocol is designed for a realistic scenario in which the data need to be sent to different parties, such as grid operators and suppliers. Furthermore, it facilitates an accurate calculation of transmission, distribution and grid balancing fees in a privacy-preserving manner. We also present a security analysis and a performance evaluation of our protocol based on well known multiparty computation algorithms implemented in C++.

Submitted 14 March, 2019; v1 submitted 25 January, 2018; originally announced January 2018.

Comments: Accepted for publication at IEEE Transactions on Smart Grid

arXiv:1709.03129 [pdf, ps, other]

Expectation thinning operators based on linear fractional probability generating functions

Authors: Emad-Eldin A. A. Aly, Nadjib Bouzar

Abstract: We introduce a two-parameter expectation thinning operator based on a linear fractional probability generating function. The operator is then used to define a first-order integer-valued autoregressive INAR(1) process. Distributional properties of the INAR(1) process are described. We revisit the Bernoulli-geometric INAR(1) process of Bourguignon and Weiß (2017) and we introduce a new stationary INAR(1) process with a compound negative binomial distribution. Lastly, we show how a proper randomization of our operator leads to a generalized notion of monotonicity for distributions on \bzp.

Submitted 10 September, 2017; originally announced September 2017.

MSC Class: 60E05 60E10 62M10

Journal ref: Journal of the Indian Society for Probability and Statistics 20 (2019), 89-107

arXiv:1709.02533 [pdf, other]

Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

Authors: Ahmed R. Mahmood, Anas Daghistani, Ahmed M. Aly, Walid G. Aref, Mingjie Tang, Saleh Basalamah, Sunil Prabhakar

Abstract: The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or are not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize the redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado is adaptive to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of the overall system throughput.

Submitted 8 September, 2017; originally announced September 2017.

arXiv:1709.02529 [pdf, other]

FAST: Frequency-Aware Spatio-Textual Indexing for In-Memory Continuous Filter Query Processing

Authors: Ahmed R. Mahmood, Ahmed M. Aly, Walid G. Aref

Abstract: Many applications need to process massive streams of spatio-textual data in real-time against continuous spatio-textual queries. For example, in location-aware ad targeting publish/subscribe systems, it is required to disseminate millions of ads and promotions to millions of users based on the locations and textual profiles of users. In this paper, we study indexing of continuous spatio-textual queries. There exist several related spatio-textual indexes that typically integrate a spatial index with a textual index. However, these indexes usually have a high demand for main-memory and assume that the entire vocabulary of keywords is known in advance. Also, these indexes do not successfully capture the variations in the frequencies of keywords across different spatial regions and treat frequent and infrequent keywords in the same way. Moreover, existing indexes do not adapt to the changes in workload over space and time. For example, some keywords may be trending at certain times in certain locations and this may change as time passes. This affects the indexing and searching performance of existing indexes significantly. In this paper, we introduce FAST, a Frequency-Aware Spatio-Textual index for continuous spatio-textual queries. FAST is a main-memory index that requires up to one third of the memory needed by the state-of-the-art index. FAST does not assume prior knowledge of the entire vocabulary of indexed objects. FAST adaptively accounts for the difference in the frequencies of keywords within their corresponding spatial regions to automatically choose the best indexing approach that optimizes the insertion and search times. Extensive experimental evaluation using real and synthetic datasets demonstrates that FAST is up to 3x faster in search time and 5x faster in insertion time than the state-of-the-art indexes.

Submitted 4 October, 2017; v1 submitted 8 September, 2017; originally announced September 2017.

arXiv:1601.02475 [pdf, ps, other]

doi 10.1007/s10714-015-2011-4

Angular diameter distances reconsidered in the Newman and Penrose formalism

Authors: Thomas P. Kling, Aly Aly

Abstract: Using the Newman and Penrose spin coefficient (NP) formalism, we provide a derivation of the Dyer-Roeder equation for the angular diameter distance in cosmological space-times. We show that the geodesic deviation equation written in NP formalism is precisely the Dyer-Roeder equation for a general Friedman-Robertson-Walker (FRW) space-time, and then we examine the angular diameter distance to redshift relation in the case that a flat FRW metric is perturbed by a gravitational potential. We examine the perturbation in the case that the gravitational potential exhibits the properties of a thin gravitational lens, demonstrating how the weak lensing shear and convergence act as source terms for the perturbed Dyer-Roeder equation.

Submitted 11 January, 2016; originally announced January 2016.

Comments: 21 pages, 6 figures, accepted to GRG

arXiv:1508.04921 [pdf, ps, other]

Robust Node Estimation and Topology Discovery Algorithm in Large-Scale Wireless Sensor Networks

Authors: Ahmed Douik, Salah A. Aly, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini

Abstract: This paper introduces a novel algorithm for cardinality, i.e., the number of nodes, estimation in large scale anonymous graphs using statistical inference methods. Applications of this work include estimating the number of sensor devices, online social users, active protein cells, etc. In anonymous graphs, each node possesses little or non-existing information on the network topology. In particular, this paper assumes that each node only knows its unique identifier. The aim is to estimate the cardinality of the graph and the neighbours of each node by querying a small portion of them. While the former allows the design of more efficient coding schemes for the network, the second provides a reliable way for routing packets. As a reference for comparison, this work considers the Best Linear Unbiased Estimators (BLUE). For dense graphs and specific running times, the proposed algorithm produces a cardinality estimate proportional to the BLUE. Furthermore, for an arbitrary number of iterations, the estimate converges to the BLUE as the number of queried nodes tends to the total number of nodes in the network. Simulation results confirm the theoretical results by revealing that, for a moderate running time, asking a small group of nodes is sufficient to perform an estimation of 95% of the whole network.

Submitted 20 August, 2015; originally announced August 2015.

arXiv:1506.01903

Analytical Solution of The Two-Qubit Quantum Rabi Model

Authors: Doaa A. M. Abo-Kahla, Salah A. Aly, Mahmoud Abdel-Aty

Abstract: In this paper, an analytical solution of the two-qubit Rabi model for the general case is presented. Furthermore, a comparison between the information entropies and the Von Neumann entropy $(ρ_{A})$ is given for some special values of the qubit-photon coupling constants in case of the detuning parameters. It is demonstrated that oscillations of the occupation probabilities $ρ_{11}, ρ_{22}, ρ_{33}$ and $ρ_{44}$ are equivalent to the case of the spontaneous emission. The occupation probability $ρ_{11}$ reaches the case of sudden death when the detuning parameter $Δ_{2}$ equals zero.

Submitted 9 June, 2015; v1 submitted 5 June, 2015; originally announced June 2015.

Comments: due to crucial error in the transformation

arXiv:1209.3433 [pdf, ps, other]

A Hajj And Umrah Location Classification System For Video Crowded Scenes

Authors: Hossam M. Zawbaa, Salah A. Aly, Adnan A. Gutub

Abstract: In this paper, a new automatic system for classifying ritual locations in diverse Hajj and Umrah video scenes is investigated. This challenging subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic annotated video datasets. The HUER Dataset is defined to model six different Hajj and Umrah ritual locations [26]. The proposed Hajj and Umrah ritual location classifying system consists of four main phases: preprocessing, segmentation, feature extraction, and location classification. The shot boundary detection and background/foreground segmentation algorithms are applied to prepare the input video scenes for the KNN, ANN, and SVM classifiers. The system improves the state-of-the-art results on Hajj and Umrah location classification, and successfully recognizes the six Hajj rituals with more than 90% accuracy. The various demonstrated experiments show promising results.

Submitted 15 September, 2012; originally announced September 2012.

Comments: 9 pages, 10 figures, 2 tables, 3 algorithms

arXiv:1208.5365 [pdf, ps, other]

A Missing and Found Recognition System for Hajj and Umrah

Authors: Salah A. Aly

Abstract: This note describes an integrated recognition system for identifying missing and found objects as well as missing, dead, and found people during Hajj and Umrah seasons in the two Holy cities of Makkah and Madina in the Kingdom of Saudi Arabia. It is assumed that the total estimated number of pilgrims will reach 20 million during the next decade. The ultimate goal of this system is to integrate facial recognition and object identification solutions into the Hajj and Umrah rituals. The missing and found computerized system is part of the CrowdSensing system for Hajj and Umrah crowd estimation, management and safety.

Submitted 27 August, 2012; originally announced August 2012.

Comments: website available via http://www.mfhajj.com

arXiv:1208.0074 [pdf, other]

Spatial Queries with Two kNN Predicates

Authors: Ahmed M. Aly, Walid G. Aref, Mourad Ouzzani

Abstract: The widespread use of location-aware devices has led to countless location-based services in which a user query can be arbitrarily complex, i.e., one that embeds multiple spatial selection and join predicates. Amongst these predicates, the k-Nearest-Neighbor (kNN) predicate stands as one of the most important and widely used predicates. Unlike related research, this paper goes beyond the optimization of queries with single kNN predicates, and shows how queries with two kNN predicates can be optimized. In particular, the paper addresses the optimization of queries with: (i) two kNN-select predicates, (ii) two kNN-join predicates, and (iii) one kNN-join predicate and one kNN-select predicate. For each type of queries, conceptually correct query evaluation plans (QEPs) and new algorithms that optimize the query execution time are presented. Experimental results demonstrate that the proposed algorithms outperform the conceptually correct QEPs by orders of magnitude.

Submitted 31 July, 2012; originally announced August 2012.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 11, pp. 1100-1111 (2012)

arXiv:1205.4463 [pdf, ps, other]

Pilgrims Face Recognition Dataset -- HUFRD

Authors: Salah A. Aly

Abstract: In this work, we define a new pilgrims face recognition dataset, called the HUFRD dataset. The newly developed dataset presents various pilgrims' images taken from outside the Holy Masjid El-Harram in Makkah during the 2011-2012 Hajj and Umrah seasons. This dataset will be used to test our developed facial recognition and detection algorithms, as well as assess in the missing and found recognition system \cite{crowdsensing}.

Submitted 29 December, 2012; v1 submitted 20 May, 2012; originally announced May 2012.

Comments: 5 pages, 13 images, 1 table of a new HUFRD work

arXiv:1205.2345 [pdf, ps, other]

Hajj and Umrah Event Recognition Datasets

Authors: Hossam Zawbaa, Salah A. Aly

Abstract: In this note, new Hajj and Umrah Event Recognition datasets (HUER) are presented. The demonstrated datasets are based on videos and images taken during 2011-2012 Hajj and Umrah seasons. HUER is the first collection of datasets covering the six types of Hajj and Umrah ritual events (rotating in Tawaf around Kabaa, performing Sa'y between Safa and Marwa, standing on the mount of Arafat, staying overnight in Muzdalifah, staying two or three days in Mina, and throwing Jamarat). The HUER datasets also contain video and image databases for nine types of human actions during Hajj and Umrah (walking, drinking from Zamzam water, sleeping, smiling, eating, praying, sitting, shaving hairs and ablutions, reading the holy Quran and making duaa). The spatial resolutions are 1280 x 720 pixels for images and 640 x 480 pixels for videos, which have lengths of 20 seconds on average at 30 frames per second.

Submitted 10 May, 2012; originally announced May 2012.

Comments: 4 pages, 18 figures with 33 images

arXiv:1205.2077 [pdf, ps, other]

Data Dissemination And Collection Algorithms For Collaborative Sensor Networks Using Dynamic Cluster Heads

Authors: Salah A. Aly, Mohamed Salim

Abstract: We develop novel data dissemination and collection algorithms for Wireless Sensor Networks (WSNs) in which we consider $n$ sensor nodes distributed randomly in a certain field to measure a physical phenomenon. Such sensors have limited energy, short coverage range, and bandwidth and memory constraints. We desire to disseminate nodes' data throughout the network such that a base station will be able to collect the sensed data by querying a small number of nodes. We propose two data dissemination and collection algorithms (DCA's) to solve this problem. Data dissemination is achieved through dynamical selection of some nodes. The selected nodes will be changed after a time slot $t$ and may be repeated after a period $T$.

Submitted 9 May, 2012; originally announced May 2012.

Comments: 6 pages, 5 figures

arXiv:1202.2449 [pdf, ps, other]

Efficient Web-based Facial Recognition System Employing 2DHOG

Authors: Moataz M. Abdelwahab, Salah A. Aly, Islam Yousry

Abstract: In this paper, a system for facial recognition to identify missing and found people in Hajj and Umrah is described as a web portal. Explicitly, we present a novel algorithm for recognition and classification of facial images based on applying 2DPCA to a 2D representation of the Histogram of oriented gradients (2D-HOG) which maintains the spatial relation between pixels of the input images. This algorithm allows a compact representation of the images which reduces the computational complexity and the storage requirements, while maintaining the highest reported recognition accuracy. This promotes this method for usage with very large datasets. A large dataset was collected for people in Hajj. Experimental results employing ORL, UMIST, JAFFE, and HAJJ datasets confirm these excellent properties.

Submitted 11 February, 2012; originally announced February 2012.

Showing 1–50 of 90 results for author: Aly, A