-
Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN
Authors:
Oluwaleke Yusuf,
Maki Habib,
Mohamed Moustafa
Abstract:
This study focuses on Hand Gesture Recognition (HGR), which is vital for perceptual computing across various real-world contexts. The primary challenge in the HGR domain lies in dealing with the individual variations inherent in human hand morphology. To tackle this challenge, we introduce an innovative HGR framework that combines data-level fusion and an Ensemble Tuner Multi-stream CNN architecture. This approach effectively encodes spatiotemporal gesture information from the skeleton modality into RGB images, thereby minimizing noise while improving semantic gesture comprehension. Our framework operates in real-time, significantly reducing hardware requirements and computational complexity while maintaining competitive performance on benchmark datasets such as SHREC2017, DHG1428, FPHA, LMDHG and CNR. This improvement in HGR demonstrates robustness and paves the way for practical, real-time applications that leverage resource-limited devices for human-machine interaction and ambient intelligence.
Submitted 21 June, 2024;
originally announced June 2024.
-
The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses
Authors:
Mahmoud Ahmed,
Omer Moussa,
Ismail Shaheen,
Mohamed Abdelfattah,
Amr Abdalla,
Marwan Eid,
Hesham Eraqi,
Mohamed Moustafa
Abstract:
One of the major challenges in training deep neural networks for text-to-image generation is the significant linguistic discrepancy between the ground-truth captions of each image in most popular datasets. The large difference in word choice across such captions results in synthesized images that are semantically dissimilar to each other and to their ground-truth counterparts. Moreover, existing models either fail to generate the fine-grained details of the image or require a huge number of parameters, which renders them inefficient for text-to-image synthesis. To fill this gap in the literature, we propose using the contrastive learning approach with a novel combination of two loss functions: a fake-to-fake loss to increase the semantic consistency between generated images of the same caption, and a fake-to-real loss to reduce the gap between the distributions of real images and fake ones. We test this approach on two baseline models: SSAGAN and AttnGAN (with style blocks to enhance the fine-grained details of the images). Results show that our approach improves the qualitative results on AttnGAN with style blocks on the CUB dataset. Additionally, on the challenging COCO dataset, our approach achieves competitive results against the state-of-the-art Lafite model and outperforms the FID score of the SSAGAN model by 44.
Submitted 17 December, 2023;
originally announced December 2023.
-
Attestation with Constrained Relying Party
Authors:
Mariam Moustafa,
Arto Niemi,
Philip Ginzboorg,
Jan-Erik Ekberg
Abstract:
Allowing a compromised device to receive privacy-sensitive sensor readings, or to operate a safety-critical actuator, carries significant risk. Usually, such risks are mitigated by validating the device's security state with remote attestation, but current remote attestation protocols are not suitable when the beneficiary of attestation, the relying party, is a constrained device such as a small sensor or actuator. These devices typically lack the power and memory to operate public-key cryptography needed by such protocols, and may only be able to communicate with devices in their physical proximity, such as with the controller whose security state they wish to evaluate. In this paper, we present a remote platform attestation protocol suitable for relying parties that are limited to symmetric-key cryptography and a single communication channel. We show that our protocol, including the needed cryptography and message processing, can be implemented with a code size of 6 KB and validate its security via model checking with the ProVerif tool.
Submitted 14 December, 2023;
originally announced December 2023.
-
Gibbs state sampling via cluster expansions
Authors:
Norhan M. Eassa,
Mahmoud M. Moustafa,
Arnab Banerjee,
Jeffrey Cohn
Abstract:
Gibbs states (i.e., thermal states) can be used for several applications such as quantum simulation, quantum machine learning, quantum optimization, and the study of open quantum systems. Moreover, semi-definite programming, combinatorial optimization problems, and training quantum Boltzmann machines can all be addressed by sampling from well-prepared Gibbs states. With that, however, comes the fact that preparing and sampling from Gibbs states on a quantum computer are notoriously difficult tasks. Such tasks can require large overhead in resources and/or calibration even in the simplest of cases, as well as the fact that the implementation might be limited to only a specific set of systems. We propose a method based on sampling from a quasi-distribution consisting of tensor products of mixed states on local clusters, i.e., expanding the full Gibbs state into a sum of products of local "Gibbs-cumulant" type states easier to implement and sample from on quantum hardware. We begin with presenting results for 4-spin linear chains with XY spin interactions, for which we obtain the $ZZ$ dynamical spin-spin correlation functions. We also present the results of measuring the specific heat of the 8-spin chain Gibbs state $ρ_8$.
Submitted 30 October, 2023;
originally announced October 2023.
-
Dataset Creation Pipeline for Camera-Based Heart Rate Estimation
Authors:
Mohamed Moustafa,
Amr Elrasad,
Joseph Lemley,
Peter Corcoran
Abstract:
Heart rate is one of the most vital health metrics and can be used to investigate and gain insight into various human physiological and psychological information. Estimating heart rate without the constraints of contact-based sensors thus presents itself as a very attractive field of research, as it enables well-being monitoring in a wider variety of scenarios. Consequently, various techniques for camera-based heart rate estimation have been developed, ranging from classical image processing to convoluted deep learning models and architectures. At the heart of such research efforts lies health and visual data acquisition, cleaning, transformation, and annotation. In this paper, we discuss how to prepare data for the task of developing or testing an algorithm or machine learning model for heart rate estimation from images of facial regions. The prepared data includes camera frames as well as readings from an electrocardiograph sensor. The proposed pipeline is divided into four main steps, namely removal of faulty data, frame and electrocardiograph timestamp de-jittering, signal denoising and filtering, and frame annotation creation. Our main contributions are a novel technique for eliminating jitter from health sensor and camera timestamps and a method to accurately time-align visual frame and electrocardiogram sensor data, which is also applicable to other sensor types.
Submitted 2 March, 2023;
originally announced March 2023.
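The abstract above does not spell out the de-jittering technique itself, so the following is only a generic baseline sketch, not the authors' method. It assumes a sensor that samples at a constant but unknown rate, fits a straight line through (sample index, timestamp) by least squares, and then pairs a sensor reading with the nearest frame. The function names `dejitter_linear` and `nearest_frame` are hypothetical.

```python
def dejitter_linear(timestamps):
    """Replace jittered timestamps with a least-squares straight-line fit.

    Assumption: the device samples at a constant (unknown) rate, so the
    true timestamp of sample i is t0 + i * period.
    """
    n = len(timestamps)
    if n < 2:
        return list(timestamps)
    mean_i = (n - 1) / 2.0
    mean_t = sum(timestamps) / n
    # Ordinary least squares: slope is the sampling period, intercept is t0.
    cov = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(timestamps))
    var = sum((i - mean_i) ** 2 for i in range(n))
    period = cov / var
    t0 = mean_t - period * mean_i
    return [t0 + period * i for i in range(n)]


def nearest_frame(frame_times, sensor_time):
    """Index of the frame whose timestamp is closest to a sensor reading."""
    return min(range(len(frame_times)),
               key=lambda i: abs(frame_times[i] - sensor_time))


if __name__ == "__main__":
    jittered = [0.0, 0.11, 0.19, 0.3]
    print(dejitter_linear(jittered))
    print(nearest_frame([0.0, 0.033, 0.066], 0.04))
```

A linear fit is the simplest choice and breaks down if the device clock drifts or drops samples; in that case a piecewise or robust fit would be a more reasonable starting point.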
-
Dynamic Conditional Imitation Learning for Autonomous Driving
Authors:
Hesham M. Eraqi,
Mohamed N. Moustafa,
Jens Honer
Abstract:
Conditional imitation learning (CIL) trains deep neural networks, in an end-to-end manner, to mimic human driving. This approach has demonstrated suitable vehicle control when following roads, avoiding obstacles, or taking specific turns at intersections to reach a destination. Unfortunately, performance dramatically decreases when deployed to unseen environments and is inconsistent against varying weather conditions. Most importantly, the current CIL fails to avoid static road blockages. In this work, we propose a solution to those deficiencies. First, we fuse the laser scanner with the regular camera streams, at the feature level, to overcome the generalization and consistency challenges. Second, we introduce a new efficient Occupancy Grid Mapping (OGM) method along with new algorithms for road blockage avoidance and global route planning. Consequently, our proposed method dynamically detects partial and full road blockages, and guides the controlled vehicle to another route to reach the destination. Following the original CIL work, we demonstrated the effectiveness of our proposal on the CARLA simulator urban driving benchmark. Our experiments showed that our model improved consistency against weather conditions by four times and autonomous driving success rate generalization by 52%. Furthermore, our global route planner improved the driving success rate by 37%. Our proposed road blockage avoidance algorithm improved the driving success rate by 27%. Finally, the average kilometers traveled before a collision with a static object increased by 1.5 times. The main source code can be reached at https://heshameraqi.github.io/dynamic_cil_autonomous_driving.
Submitted 16 November, 2022;
originally announced November 2022.
-
An Evaluation of RGB and LiDAR Fusion for Semantic Segmentation
Authors:
Amr S. Mohamed,
Ali Abdelkader,
Mohamed Anany,
Omar El-Behady,
Muhammad Faisal,
Asser Hangal,
Hesham M. Eraqi,
Mohamed N. Moustafa
Abstract:
LiDARs and cameras are the two main sensors planned for inclusion in many announced autonomous vehicle prototypes. Each of the two provides a unique form of data from a different perspective on the surrounding environment. In this paper, we explore and attempt to answer the question: is there an added benefit to fusing those two forms of data for the purpose of semantic segmentation within the context of autonomous driving? We also attempt to show at which level such fusion proves most useful. We evaluated our algorithms on the publicly available SemanticKITTI dataset. All fusion models show improvements over the base model, with mid-level fusion showing the highest improvement of 2.7% in terms of the mean Intersection over Union (mIoU) metric.
Submitted 17 August, 2021;
originally announced August 2021.
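For readers unfamiliar with the mIoU metric quoted in the abstract above, here is a minimal sketch of the standard definition: per-class intersection over union, averaged over classes. It works on flat lists of integer class labels and skips classes absent from both prediction and ground truth; that skipping rule and the function name `mean_iou` are assumptions for illustration, and benchmark tooling such as the SemanticKITTI evaluation may handle ignored classes differently.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred and target are equal-length sequences of integer class labels
    in the range [0, num_classes).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes that appear in neither sequence
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```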
-
Pervasive Hand Gesture Recognition for Smartphones using Non-audible Sound and Deep Learning
Authors:
Ahmed Ibrahim,
Ayman El-Refai,
Sara Ahmed,
Mariam Aboul-Ela,
Hesham M. Eraqi,
Mohamed Moustafa
Abstract:
Due to the mass advancement in ubiquitous technologies nowadays, new pervasive methods have come into practice to provide innovative features and stimulate research on new human-computer interactions. This paper presents a hand gesture recognition method that utilizes the smartphone's built-in speakers and microphones. The proposed system emits an ultrasonic sonar-based signal (inaudible sound) from the smartphone's stereo speakers, which is then received by the smartphone's microphone and processed via a Convolutional Neural Network (CNN) for Hand Gesture Recognition. Data augmentation techniques are proposed to improve the detection accuracy, and three dual-channel input fusion methods are compared. The first method merges the dual-channel audio as a single input spectrogram image. The second method adopts early fusion by concatenating the dual-channel spectrograms. The third method adopts late fusion by having two convolutional input branches process each of the dual-channel spectrograms, after which the outputs are merged by the last layers. Our experimental results demonstrate a promising detection accuracy for the six gestures presented in our publicly available dataset, with an accuracy of 93.58% as a baseline.
Submitted 4 August, 2021;
originally announced August 2021.
-
Locally correct confidence intervals for a binomial proportion: A new criteria for an interval estimator
Authors:
Paul H. Garthwaite,
Maha W. Moustafa,
Fadlalla G. Elfadaly
Abstract:
Well-recommended methods of forming `confidence intervals' for a binomial proportion give interval estimates that do not actually meet the definition of a confidence interval, in that their coverages are sometimes lower than the nominal confidence level. The methods are favoured because their intervals have a shorter average length than the Clopper-Pearson (gold-standard) method, whose intervals really are confidence intervals. Comparison of such methods is tricky -- the best method should perhaps be the one that gives the shortest intervals (on average), but when is the coverage of a method so poor that it should not be classed as a means of forming confidence intervals?
As the definition of a confidence interval is not being adhered to, another criterion for forming interval estimates for a binomial proportion is needed. In this paper we suggest a new criterion; methods which meet the criterion are said to yield $\textit{locally correct confidence intervals}$. We propose a method that yields such intervals and prove that its intervals have a shorter average length than those of any other method that meets the criterion. Compared with the Clopper-Pearson method, the proposed method gives intervals with an appreciably smaller average length. The mid-$p$ method also satisfies the new criterion and has its own optimality property.
Submitted 29 June, 2021;
originally announced June 2021.
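The abstract above uses the Clopper-Pearson interval as its gold-standard reference. As a point of orientation, the following sketch computes that classical interval by inverting the binomial tail probabilities with bisection. It illustrates only the Clopper-Pearson baseline, not the locally correct method the paper proposes; the helper names are hypothetical, and only the standard library is used.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, iters=100):
    """Root in [0, 1] of a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for a binomial proportion (k successes in n)."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


if __name__ == "__main__":
    print(clopper_pearson(3, 10))
```

Because the interval guarantees coverage of at least the nominal level for every value of the proportion, it tends to be wider than approximate alternatives, which is the trade-off the abstract discusses.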
-
Adversarial Unsupervised Domain Adaptation Guided with Deep Clustering for Face Presentation Attack Detection
Authors:
Yomna Safaa El-Din,
Mohamed N. Moustafa,
Hani Mahdi
Abstract:
Face Presentation Attack Detection (PAD) has drawn increasing attention as a means to secure the face recognition systems that are widely used in many applications. Conventional face anti-spoofing methods assume that testing data comes from the same domain used for training, and so cannot generalize well to unseen attack scenarios. The trained models tend to overfit to the acquisition sensors and attack types available in the training data. In light of this, we propose an end-to-end learning framework based on Domain Adaptation (DA) to improve PAD generalization capability. Labeled source-domain samples are used to train the feature extractor and classifier via cross-entropy loss, while unlabeled data from the target domain are utilized in an adversarial DA approach, causing the model to learn domain-invariant features. Using DA alone in face PAD fails to adapt well to a target domain that is acquired in different conditions, with different devices and attack types, than the source domain. Therefore, in order to keep the intrinsic properties of the target domain, deep clustering of target samples is performed. Training and deep clustering are performed end-to-end, and experiments performed on several public benchmark datasets validate that our proposed Deep Clustering guided Unsupervised Domain Adaptation (DCDA) learns more generalized information than state-of-the-art methods, as measured by classification error on the target domain.
Submitted 13 February, 2021;
originally announced February 2021.
-
Enhanced 3D Myocardial Strain Estimation from Multi-View 2D CMR Imaging
Authors:
Mohamed Abdelkhalek,
Heba Aguib,
Mohamed Moustafa,
Khalil Elkhodary
Abstract:
In this paper, we propose an enhanced 3D myocardial strain estimation procedure, which combines complementary displacement information from multiple orientations of a single imaging modality (untagged CMR SSFP images). To estimate myocardial strain across the left ventricle, we register the sets of short-axis, four-chamber and two-chamber views via a 2D non-rigid registration algorithm implemented in a commercial software package (Segment, Medviso). We then create a series of interpolating functions for the three orthogonal directions of motion and use them to deform a tetrahedral mesh representation of a patient-specific left ventricle. Additionally, we correct for overestimation of displacement by introducing a weighting scheme that is based on displacement along the long axis. The procedure was evaluated on the STACOM 2011 dataset containing CMR SSFP images for 16 healthy volunteers. We show increased accuracy in estimating the three strain components (radial, circumferential, longitudinal) compared to reported results in the challenge, for the imaging modality of interest (SSFP). Our peak strain estimates are also significantly closer to reported measurements from studies of a larger cohort in the literature and to our own ground truth measurements using the Segment Strain Analysis Module. Our proposed procedure provides a relatively fast and simple method to improve 2D tracking results, with the added flexibility of either deforming a reconstructed mesh model from other image modalities or using the built-in CMR mesh reconstruction procedure. Our proposed scheme presents a deforming patient-specific model of the left ventricle, using the most common imaging modality, routinely administered in clinical settings, without requiring additional or specialized imaging protocols.
Submitted 29 November, 2020; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Predicting Nonlinear Seismic Response of Structural Braces Using Machine Learning
Authors:
Elif Ecem Bas,
Denis Aslangil,
Mohamed A. Moustafa
Abstract:
Numerical modeling of structural materials that have highly nonlinear behaviors has always been a challenging problem in engineering disciplines. Experimental data is commonly used to characterize this behavior. This study aims to improve the modeling capabilities by using state-of-the-art Machine Learning (ML) techniques, and attempts to answer several scientific questions: (i) Which ML algorithm is capable of learning such a complex and nonlinear problem, and which is more efficient? (ii) Is it possible to artificially reproduce structural brace seismic behavior that can represent real physics? (iii) How can our findings be extended to different engineering problems that are driven by similar nonlinear dynamics? To answer these questions, the presented methods are validated using experimental brace data. The paper shows that after proper data preparation, the long short-term memory (LSTM) method is highly capable of capturing the nonlinear behavior of braces. Additionally, the effects of tuning the hyperparameters of the models, such as the number of layers, the number of neurons, and the activation functions, are presented. Finally, the ability to learn nonlinear dynamics using deep neural network algorithms and their advantages are briefly discussed.
Submitted 27 July, 2020;
originally announced July 2020.
-
Deep convolutional neural networks for face and iris presentation attack detection: Survey and case study
Authors:
Yomna Safaa El-Din,
Mohamed N. Moustafa,
Hani Mahdi
Abstract:
Biometric presentation attack detection is gaining increasing attention. Users of mobile devices find it more convenient to unlock their smart applications with finger, face or iris recognition instead of passwords. In this paper, we survey the approaches presented in the recent literature to detect face and iris presentation attacks. Specifically, we investigate the effectiveness of fine tuning very deep convolutional neural networks to the task of face and iris antispoofing. We compare two different fine tuning approaches on six publicly available benchmark datasets. Results show the effectiveness of these deep models in learning discriminative features that can tell apart real from fake biometric images with very low error rate. Cross-dataset evaluation on face PAD showed better generalization than state of the art. We also performed cross-dataset testing on iris PAD datasets in terms of equal error rate which was not reported in literature before. Additionally, we propose the use of a single deep network trained to detect both face and iris attacks. We have not noticed accuracy degradation compared to networks trained for only one biometric separately. Finally, we analyzed the learned features by the network, in correlation with the image frequency components, to justify its prediction decision.
Submitted 28 April, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Using Machine Learning Approach for Computational Substructure in Real-Time Hybrid Simulation
Authors:
Elif Ecem Bas,
Mohamed A. Moustafa,
David Feil-Seifer,
Janelle Blankenburg
Abstract:
Hybrid simulation (HS) is a widely used structural testing method that combines a computational substructure with a numerical model for well-understood components and an experimental substructure for other parts of the structure that are physically tested. One challenge for fast HS or real-time HS (RTHS) is associated with the analytical substructures of relatively complex structures, which could have a large number of degrees of freedom (DOFs), for instance. Computations with this many DOFs could be hard to perform in real-time, even with all current hardware capacities. In this study, a metamodeling technique is proposed to represent the structural dynamic behavior of the analytical substructure. A preliminary study is conducted where a one-bay one-story concentrically braced frame (CBF) is tested under earthquake loading by using a compact HS setup at the University of Nevada, Reno. The experimental setup allows for using a small-scale brace as the experimental substructure combined with a steel frame at the prototype full-scale for the analytical substructure. Two different machine learning algorithms are evaluated to provide a valid and useful metamodeling solution for the analytical substructure. The metamodels are trained with the available data that is obtained from the pure analytical solution of the prototype steel frame. The two algorithms used for developing the metamodels are: (1) a linear regression (LR) model, and (2) a basic recurrent neural network (RNN). The metamodels are first validated against the pure analytical response of the structure. Next, RTHS experiments are conducted using the metamodels. RTHS test results using both LR and RNN models are evaluated, and the advantages and disadvantages of these models are discussed.
Submitted 4 April, 2020;
originally announced April 2020.
-
Driver Distraction Identification with an Ensemble of Convolutional Neural Networks
Authors:
Hesham M. Eraqi,
Yehya Abouelnaga,
Mohamed H. Saad,
Mohamed N. Moustafa
Abstract:
The World Health Organization (WHO) reported 1.25 million deaths yearly due to road traffic accidents worldwide, and the number has been continuously increasing over the last few years. Nearly a fifth of these accidents are caused by distracted drivers. Existing work on distracted driver detection is concerned with a small set of distractions (mostly, cell phone usage). Unreliable ad-hoc methods are often used. In this paper, we present the first publicly available dataset for driver distraction identification with more distraction postures than existing alternatives. In addition, we propose a reliable deep learning-based solution that achieves 90% accuracy. The system consists of a genetically-weighted ensemble of convolutional neural networks. We show that a weighted ensemble of classifiers using a genetic algorithm yields better classification confidence. We also study the effect of different visual elements in distraction detection by means of face and hand localizations, and skin segmentation. Finally, we present a thinned version of our ensemble that could achieve 84.64% classification accuracy and operate in a real-time environment.
Submitted 22 January, 2019;
originally announced January 2019.
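The abstract above describes a genetically-weighted ensemble but gives no implementation details, so the sketch below is a generic illustration under stated assumptions rather than the authors' system. It assumes each model outputs class probabilities, fuses them by a weighted average, and searches for the weights with a small evolutionary loop (elitist selection, uniform crossover, Gaussian mutation) whose fitness is accuracy on held-out samples. All function names and hyperparameters are hypothetical.

```python
import random


def ensemble_predict(probs_per_model, weights):
    """Weighted average of per-model class probabilities; returns the argmax class."""
    total = sum(weights)
    num_classes = len(probs_per_model[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, probs_per_model)) / total
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: fused[c])


def accuracy(weights, samples, labels):
    """Fraction of samples the weighted ensemble classifies correctly."""
    hits = sum(ensemble_predict(s, weights) == y for s, y in zip(samples, labels))
    return hits / len(labels)


def evolve_weights(samples, labels, num_models,
                   generations=30, pop_size=20, seed=0):
    """Search for ensemble weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    # Seed the population with uniform weights so the result is never worse
    # than plain averaging on the fitness set.
    population = [[1.0] * num_models]
    while len(population) < pop_size:
        population.append([rng.random() + 1e-6 for _ in range(num_models)])

    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, samples, labels), reverse=True)
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation clipped to stay positive.
            child = [max(1e-6, rng.choice(pair) + rng.gauss(0, 0.1))
                     for pair in zip(a, b)]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: accuracy(w, samples, labels))


if __name__ == "__main__":
    # Two models, two classes: model 0 is mildly right, model 1 confidently wrong.
    samples = [[[0.6, 0.4], [0.1, 0.9]], [[0.3, 0.7], [0.8, 0.2]]]
    labels = [0, 1]
    best = evolve_weights(samples, labels, num_models=2)
    print(best, accuracy(best, samples, labels))
```

Fitness should be measured on data not used to train the individual models; otherwise the weights simply favor whichever model overfits the most.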
-
Benchmarking neural networks for quantum computation
Authors:
N. H. Nguyen,
E. C. Behrman,
M. A. Moustafa,
J. E. Steck
Abstract:
The power of quantum computers is still somewhat speculative. While they are certainly faster than classical ones at some tasks, the class of problems they can efficiently solve has not been mapped definitively onto known classical complexity theory. This means that we do not know for which calculations there will be a "quantum advantage," once an algorithm is found. One way to answer the question is to find those algorithms, but finding truly quantum algorithms turns out to be very difficult. In previous work over the past three decades we have pursued the idea of using techniques of machine learning to develop algorithms for quantum computing. Here we compare the performance of standard real- and complex-valued classical neural networks with that of one of our models for a quantum neural network, on both classical problems and on an archetypal quantum problem: the computation of an entanglement witness. The quantum network is shown to need far fewer epochs and a much smaller network to achieve comparable or better results.
Submitted 6 December, 2018; v1 submitted 9 July, 2018;
originally announced July 2018.
-
End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies
Authors:
Hesham M. Eraqi,
Mohamed N. Moustafa,
Jens Honer
Abstract:
Steering a car through traffic is a complex task that is difficult to cast into algorithms. Therefore, researchers turn to training artificial neural networks from front-facing camera data stream along with the associated steering angles. Nevertheless, most existing solutions consider only the visual camera frames as input, thus ignoring the temporal relationship between frames. In this work, we p…
▽ More
Steering a car through traffic is a complex task that is difficult to cast into algorithms. Therefore, researchers turn to training artificial neural networks from front-facing camera data stream along with the associated steering angles. Nevertheless, most existing solutions consider only the visual camera frames as input, thus ignoring the temporal relationship between frames. In this work, we propose a Convolutional Long Short-Term Memory Recurrent Neural Network (C-LSTM), that is end-to-end trainable, to learn both visual and dynamic temporal dependencies of driving. Additionally, We introduce posing the steering angle regression problem as classification while imposing a spatial relationship between the output layer neurons. Such method is based on learning a sinusoidal function that encodes steering angles. To train and validate our proposed methods, we used the publicly available Comma.ai dataset. Our solution improved steering root mean square error by 35% over recent methods, and led to a more stable steering by 87%.
Submitted 22 November, 2017; v1 submitted 10 October, 2017;
originally announced October 2017.
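The abstract above describes encoding steering angles as a sinusoidal function over the output layer neurons. As a rough illustration only, the sketch below shows one way such an encoding could work: the angle sets the phase of a sine wave sampled across the outputs, and decoding recovers that phase. The exact formula, the function names `encode_angle` and `decode_angle`, and the parameters `max_angle` and `n_outputs` are assumptions made for this example and are not taken from the paper.

```python
import math


def encode_angle(angle, max_angle, n_outputs):
    """Return a target vector: one sine period, phase-shifted by the angle.

    Assumption: the angle in [-max_angle, max_angle] is mapped linearly to a
    phase in [-pi/2, pi/2]. This mapping is illustrative, not the paper's.
    """
    phase = (math.pi / 2) * angle / max_angle
    return [math.sin(2 * math.pi * i / n_outputs - phase)
            for i in range(n_outputs)]


def decode_angle(outputs, max_angle):
    """Recover the angle by projecting the outputs onto sine and cosine.

    Requires at least 3 outputs so the two projections are well defined.
    """
    n = len(outputs)
    a = sum(y * math.sin(2 * math.pi * i / n) for i, y in enumerate(outputs))
    b = sum(y * math.cos(2 * math.pi * i / n) for i, y in enumerate(outputs))
    phase = math.atan2(-b, a)  # a ~ cos(phase), -b ~ sin(phase), same scale
    return phase * max_angle / (math.pi / 2)


# Hypothetical values: a 10 degree angle, a 45 degree limit, 16 output neurons.
target = encode_angle(10.0, 45.0, 16)
recovered = decode_angle(target, 45.0)
```

Because neighbouring outputs take similar values, a small error in the predicted vector shifts the decoded angle only slightly, which is one plausible reading of the "spatial relationship between the output layer neurons" mentioned in the abstract.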
-
Real-time Distracted Driver Posture Classification
Authors:
Yehya Abouelnaga,
Hesham M. Eraqi,
Mohamed N. Moustafa
Abstract:
In this paper, we present a new dataset for "distracted driver" posture estimation. In addition, we propose a novel system that achieves 95.98% driving posture classification accuracy. The system consists of a genetically-weighted ensemble of Convolutional Neural Networks (CNNs). We show that weighting an ensemble of classifiers using a genetic algorithm yields better classification confidence. We also study the effect of different visual elements (i.e. hands and face) in distraction detection and classification by means of face and hand localization. Finally, we present a thinned version of our ensemble that achieves 94.29% classification accuracy and operates in a real-time environment.
Submitted 29 November, 2018; v1 submitted 28 June, 2017;
originally announced June 2017.
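The abstract above refers to a weighted ensemble of classifiers. As a minimal sketch of the fusion step only, the code below combines per-classifier class probabilities with a normalized weighted sum. The weights here are hand-picked, hypothetical values; in the paper they are found by a genetic algorithm, which is not reproduced here. The function names `fuse_probabilities` and `predict_class` are made up for this example.

```python
def fuse_probabilities(prob_sets, weights):
    """Weighted average of class-probability vectors, one per classifier."""
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict_class(prob_sets, weights):
    """Index of the class with the highest fused probability."""
    fused = fuse_probabilities(prob_sets, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of three classifiers on a two-class problem.
probs = [[0.60, 0.40], [0.20, 0.80], [0.55, 0.45]]

equal_vote = predict_class(probs, [1.0, 1.0, 1.0])     # plain averaging
weighted_vote = predict_class(probs, [3.0, 0.5, 3.0])  # down-weights classifier 2
```

With equal weights the confident second classifier dominates, while the hypothetical non-uniform weights let the other two classifiers decide, which shows how the choice of weights can change the ensemble's prediction.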
-
A Hybrid Deep Learning Approach for Texture Analysis
Authors:
Hussein Adly,
Mohamed Moustafa
Abstract:
Texture classification is a problem that has various applications, such as remote sensing and forest species recognition. Solutions tend to be custom fit to the dataset used but fail to generalize. A Convolutional Neural Network (CNN) in combination with a Support Vector Machine (SVM) forms a robust pairing of a powerful invariant feature extractor and an accurate classifier. The fusion of experts provides stability in classification rates among different datasets.
Submitted 24 March, 2017;
originally announced March 2017.
-
CIFAR-10: KNN-based Ensemble of Classifiers
Authors:
Yehya Abouelnaga,
Ola S. Ali,
Hager Rady,
Mohamed Moustafa
Abstract:
In this paper, we study the performance of different classifiers on the CIFAR-10 dataset, and build an ensemble of classifiers to reach a better performance. We show that, on CIFAR-10, K-Nearest Neighbors (KNN) and Convolutional Neural Network (CNN) are mutually exclusive on some classes, and thus yield higher accuracy when combined. We reduce KNN overfitting using Principal Component Analysis (PCA), and ensemble it with a CNN to increase its accuracy. Our approach improves our best CNN model from 93.33% to 94.03%.
Submitted 15 November, 2016;
originally announced November 2016.
-
Reactive Collision Avoidance using Evolutionary Neural Networks
Authors:
Hesham Eraqi,
Youssef EmadEldin,
Mohamed Moustafa
Abstract:
Collision avoidance systems can play a vital role in reducing the number of accidents and saving human lives. In this paper, we introduce and validate a novel method for reactive collision avoidance in vehicles using evolutionary neural networks (ENN). A single front-facing rangefinder sensor is the only input required by our method. The training process and the analysis and validation of the proposed method are carried out using simulation. Extensive experiments are conducted to analyse the proposed method and evaluate its performance. Firstly, we test the ability to learn collision avoidance in a static free track. Secondly, we analyse the effect of the rangefinder sensor resolution on the learning process. Thirdly, we test the ability of a vehicle to individually and simultaneously learn collision avoidance. Finally, we test the generality of the proposed method. We used a more realistic and powerful simulation environment (CarMaker), a camera as an alternative input sensor, and lane keeping as an extra feature to learn. The results are encouraging; the proposed method successfully allows vehicles to learn collision avoidance in different scenarios that are unseen during training. It also generalizes well if any of the input sensor, the simulator, or the task to be learned is changed.
Submitted 27 September, 2016;
originally announced September 2016.
-
House price estimation from visual and textual features
Authors:
Eman Ahmed,
Mohamed Moustafa
Abstract:
Most existing automatic house price estimation systems rely only on textual data such as the house's neighborhood area and the number of rooms. The final price is estimated by a human agent who visits the house and assesses it visually. In this paper, we propose extracting visual features from house photographs and combining them with the house's textual information. The combined features are fed to a fully connected multilayer Neural Network (NN) that estimates the house price as its single output. To train and evaluate our network, we have collected the first houses dataset (to our knowledge) that combines both images and textual attributes. The dataset is composed of 535 sample houses from the state of California, USA. Our experiments showed that adding the visual features increased the R-value by a factor of 3 and decreased the Mean Square Error (MSE) by one order of magnitude compared with textual-only features. Additionally, when trained on the benchmark textual-only housing dataset, our proposed NN still outperformed the published results of existing models.
Submitted 27 September, 2016;
originally announced September 2016.
-
GAdaBoost: Accelerating Adaboost Feature Selection with Genetic Algorithms
Authors:
Mai Tolba,
Mohamed Moustafa
Abstract:
The boosted cascade of simple features, by Viola and Jones, is one of the most famous object detection frameworks. However, it suffers from a lengthy training process. This is due to the vast feature space and the exhaustive search nature of Adaboost. In this paper we propose GAdaboost: a Genetic Algorithm to accelerate the training procedure through natural feature selection. Specifically, we propose to limit the Adaboost search to a subset of the huge feature space, while evolving this subset following a Genetic Algorithm. Experiments demonstrate that our proposed GAdaboost is up to 3.7 times faster than Adaboost. We also demonstrate that the price of this speedup is a small decrease (3% and 4%) in detection accuracy when tested on the FDDB benchmark face detection set and Caltech Web Faces, respectively.
Submitted 20 September, 2016;
originally announced September 2016.
-
Learning quantum annealing
Authors:
E. C. Behrman,
J. E. Steck,
M. A. Moustafa
Abstract:
We propose and develop a new procedure, whereby a quantum system can learn to anneal to a desired ground state. We demonstrate successful learning to produce an entangled state for a two-qubit system, then demonstrate generalizability to larger systems. The amount of additional learning necessary decreases as the size of the system increases. Because current technologies limit measurement of the states of quantum annealing machines to determination of the average spin at each site, we then construct a "broken pathway" between the initial and desired states, at each step of which the average spins are nonzero, and show successful learning of that pathway. Using this technique we show we can direct annealing to multiqubit GHZ and W states, and verify that we have done so. Because quantum neural networks are robust to noise and decoherence we expect our method to be readily implemented experimentally; we show some preliminary results which support this.
Submitted 31 January, 2017; v1 submitted 5 March, 2016;
originally announced March 2016.
-
Applying deep learning to classify pornographic images and videos
Authors:
Mohamed Moustafa
Abstract:
It is no secret that pornographic material is now one click away from everyone, including children and minors. General social media networks are striving to isolate adult images and videos from normal ones. Intelligent image analysis methods can help to automatically detect and isolate questionable images in media. Unfortunately, these methods require vast experience to design the classifier, including one or more of the popular computer vision feature descriptors. We propose to build a classifier based on one of the recently flourishing deep learning techniques. Convolutional neural networks contain many layers for both automatic feature extraction and classification. The benefit is an easier system to build (no need for hand-crafting features and classifiers). Additionally, our experiments show that it is even more accurate than the state-of-the-art methods on the most recent benchmark dataset.
Submitted 28 November, 2015;
originally announced November 2015.