Search | arXiv e-print repository

Optimized Model Selection for Estimating Treatment Effects from Costly Simulations of the US Opioid Epidemic

Authors: Abdulrahman A. Ahmed, M. Amin Rahimian, Mark S. Roberts

Abstract: Agent-based simulation with a synthetic population can help us compare different treatment conditions while keeping everything else constant within the same population (i.e., as digital twins). Such population-scale simulations require large computational power (i.e., CPU resources) to get accurate estimates for treatment effects. We can use meta models of the simulation results to circumvent the need to simulate every treatment condition. Selecting the best estimating model at a given sample size (number of simulation runs) is a crucial problem. Depending on the sample size, the ability of the method to estimate accurately can change significantly. In this paper, we discuss different methods to explore what model works best at a specific sample size. In addition to the empirical results, we provide a mathematical analysis of the MSE equation and how its components decide which model to select and why a specific method behaves that way in a range of sample sizes. The analysis showed why the direction estimation method is better than model-based methods in larger sample sizes and how the between-group variation and the within-group variation affect the MSE equation.

Submitted 23 March, 2024; originally announced March 2024.

Comments: To be presented in 2024 Annual Simulation Conference (ANNSIM'24)

arXiv:2308.13040 [pdf, other]

Estimating Treatment Effects Using Costly Simulation Samples from a Population-Scale Model of Opioid Use Disorder

Authors: Abdulrahman A. Ahmed, M. Amin Rahimian, Mark S. Roberts

Abstract: Large-scale models require substantial computational resources for analysis and studying treatment conditions. Specifically, estimating treatment effects using simulations may require a lot of infeasible resources to allocate at every treatment condition. Therefore, it is essential to develop efficient methods to allocate computational resources for estimating treatment effects. Agent-based simulation allows us to generate highly realistic simulation samples. FRED (A Framework for Reconstructing Epidemiological Dynamics) is an agent-based modeling system with a geospatial perspective using a synthetic population constructed based on the U.S. census data. Given its synthetic population, FRED simulations present a baseline for comparable results from different treatment conditions. In this paper, we show three other methods for estimating treatment effects. In the first method, we resort to brute-force allocation, where all treatment conditions have an equal number of samples with a relatively large number of simulation runs. In the second method, we try to reduce the number of simulation runs by customizing individual samples required for each treatment effect based on the width of confidence intervals around the mean estimates. In the third method, we use a regression model, which allows us to learn across the treatment conditions such that simulation samples allocated for a treatment condition will help better estimate treatment effects in other conditions.
We show that the regression-based methods result in a comparable estimate of treatment effects with less computational resources. The reduced variability and faster convergence of model-based estimates come at the cost of increased bias, and the bias-variance trade-off can be controlled by adjusting the number of model parameters (e.g., including higher-order interaction terms in the regression model).

Submitted 24 August, 2023; originally announced August 2023.

Comments: To be presented in IEEE International Conference on Biomedical and Health Informatics 2023, repository link: https://github.com/abdulrahmanfci/intervention-estimation

arXiv:2307.12186 [pdf, other]

Inferring epidemic dynamics using Gaussian process emulation of agent-based simulations

Authors: Abdulrahman A. Ahmed, M. Amin Rahimian, Mark S. Roberts

Abstract: Computational models help decision makers understand epidemic dynamics to optimize public health interventions. Agent-based simulation of disease spread in synthetic populations allows us to compare and contrast different effects across identical populations or to investigate the effect of interventions keeping every other factor constant between "digital twins". FRED (A Framework for Reconstructing Epidemiological Dynamics) is an agent-based modeling system with a geo-spatial perspective using a synthetic population that is constructed based on the U.S. census data. In this paper, we show how Gaussian process regression can be used on FRED-synthesized data to infer the differing spatial dispersion of the epidemic dynamics for two disease conditions that start from the same initial conditions and spread among identical populations. Our results showcase the utility of agent-based simulation frameworks such as FRED for inferring differences between conditions where controlling for all confounding factors for such comparisons is next to impossible without synthetic data.

Submitted 11 September, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: To be presented in Winter Simulation Conference 2023, repository link: https://github.com/abdulrahmanfci/gpr-abm

arXiv:2206.07798 [pdf, other]

Gaussian Blue Noise

Authors: Abdalla G. M. Ahmed, Jing Ren, Peter Wonka

Abstract: Among the various approaches for producing point distributions with blue noise spectrum, we argue for an optimization framework using Gaussian kernels. We show that with a wise selection of optimization parameters, this approach attains unprecedented quality, provably surpassing the current state of the art attained by the optimal transport (BNOT) approach. Further, we show that our algorithm scales smoothly and feasibly to high dimensions while maintaining the same quality, realizing unprecedented high-quality high-dimensional blue noise sets. Finally, we show an extension to adaptive sampling.

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2104.07061 [pdf, other]

Exact and Approximate Hierarchical Clustering Using A*

Authors: Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum

Abstract: Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel \emph{trellis} data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.

Submitted 14 April, 2021; originally announced April 2021.

Comments: 30 pages, 9 figures

arXiv:2012.08489 [pdf, other]

Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization

Authors: Valerio Perrone, Huibin Shen, Aida Zolic, Iaroslav Shcherbatyi, Amr Ahmed, Tanya Bansal, Michele Donini, Fela Winkelmolen, Rodolphe Jenatton, Jean Baptiste Faddoul, Barbara Pogorzelska, Miroslav Miladinovic, Krishnaram Kenthapadi, Matthias Seeger, Cédric Archambeau

Abstract: Tuning complex machine learning systems is challenging. Machine learning typically requires to set hyperparameters, be it regularization, architecture, or optimization parameters, whose tuning is critical to achieve good predictive performance. To democratize access to machine learning systems, it is essential to automate the tuning. This paper presents Amazon SageMaker Automatic Model Tuning (AMT), a fully managed system for gradient-free optimization at scale. AMT finds the best version of a trained machine learning model by repeatedly evaluating it with different hyperparameter configurations. It leverages either random search or Bayesian optimization to choose the hyperparameter values resulting in the best model, as measured by the metric chosen by the user. AMT can be used with built-in algorithms, custom algorithms, and Amazon SageMaker pre-built containers for machine learning frameworks. We discuss the core functionality, system architecture, our design principles, and lessons learned. We also describe more advanced features of AMT, such as automated early stopping and warm-starting, showing in experiments their benefits to users.

Submitted 18 June, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

arXiv:2009.07717 [pdf, other]

doi 10.1007/978-3-030-68790-8_51

Relative Attribute Classification with Deep Rank SVM

Authors: Sara Atito Ali Ahmed, Berrin Yanikoglu

Abstract: Relative attributes indicate the strength of a particular attribute between image pairs. We introduce a deep Siamese network with rank SVM loss function, called Deep Rank SVM (DRSVM), in order to decide which one of a pair of images has a stronger presence of a specific attribute. The network is trained in an end-to-end fashion to jointly learn the visual features and the ranking function. We demonstrate the effectiveness of our approach against the state-of-the-art methods on four image benchmark datasets: LFW-10, PubFig, UTZap50K-lexi and UTZap50K-2 datasets. DRSVM surpasses state-of-art in terms of the average accuracy across attributes, on three of the four image benchmark datasets.

Submitted 9 September, 2020; originally announced September 2020.

arXiv:2009.06851 [pdf, other]

Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes

Authors: Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, Amr Ahmed

Abstract: High-quality dialogue-summary paired data is expensive to produce and domain-sensitive, making abstractive dialogue summarization a challenging task. In this work, we propose the first unsupervised abstractive dialogue summarization model for tete-a-tetes (SuTaT). Unlike standard text summarization, a dialogue summarization method should consider the multi-speaker scenario where the speakers have different roles, goals, and language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT aims to summarize for each speaker by modeling the customer utterances and the agent utterances separately while retaining their correlations. SuTaT consists of a conditional generative module and two unsupervised summarization modules. The conditional generative module contains two encoders and two decoders in a variational autoencoder framework where the dependencies between two latent spaces are captured. With the same encoders and decoders, two unsupervised summarization modules equipped with sentence-level self-attention mechanisms generate summaries without using any annotations. Experimental results show that SuTaT is superior on unsupervised dialogue summarization for both automatic and human evaluations, and is capable of dialogue classification and single-turn conversation generation.

Submitted 14 September, 2020; originally announced September 2020.

arXiv:2009.02961 [pdf, other]

doi 10.1109/ACCESS.2021.3088717

Deep Convolutional Neural Network Ensembles using ECOC

Authors: Sara Atito Ali Ahmed, Cemre Zor, Berrin Yanikoglu, Muhammad Awais, Josef Kittler

Abstract: Deep neural networks have enhanced the performance of decision making systems in many applications including image understanding, and further gains can be achieved by constructing ensembles. However, designing an ensemble of deep networks is often not very beneficial since the time needed to train the networks is very high or the performance gain obtained is not very significant. In this paper, we analyse error correcting output coding (ECOC) framework to be used as an ensemble technique for deep networks and propose different design strategies to address the accuracy-complexity trade-off. We carry out an extensive comparative study between the introduced ECOC designs and the state-of-the-art ensemble techniques such as ensemble averaging and gradient boosting decision trees. Furthermore, we propose a combinatory technique which is shown to achieve the highest classification performance amongst all.

Submitted 7 March, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: 13 pages double column IEEE transactions style

MSC Class: 68T07; ACM Class: I.5.2; I.2.0

arXiv:2009.01004 [pdf, other]

FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT

Authors: Omar Mossad, Amgad Ahmed, Anandharaju Raju, Hari Karthikeyan, Zayed Ahmed

Abstract: Machine based text comprehension has always been a significant research field in natural language processing. Once a full understanding of the text context and semantics is achieved, a deep learning model can be trained to solve a large subset of tasks, e.g. text summarization, classification and question answering. In this paper we focus on the question answering problem, specifically the multiple choice type of questions. We develop a model based on BERT, a state-of-the-art transformer network. Moreover, we alleviate the ability of BERT to support large text corpus by extracting the highest influence sentences through a semantic similarity model. Evaluations of our proposed model demonstrate that it outperforms the leading models in the MovieQA challenge and we are currently ranked first in the leader board with test accuracy of 87.79%. Finally, we discuss the model shortcomings and suggest possible improvements to overcome these limitations.

Submitted 22 August, 2020; originally announced September 2020.

Comments: source code available: https://github.com/omossad/fat-albert

arXiv:2007.14062 [pdf, other]

Big Bird: Transformers for Longer Sequences

Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

Abstract: Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.

Submitted 8 January, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

Journal ref: Neural Information Processing Systems (NeurIPS) 2020

arXiv:2006.08714 [pdf, other]

Latent Bandits Revisited

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

Abstract: A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. The primary goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning---complex models can be learned offline with the agent identifying latent state online---of practical relevance in, say, recommender systems. In this work, we propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. Our methods are contextual and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions. A comprehensive empirical study showcases the advantages of our approach.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 16 pages, 2 figures

arXiv:2006.08236 [pdf, other]

Non-Stationary Off-Policy Optimization

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Abstract: Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on observed context.

Submitted 4 April, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: AISTATS 2021; 16 pages, 2 figures

arXiv:2003.12159 [pdf, other]

Learning To Solve Differential Equations Across Initial Conditions

Authors: Shehryar Malik, Usman Anwar, Ali Ahmed, Alireza Aghasi

Abstract: Recently, there has been a lot of interest in using neural networks for solving partial differential equations. A number of neural network-based partial differential equation solvers have been formulated which provide performances equivalent, and in some cases even superior, to classical solvers. However, these neural solvers, in general, need to be retrained each time the initial conditions or the domain of the partial differential equation changes. In this work, we posit the problem of approximating the solution of a fixed partial differential equation for any arbitrary initial conditions as learning a conditional probability distribution. We demonstrate the utility of our method on Burgers' equation.

Submitted 19 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

arXiv:2003.08197 [pdf, other]

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

Abstract: Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing methods do not scale to large vocabulary sizes. In this paper, we design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix. We call our method Anchor & Transform (ANT) as the embeddings of discrete objects are a sparse linear combination of the anchors, weighted according to the transformation matrix. ANT is scalable, flexible, and end-to-end trainable. We further provide a statistical interpretation of our algorithm as a Bayesian nonparametric prior for embeddings that encourages sparsity and leverages natural groupings among objects. By deriving an approximate inference algorithm based on Small Variance Asymptotics, we obtain a natural extension that automatically learns the optimal number of anchors instead of having to tune it as a hyperparameter. On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes and demonstrates stronger performance with fewer parameters (up to 40x compression) as compared to existing compression baselines.

Submitted 11 March, 2021; v1 submitted 18 March, 2020; originally announced March 2020.

Comments: ICLR 2021, code can be found at http://github.com/pliang279/sparse_discrete

arXiv:2002.12578 [pdf, other]

Class-Specific Blind Deconvolutional Phase Retrieval Under a Generative Prior

Authors: Fahad Shamshad, Ali Ahmed

Abstract: In this paper, we consider the highly ill-posed problem of jointly recovering two real-valued signals from the phaseless measurements of their circular convolution. The problem arises in various imaging modalities such as Fourier ptychography, X-ray crystallography, and in visible light communication. We propose to solve this inverse problem using alternating gradient descent algorithm under two pretrained deep generative networks as priors; one is trained on sharp images and the other on blur kernels. The proposed recovery algorithm strives to find a sharp image and a blur kernel in the range of the respective pre-generators that \textit{best} explain the forward measurement model. In doing so, we are able to reconstruct quality image estimates. Moreover, the numerics show that the proposed approach performs well on the challenging measurement models that reflect the physically realizable imaging systems and is also robust to noise.

Submitted 28 February, 2020; originally announced February 2020.

Comments: 10 pages

arXiv:2001.08155 [pdf, other]

An Intelligent and Time-Efficient DDoS Identification Framework for Real-Time Enterprise Networks SAD-F: Spark Based Anomaly Detection Framework

Authors: Awais Ahmed, Sufian Hameed, Muhammad Rafi, Qublai Khan Ali Mirza

Abstract: Anomaly detection is a crucial step for preventing malicious activities in the network and keeping resources available all the time for legitimate users. It is noticed from various studies that classical anomaly detectors work well with small and sampled data, but the chances of failures increase with real-time (non-sampled data) traffic data. In this paper, we will be exploring security analytic techniques for DDoS anomaly detection using different machine learning techniques. In this paper, we are proposing a novel approach which deals with real traffic as input to the system. Further, we study and compare the performance factor of our proposed framework on three different testbeds including normal commodity hardware, low-end system, and high-end system. Hardware details of testbeds are discussed in the respective section. Further in this paper, we investigate the performance of the classifiers in (near) real-time detection of anomalies attacks. This study also focused on the feature selection process that is as important for the anomaly detection process as it is for general modeling problems. Several techniques have been studied for feature selection and it is observed that proper feature selection can increase performance in terms of model's execution time - which totally depends upon the traffic file or traffic capturing process.

Submitted 14 February, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

arXiv:1909.10340 [pdf, other]

AHA! an 'Artificial Hippocampal Algorithm' for Episodic Machine Learning

Authors: Gideon Kowadlo, Abdelrahman Ahmed, David Rawlinson

Abstract: The majority of ML research concerns slow, statistical learning of i.i.d. samples from large, labelled datasets. Animals do not learn this way. An enviable characteristic of animal learning is `episodic' learning - the ability to memorise a specific experience as a composition of existing concepts, after just one experience, without provided labels. The new knowledge can then be used to distinguish between similar experiences, to generalise between classes, and to selectively consolidate to long-term memory. The Hippocampus is known to be vital to these abilities. AHA is a biologically-plausible computational model of the Hippocampus. Unlike most machine learning models, AHA is trained without external labels and uses only local credit assignment. We demonstrate AHA in a superset of the Omniglot one-shot classification benchmark. The extended benchmark covers a wider range of known hippocampal functions by testing pattern separation, completion, and recall of original input. These functions are all performed within a single configuration of the computational model. Despite these constraints, image classification results are comparable to conventional deep convolutional ANNs.

Submitted 25 March, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

ACM Class: I.2.6; I.5.0; I.5.1

arXiv:1905.13652 [pdf]

L0 Regularization Based Neural Network Design and Compression

Authors: S. Asim Ahmed

Abstract: We consider the complexity of Deep Neural Networks (DNNs) and their associated massive over-parameterization. Such over-parameterization may entail susceptibility to adversarial attacks, loss of interpretability and adverse Size, Weight and Power - Cost (SWaP-C) considerations. We ask if there are methodical ways (regularization) to reduce complexity and how we can interpret the trade-off between the desired metric and the complexity of the DNN. Reducing complexity is directly applicable to scaling of AI applications to real world problems (especially for off-the-cloud applications). We show the presence and evaluation of the knee of the tradeoff curve. We apply a form of L0 regularization to MNIST data and signal modulation classifications. We show that such regularization captures saliency in the input space as well.

Submitted 31 May, 2019; originally announced May 2019.

Comments: 4 pages 11 figures

arXiv:1905.11589 [pdf, other]

Learning distant cause and effect using only local and immediate credit assignment

Authors: David Rawlinson, Abdelrahman Ahmed, Gideon Kowadlo

Abstract: We present a recurrent neural network memory that uses sparse coding to create a combinatoric encoding of sequential inputs. Using several examples, we show that the network can associate distant causes and effects in a discrete stochastic process, predict partially-observable higher-order sequences, and enable a DQN agent to navigate a maze by giving it memory. The network uses only biologically-plausible, local and immediate credit assignment. Memory requirements are typically one order of magnitude less than existing LSTM, GRU and autoregressive feed-forward sequence learning models. The most significant limitation of the memory is generalization to unseen input sequences. We explore this limitation by measuring next-word prediction perplexity on the Penn Treebank dataset.

Submitted 18 August, 2021; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)

arXiv:1812.11065 [pdf, other]

Deep Ptych: Subsampled Fourier Ptychography using Generative Priors

Authors: Fahad Shamshad, Farwa Abbas, Ali Ahmed

Abstract: This paper proposes a novel framework to regularize the highly ill-posed and non-linear Fourier ptychography problem using generative models. We demonstrate experimentally that our proposed algorithm, Deep Ptych, outperforms the existing Fourier ptychography techniques, in terms of quality of reconstruction and robustness against noise, using far fewer samples. We further modify the proposed approach to allow the generative model to explore solutions outside the range, leading to improved performance.

Submitted 22 December, 2018; originally announced December 2018.

arXiv:1812.07159 [pdf, other]

Autoencoder Based Architecture For Fast & Real Time Audio Style Transfer

Authors: Dhruv Ramani, Samarjit Karmakar, Anirban Panda, Asad Ahmed, Pratham Tangri

Abstract: Recently, there has been great interest in the field of audio style transfer, where a stylized audio is generated by imposing the style of a reference audio on the content of a target audio. We improve on the current approaches which use neural networks to extract the content and the style of the audio signal and propose a new autoencoder based architecture for the task. This network generates a stylized audio for a content audio in a single forward pass. The proposed network architecture proves to be advantageous over the quality of audio produced and the time taken to train the network. The network is experimented on speech signals to confirm the validity of our proposal.

Submitted 26 December, 2018; v1 submitted 17 December, 2018; originally announced December 2018.

arXiv:1811.12488 [pdf, other]

Leveraging Deep Stein's Unbiased Risk Estimator for Unsupervised X-ray Denoising

Authors: Fahad Shamshad, Muhammad Awais, Muhammad Asim, Zain ul Aabidin Lodhi, Muhammad Umair, Ali Ahmed

Abstract: Among the plethora of techniques devised to curb the prevalence of noise in medical images, deep learning based approaches have shown the most promise. However, one critical limitation of these deep learning based denoisers is the requirement of high-quality noiseless ground truth images that are difficult to obtain in many medical imaging applications such as X-rays. To circumvent this issue, we leverage the recently proposed approach of [7] that incorporates Stein's Unbiased Risk Estimator (SURE) to train a deep convolutional neural network without requiring denoised ground truth X-ray data. Our experimental results demonstrate the effectiveness of the SURE based approach for denoising X-ray images.

Submitted 29 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/223

arXiv:1811.06103 [pdf]

Deep Neural Networks based Modrec: Some Results with Inter-Symbol Interference and Adversarial Examples

Authors: S. Asim Ahmed, Subhashish Chakravarty, Michael Newhouse

Abstract: Recent successes and advances in Deep Neural Networks (DNN) in machine vision and Natural Language Processing (NLP) have motivated their use in traditional signal processing and communications systems. In this paper, we present results of such applications to the problem of automatic modulation recognition. Variations in wireless communication channels are represented by statistical channel models and their parameterization will increase with the advent of 5G. In this paper, we report the effect of a simple two-path channel model on our naive deep neural network based implementation. We also report the impact of adversarial perturbation to the input signal.

Submitted 14 November, 2018; originally announced November 2018.

Comments: 4 pages, 13 figures

arXiv:1808.05854 [pdf, other]

Robust Compressive Phase Retrieval via Deep Generative Priors

Authors: Fahad Shamshad, Ali Ahmed

Abstract: This paper proposes a new framework to regularize the highly ill-posed and non-linear phase retrieval problem through deep generative priors using a simple gradient descent algorithm. We experimentally show the effectiveness of the proposed algorithm for random Gaussian measurements (practically relevant in imaging through scattering media) and Fourier friendly measurements (relevant in optical setups). We demonstrate that the proposed approach achieves impressive results when compared with traditional hand engineered priors including sparsity and denoising frameworks for number of measurements and robustness against noise. Finally, we show the effectiveness of the proposed approach on a real transmission matrix dataset in an actual application of multiple scattering media imaging.

Submitted 17 August, 2018; originally announced August 2018.

Comments: Preprint. Work in progress

arXiv:1711.11179 [pdf, other]

State Space LSTM Models with Particle MCMC Inference

Authors: Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola

Abstract: Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite the strong performance, however, it lacks the nice interpretability as in state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work \cite{zaheer2017latent} of combining topic models with LSTM. However, unlike \cite{zaheer2017latent}, we do not make any factorization assumptions in our inference algorithm. We present an efficient sampler based on sequential Monte Carlo (SMC) method that draws from the joint posterior directly. Experimental results confirm the superiority and stability of this SMC inference algorithm on a variety of domains.

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1512.01845 [pdf, other]

Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering

Authors: Chao-Yuan Wu, Alex Beutel, Amr Ahmed, Alexander J. Smola

Abstract: Understanding a user's motivations provides valuable information beyond the ability to recommend items. Quite often this can be accomplished by perusing both ratings and review texts, since it is the latter where the reasoning for specific preferences is explicitly expressed. Unfortunately matrix factorization approaches to recommendation result in large, complex models that are difficult to interpret and give recommendations that are hard to clearly explain to users. In contrast, in this paper, we attack this problem through succinct additive co-clustering. We devise a novel Bayesian technique for summing co-clusterings of Poisson distributions. With this novel technique we propose a new Bayesian model for joint collaborative filtering of ratings and text reviews through a sum of simple co-clusterings. The simple structure of our model yields easily interpretable recommendations. Even with a simple, succinct structure, our model outperforms competitors in terms of predicting ratings with reviews.

Submitted 6 December, 2015; originally announced December 2015.

arXiv:1501.00199 [pdf, other]

ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly

Authors: Alex Beutel, Amr Ahmed, Alexander J. Smola

Abstract: Matrix completion and approximation are popular tools to capture a user's preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model learned, significantly beats all factorization approaches to matrix approximation. Even more surprisingly, we find that summing over small co-clusterings is more effective in modeling matrices than classic co-clustering, which uses just one large partitioning of the matrix. Following Occam's razor principle suggests that the simple structure induced by our model better captures the latent preferences and decision making processes present in the real world than classic co-clustering or matrix factorization. We provide an iterative minimization algorithm, a collapsed Gibbs sampler, theoretical guarantees for matrix approximation, and excellent empirical evidence for the efficacy of our approach. We achieve state-of-the-art results on the Netflix problem with a fraction of the model complexity.

Submitted 31 December, 2014; originally announced January 2015.

Comments: 22 pages, under review for conference publication

ACM Class: H.2.8; H.3.3; I.2.6

arXiv:1309.3268 [pdf, ps, other]

Transmuted Generalized Inverse Weibull Distribution

Authors: Faton Merovci, Ibrahim Elbatal, Alaa Ahmed

Abstract: A generalization of the generalized inverse Weibull distribution, the so-called transmuted generalized inverse Weibull distribution, is proposed and studied. We will use the quadratic rank transmutation map (QRTM) in order to generate a flexible family of probability distributions taking the generalized inverse Weibull distribution as the base value distribution by introducing a new parameter that would offer more distributional flexibility. Various structural properties including explicit expressions for the moments, quantiles, and moment generating function of the new distribution are derived. We propose the method of maximum likelihood for estimating the model parameters and obtain the observed information matrix. A real data set is used to compare the flexibility of the transmuted version versus the generalized inverse Weibull distribution.

Submitted 11 September, 2013; originally announced September 2013.

arXiv:1308.5146 [pdf, other]

Compressive Multiplexing of Correlated Signals

Authors: Ali Ahmed, Justin Romberg

Abstract: We present a general architecture for the acquisition of ensembles of correlated signals. The signals are multiplexed onto a single line by mixing each one against a different code and then adding them together, and the resulting signal is sampled at a high rate. We show that if the $M$ signals, each bandlimited to $W/2$ Hz, can be approximated by a superposition of $R < M$ underlying signals, then the ensemble can be recovered by sampling at a rate within a logarithmic factor of $RW$ (as compared to the Nyquist rate of $MW$). This sampling theorem shows that the correlation structure of the signal ensemble can be exploited in the acquisition process even though it is unknown a priori. The reconstruction of the ensemble is recast as a low-rank matrix recovery problem from linear measurements. The architectures we are considering impose a certain type of structure on the linear operators. Although our results depend on the mixing forms being random, this imposed structure results in a very different type of random projection than those analyzed in the low-rank recovery literature to date.

Submitted 12 June, 2018; v1 submitted 23 August, 2013; originally announced August 2013.

Comments: 38 pages, 11 figures

arXiv:1203.3463 [pdf]

Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream

Authors: Amr Ahmed, Eric P. Xing

Abstract: Topic models have proven to be a useful tool for discovering latent structures in document collections. However, most document collections often come as temporal streams and thus several aspects of the latent structure such as the number of topics, the topics' distribution and popularity are time-evolving. Several models exist that model the evolution of some but not all of the above aspects. In this paper we introduce infinite dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between the documents is maintained across epochs. iDTM allows for unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to a Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluated the efficacy of our model on both simulated and real datasets with favorable outcome.

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-20-29

arXiv:0912.5507 [pdf, other]

MedLDA: A General Framework of Maximum Margin Supervised Topic Models

Authors: Jun Zhu, Amr Ahmed, Eric P. Xing

Abstract: Supervised topic models utilize document's side information for discovering predictive low dimensional representations of documents. Existing models apply the likelihood-based estimation. In this paper, we present a general framework of max-margin supervised topic models for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction tasks. The general principle of MedLDA can be applied to perform joint max-margin learning and maximum likelihood estimation for arbitrary topic models, directed or undirected, and supervised or unsupervised, when the supervised side information is available. We develop efficient variational methods for posterior inference and parameter estimation, and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets.

Submitted 30 December, 2009; originally announced December 2009.

Comments: 27 Pages

Journal ref: Journal of Machine Learning Research, 13(Aug): 2237--2278, 2012

arXiv:0901.0138 [pdf, other]

Time-Varying Networks: Recovering Temporally Rewiring Genetic Networks During the Life Cycle of Drosophila melanogaster

Authors: Amr Ahmed, Le Song, Eric P. Xing

Abstract: Due to the dynamic nature of biological systems, biological networks underlying temporal processes such as the development of Drosophila melanogaster can exhibit significant topological changes to facilitate dynamic regulatory functions. Thus it is essential to develop methodologies that capture the temporal evolution of networks, which make it possible to study the driving forces underlying dynamic rewiring of gene regulation circuitry, and to predict future network structures. Using a new machine learning method called Tesla, which builds on a novel temporal logistic regression technique, we report the first successful genome-wide reverse-engineering of the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of Drosophila melanogaster, given longitudinal gene expression measurements and even when only a single snapshot of such measurements resulting from each (time-specific) network is available. Our methods offer the first glimpse of time-specific snapshots and temporal evolution patterns of gene networks in a living organism during its full developmental course. The recovered networks with this unprecedented resolution chart the onset and duration of many gene interactions which are missed by typical static network analysis, and are suggestive of a wide array of other temporal behaviors of the gene network over time not noticed before.

Submitted 6 January, 2009; v1 submitted 31 December, 2008; originally announced January 2009.

Comments: Correcting some figure formatting errors

Report number: Amr Ahmed, Le Song, Eric Xing (2008). Time-Varying Networks: Reconstructing Temporally Rewiring Genetic Interactions During the Life Cycle of Drosophila melanogaster. CMU-MLD Technical Report CMU-ML-08-118

arXiv:0812.5087 [pdf, ps, other]

doi 10.1214/09-AOAS308

Estimating time-varying networks

Authors: Mladen Kolar, Le Song, Amr Ahmed, Eric P. Xing

Abstract: Stochastic networks are a plausible representation of the relational information among entities in dynamic systems such as living cells or social communities. While there is a rich literature in estimating a static or temporally invariant network from observation data, little has been done toward estimating time-varying networks from time series of entity attributes. In this paper we present two new machine learning methods for estimating time-varying networks, which both build on a temporally smoothed $l_1$-regularized logistic regression formalism that can be cast as a standard convex-optimization problem and solved efficiently using generic solvers scalable to large networks. We report promising results on recovering simulated time-varying networks. For real data sets, we reverse engineer the latent sequence of temporally rewiring political networks between Senators from the US Senate voting records and the latent evolving regulatory networks underlying 588 genes across the life cycle of Drosophila melanogaster from the microarray time course.

Submitted 20 October, 2010; v1 submitted 30 December, 2008; originally announced December 2008.

Comments: Published at http://dx.doi.org/10.1214/09-AOAS308 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS308

Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 1, 94-123

Showing 1–34 of 34 results for author: Ahmed, A