-
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
Authors:
Dengwang Tang,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
In this paper, we introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. This is a pure exploration problem in a stochastic finite armed bandit model. Each arm is associated with a reward and multiple types of costs from unknown distributions. Unlike the unconstrained best arm identification problem, the optimal solution for the CBMAI problem may be a randomiz…
▽ More
In this paper, we introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. This is a pure exploration problem in a stochastic finite armed bandit model. Each arm is associated with a reward and multiple types of costs from unknown distributions. Unlike the unconstrained best arm identification problem, the optimal solution for the CBMAI problem may be a randomized mixture of multiple arms. The goal thus is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$. We propose a novel, parameter-free algorithm, called the Score Function-based Successive Reject (SFSR) algorithm, that combines the classical successive reject framework with a novel score-function-based rejection criteria based on linear programming theory to identify the optimal support. We provide a theoretical upper bound on the mis-identification (of the the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$ and some constants that characterize the hardness of the problem instance. We also develop an information theoretic lower bound on the error probability that shows that these constants appropriately characterize the problem difficulty. We validate this empirically on a number of average and hard instances.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach
Authors:
Dengwang Tang,
Rahul Jain,
Botao Hao,
Zheng Wen
Abstract:
In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with. We assume that the offline dataset is generated by an expert but with unknown level of competence, i.e., it is not perfect and not necessarily using the optimal policy. We show that if the learning agent models the behavioral policy (paramet…
▽ More
In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with. We assume that the offline dataset is generated by an expert but with unknown level of competence, i.e., it is not perfect and not necessarily using the optimal policy. We show that if the learning agent models the behavioral policy (parameterized by a competence parameter) used by the expert, it can do substantially better in terms of minimizing cumulative regret, than if it doesn't do that. We establish an upper bound on regret of the exact informed PSRL algorithm that scales as $\tilde{O}(\sqrt{T})$. This requires a novel prior-dependent regret analysis of Bayesian online learning algorithms for the infinite horizon setting. We then propose the Informed RLSVI algorithm to efficiently approximate the iPSRL algorithm.
△ Less
Submitted 1 February, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Posterior Sampling-based Online Learning for Episodic POMDPs
Authors:
Dengwang Tang,
Dongze Ye,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
Learning in POMDPs is known to be significantly harder than MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms…
▽ More
Learning in POMDPs is known to be significantly harder than MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes, matching the lower bound, and is polynomial in the other parameters. In a general setting, its regret scales exponentially in the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.
△ Less
Submitted 23 May, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Leveraging Demonstrations to Improve Online Learning: Quality Matters
Authors:
Botao Hao,
Rahul Jain,
Tor Lattimore,
Benjamin Van Roy,
Zheng Wen
Abstract:
We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning…
▽ More
We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrap** and show substantial empirical regret reduction through experiments.
△ Less
Submitted 17 May, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Invisible Walls: Exploration of Microclimate Effects on Building Energy Consumption in New York City
Authors:
Thomas Dougherty,
Rishee Jain
Abstract:
The reduction of greenhouse gases from buildings forms the cornerstone of policy to mitigate the effects of climate change. However, the automation of urban scale building energy modeling systems required to meet global urban demand has proven challenging due to the bespoke characteristics of each city. One such point of uniqueness between cities is that of urban microclimate, which may play a maj…
▽ More
The reduction of greenhouse gases from buildings forms the cornerstone of policy to mitigate the effects of climate change. However, the automation of urban scale building energy modeling systems required to meet global urban demand has proven challenging due to the bespoke characteristics of each city. One such point of uniqueness between cities is that of urban microclimate, which may play a major role in altering the performance of energy efficiency in buildings. This research proposes a way to rapidly collect urban microclimate data through the utilization of satellite readings and climate reanalysis. We then demonstrate the potential utility of this data by composing an analysis against three years of monthly building energy consumption data from New York City. As a whole, microclimate in New York City may be responsible for large swings in urban energy consumption. We estimate that Central Park may reduce the electricity consumption of adjacent buildings by 5-10%, while vegetation overall seems to have no appreciable impact on gas consumption. We find that favorable urban microclimates may decrease the gas consumption of some buildings in New York by 71% while others may increase gas consumption by as much as 221%. Additionally, microclimates may be responsible for the decrease of electricity consumption by 28.6% in regions or increases of 77% consumption in others. This work provides a method of curating global, high resolution microclimate data, allowing researchers to explore the invisible walls of urban microclimate which interact with the buildings around them.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
NeuralArTS: Structuring Neural Architecture Search with Type Theory
Authors:
Robert Wu,
Nayan Saxena,
Rohan Jain
Abstract:
Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Develo** these search spaces is usually a manual affair with pre-optimized search spaces being more efficient, rather than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralA…
▽ More
Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Develo** these search spaces is usually a manual affair with pre-optimized search spaces being more efficient, rather than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.
△ Less
Submitted 5 November, 2021; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Poisoning the Search Space in Neural Architecture Search
Authors:
Robert Wu,
Nayan Saxena,
Rohan Jain
Abstract:
Deep learning has proven to be a highly effective problem-solving tool for object detection and image segmentation across various domains such as healthcare and autonomous driving. At the heart of this performance lies neural architecture design which relies heavily on domain knowledge and prior experience on the researchers' behalf. More recently, this process of finding the most optimal architec…
▽ More
Deep learning has proven to be a highly effective problem-solving tool for object detection and image segmentation across various domains such as healthcare and autonomous driving. At the heart of this performance lies neural architecture design which relies heavily on domain knowledge and prior experience on the researchers' behalf. More recently, this process of finding the most optimal architectures, given an initial search space of possible operations, was automated by Neural Architecture Search (NAS). In this paper, we evaluate the robustness of one such algorithm known as Efficient NAS (ENAS) against data agnostic poisoning attacks on the original search space with carefully designed ineffective operations. By evaluating algorithm performance on the CIFAR-10 dataset, we empirically demonstrate how our novel search space poisoning (SSP) approach and multiple-instance poisoning attacks exploit design flaws in the ENAS controller to result in inflated prediction error rates for child networks. Our results provide insights into the challenges to surmount in using NAS for more adversarially robust architecture search.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
Authors:
Liyu Chen,
Mehdi Jafarnia-Jahromi,
Rahul Jain,
Haipeng Luo
Abstract:
We introduce a generic template for develo** regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key of our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis without explicit impl…
▽ More
We introduce a generic template for develo** regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key of our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis without explicit implementation. Using this template, we develop two new algorithms: the first one is model-free (the first in the literature to our knowledge) and minimax optimal under strictly positive costs; the second one is model-based and minimax optimal even with zero-cost state-action pairs, matching the best existing result from [Tarbouriech et al., 2021b]. Importantly, both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms. Moreover, both can be made completely parameter-free.
△ Less
Submitted 9 November, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Authors:
Krishna C. Kalagarla,
Rahul Jain,
Pierluigi Nuzzo
Abstract:
Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDP for repeated optimistic plan…
▽ More
Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an $ε$-optimal policy, i.e., with resulting objective value within $ε$ of the optimal value and satisfying the constraints within $ε$-tolerance, with probability at least $1-δ$. The number of episodes needed is shown to be of the order $\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{ε^{2}}\log\frac{1}δ\big)$, where $C$ is the upper bound on the number of possible successor states for a state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed have a linear dependence on the state and action space sizes $|S|$ and $|A|$, respectively, and quadratic dependence on the time horizon $H$.
△ Less
Submitted 23 September, 2020;
originally announced September 2020.
-
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Authors:
Chen-Yu Wei,
Mehdi Jafarnia-Jahromi,
Haipeng Luo,
Rahul Jain
Abstract:
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with…
▽ More
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal $\widetilde{O}(\sqrt{T})$ regret and another computationally efficient variant with $\widetilde{O}(T^{3/4})$ regret, where $T$ is the number of interactions. Next, taking inspiration from adversarial linear bandits, we develop yet another efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret under a different set of assumptions, improving the best existing result by Hao et al. (2020) with $\widetilde{O}(T^{2/3})$ regret. Moreover, we draw a connection between this algorithm and the Natural Policy Gradient algorithm proposed by Kakade (2002), and show that our analysis improves the sample complexity bound recently given by Agarwal et al. (2020).
△ Less
Submitted 26 April, 2021; v1 submitted 23 July, 2020;
originally announced July 2020.
-
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Authors:
Mehdi Jafarnia-Jahromi,
Chen-Yu Wei,
Rahul Jain,
Haipeng Luo
Abstract:
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation. In this paper, we propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs) that achieves regret bound of $O(\sqrt{T})$ fo…
▽ More
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation. In this paper, we propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs) that achieves regret bound of $O(\sqrt{T})$ for the general class of weakly communicating MDPs, where $T$ is the number of interactions. EE-QL assumes that an online concentrating approximation of the optimal average reward is available. This is the first model-free learning algorithm that achieves $O(\sqrt T)$ regret without the ergodic assumption, and matches the lower bound in terms of $T$ except for logarithmic factors. Experiments show that the proposed algorithm performs as well as the best known model-based algorithms.
△ Less
Submitted 8 December, 2020; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Randomized Policy Learning for Continuous State and Action MDPs
Authors:
Hiteshi Sharma,
Rahul Jain
Abstract:
Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to success has been the use of deep neural networks used to approximate the policy and value function. Yet, substantial tuning of weights is required for good results. We instead use randomized function approximation. Such…
▽ More
Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to success has been the use of deep neural networks used to approximate the policy and value function. Yet, substantial tuning of weights is required for good results. We instead use randomized function approximation. Such networks are not only cheaper than training fully connected networks but also improve the numerical performance. We present \texttt{RANDPOL}, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. Both the policy and value functions are represented with randomized networks. We also give finite time guarantees on the performance of the algorithm. Then we show the numerical performance on challenging environments and compare them with deep neural network based algorithms.
△ Less
Submitted 15 November, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Authors:
Chen-Yu Wei,
Mehdi Jafarnia-Jahromi,
Haipeng Luo,
Hiteshi Sharma,
Rahul Jain
Abstract:
Model-free reinforcement learning is known to be memory and computation efficient and more amendable to large scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under…
▽ More
Model-free reinforcement learning is known to be memory and computation efficient and more amendable to large scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019a) for ergodic MDPs in the infinite-horizon average-reward setting.
△ Less
Submitted 25 February, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
On Model Stability as a Function of Random Seed
Authors:
Pranava Madhyastha,
Rishabh Jain
Abstract:
In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate model based (LIME) interpretations. Our analysis suggests that…
▽ More
In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate model based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA)and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA) which improves the stability of models over random seeds. With our ASWA and NASWA based optimization, we are able to improve the robustness of the original model, on average reducing the standard deviation of the model's performance by 72%.
△ Less
Submitted 23 September, 2019;
originally announced September 2019.
-
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Authors:
Yaman Kumar,
Rohit Jain,
Khwaja Mohd. Salik,
Rajiv Ratn Shah,
Yifang yin,
Roger Zimmermann
Abstract:
Lipreading has a lot of potential applications such as in the domain of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task like its dependence on a particular lan…
▽ More
Lipreading has a lot of potential applications such as in the domain of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task like its dependence on a particular language and vocabulary map**. Thus, in this paper we propose a multi-view lipreading to audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of audio produced. We also perform a user study for the audios produced in order to understand the level of comprehensibility of audios produced using Lipper.
△ Less
Submitted 28 June, 2019;
originally announced July 2019.
-
Scrubbing Sensitive PHI Data from Medical Records made Easy by SpaCy -- A Scalable Model Implementation Comparisons
Authors:
Rashmi Jain,
Dinah Samuel Anand,
Vijayalakshmi Janakiraman
Abstract:
De-identification of clinical records is an extremely important process which enables the use of the wealth of information present in them. There are a lot of techniques available for this but none of the method implementation has evaluated the scalability, which is an important benchmark. We evaluated numerous deep learning techniques such as BiLSTM-CNN, IDCNN, CRF, BiLSTM-CRF, SpaCy, etc. on bot…
▽ More
De-identification of clinical records is an extremely important process which enables the use of the wealth of information present in them. There are a lot of techniques available for this but none of the method implementation has evaluated the scalability, which is an important benchmark. We evaluated numerous deep learning techniques such as BiLSTM-CNN, IDCNN, CRF, BiLSTM-CRF, SpaCy, etc. on both the performance and efficiency. We propose that the SpaCy model implementation for scrubbing sensitive PHI data from medical records is both well performing and extremely efficient compared to other published models.
△ Less
Submitted 17 June, 2019;
originally announced June 2019.
-
Improving performance and inference on audio classification tasks using capsule networks
Authors:
Royal Jain
Abstract:
Classification of audio samples is an important part of many auditory systems. Deep learning models based on the Convolutional and the Recurrent layers are state-of-the-art in many such tasks. In this paper, we approach audio classification tasks using capsule networks trained by recently proposed dynamic routing-by-agreement mechanism. We propose an architecture for capsule networks fit for audio…
▽ More
Classification of audio samples is an important part of many auditory systems. Deep learning models based on the Convolutional and the Recurrent layers are state-of-the-art in many such tasks. In this paper, we approach audio classification tasks using capsule networks trained by recently proposed dynamic routing-by-agreement mechanism. We propose an architecture for capsule networks fit for audio classification tasks and study the impact of various parameters on classification accuracy. Further, we suggest modifications for regularization and multi-label classification. We also develop insights into the data using capsule outputs and show the utility of the learned network for transfer learning. We perform experiments on 7 datasets of different domains and sizes and show significant improvements in performance compared to strong baseline models. To the best of our knowledge, this is the first detailed study about the application of capsule networks in the audio domain.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Machine Learning for Anomaly Detection and Categorization in Multi-cloud Environments
Authors:
Tara Salman,
Deval Bhamare,
Aiman Erbad,
Raj Jain,
Mohammed Samaka
Abstract:
Recently, advances in machine learning techniques have attracted the attention of the research community to build intrusion detection systems (IDS) that can detect anomalies in the network traffic. Most of the research works, however, do not differentiate among different types of attacks. This is, in fact, necessary for appropriate countermeasures and defense against attacks. In this paper, we inv…
▽ More
Recently, advances in machine learning techniques have attracted the attention of the research community to build intrusion detection systems (IDS) that can detect anomalies in the network traffic. Most of the research works, however, do not differentiate among different types of attacks. This is, in fact, necessary for appropriate countermeasures and defense against attacks. In this paper, we investigate both detecting and categorizing anomalies rather than just detecting, which is a common trend in the contemporary research works. We have used a popular publicly available dataset to build and test learning models for both detection and categorization of different attacks. To be precise, we have used two supervised machine learning techniques, namely linear regression (LR) and random forest (RF). We show that even if detection is perfect, categorization can be less accurate due to similarities between attacks. Our results demonstrate more than 99% detection accuracy and categorization accuracy of 93.6%, with the inability to categorize some attacks. Further, we argue that such categorization can be applied to multi-cloud environments using the same machine learning techniques.
△ Less
Submitted 23 October, 2018;
originally announced December 2018.
-
Feasibility of Supervised Machine Learning for Cloud Security
Authors:
Deval Bhamare,
Tara Salman,
Mohammed Samaka,
Aiman Erbad,
Raj Jain
Abstract:
Cloud computing is gaining significant attention, however, security is the biggest hurdle in its wide acceptance. Users of cloud services are under constant fear of data loss, security threats and availability issues. Recently, learning-based methods for security applications are gaining popularity in the literature with the advents in machine learning techniques. However, the major challenge in t…
▽ More
Cloud computing is gaining significant attention, however, security is the biggest hurdle in its wide acceptance. Users of cloud services are under constant fear of data loss, security threats and availability issues. Recently, learning-based methods for security applications are gaining popularity in the literature with the advents in machine learning techniques. However, the major challenge in these methods is obtaining real-time and unbiased datasets. Many datasets are internal and cannot be shared due to privacy issues or may lack certain statistical characteristics. As a result of this, researchers prefer to generate datasets for training and testing purpose in the simulated or closed experimental environments which may lack comprehensiveness. Machine learning models trained with such a single dataset generally result in a semantic gap between results and their application. There is a dearth of research work which demonstrates the effectiveness of these models across multiple datasets obtained in different environments. We argue that it is necessary to test the robustness of the machine learning models, especially in diversified operating conditions, which are prevalent in cloud scenarios. In this work, we use the UNSW dataset to train the supervised machine learning models. We then test these models with ISOT dataset. We present our results and argue that more research in the field of machine learning is still required for its applicability to the cloud security.
△ Less
Submitted 23 October, 2018;
originally announced October 2018.
-
Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing
Authors:
Feng Mao,
Edgar Blanco,
Mingang Fu,
Rohit Jain,
Anurag Gupta,
Sebastien Mancel,
Rong Yuan,
Stephen Guo,
Sai Kumar,
Yayang Tian
Abstract:
Bin Packing problems have been widely studied because of their broad applications in different domains. Known as a set of NP-hard problems, they have different vari- ations and many heuristics have been proposed for obtaining approximate solutions. Specifically, for the 1D variable sized bin packing problem, the two key sets of optimization heuristics are the bin assignment and the bin allocation.…
▽ More
Bin Packing problems have been widely studied because of their broad applications in different domains. Known as a set of NP-hard problems, they have different vari- ations and many heuristics have been proposed for obtaining approximate solutions. Specifically, for the 1D variable sized bin packing problem, the two key sets of optimization heuristics are the bin assignment and the bin allocation. Usually the performance of a single static optimization heuristic can not beat that of a dynamic one which is tailored for each bin packing instance. Building such an adaptive system requires modeling the relationship between bin features and packing perform profiles. The primary drawbacks of traditional AI machine learnings for this task are the natural limitations of feature engineering, such as the curse of dimensionality and feature selection quality. We introduce a deep learning approach to overcome the drawbacks by applying a large training data set, auto feature selection and fast, accurate labeling. We show in this paper how to build such a system by both theoretical formulation and engineering practices. Our prediction system achieves up to 89% training accuracy and 72% validation accuracy to select the best heuristic that can generate a better quality bin packing solution.
△ Less
Submitted 14 February, 2017;
originally announced February 2017.
-
On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits
Authors:
Naumaan Nayyar,
Dileep Kalathil,
Rahul Jain
Abstract:
We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are classes of online learning problems that capture exploration versus exploitation tradeoffs. In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution. The objective is to design a policy that maximiz…
▽ More
We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are classes of online learning problems that capture exploration versus exploitation tradeoffs. In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution. The objective is to design a policy that maximizes the expected reward over a time horizon for a single player setting and the sum of expected rewards for the multiplayer setting. In the multiplayer setting, arms may give different rewards to different players. There is no separate channel for coordination among the players. Any attempt at communication is costly and adds to regret. We propose two decentralizable policies, $\tt E^3$ ($\tt E$-$\tt cubed$) and $\tt E^3$-$\tt TS$, that can be used in both single player and multiplayer settings. These policies are shown to yield expected regret that grows at most as O($\log^{1+ε} T$). It is well known that $\log T$ is the lower bound on the rate of growth of regret even in a centralized case. The proposed algorithms improve on prior work where regret grew at O($\log^2 T$). More fundamentally, these policies address the question of additional cost incurred in decentralized online learning, suggesting that there is at most an $ε$-factor cost in terms of order of regret. This solves a problem of relevance in many domains and had been open for a while.
△ Less
Submitted 1 December, 2016; v1 submitted 4 May, 2015;
originally announced May 2015.