
Decentralized Competing Bandits in Non-Stationary Matching Markets

Authors: Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar

Abstract: Understanding complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply-side (arms), has recently received substantial interest. To that end, in this paper, we introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and different preferences over the supply-side (arms), but the arms have fixed and known preferences over the agents. We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB), where the agents play (restrictive) successive elimination type learning algorithms to learn their preferences over the arms. The complexity in understanding such a system stems from the fact that the competing bandits choose their actions in an asynchronous fashion, and the lower-ranked agents only get to learn from the set of arms not dominated by the higher-ranked agents, which leads to forced exploration. With carefully defined complexity parameters, we characterize this forced exploration and obtain sub-linear (logarithmic) regret of DNCB. Furthermore, we validate our theoretical findings via experiments.
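In the serial dictatorship setting every arm ranks the agents in the same order, so once the agents' preferences are known the stable matching can be built greedily: the top-ranked agent takes its favourite arm, the next agent takes its favourite among the arms that remain, and so on. The sketch below assumes the preferences are already known (in DNCB they must be learned from noisy rewards), and the function name `serial_dictatorship` and the toy preference lists are hypothetical. It illustrates why a lower-ranked agent only ever competes for arms not dominated by higher-ranked agents.

```python
from typing import Dict, List


def serial_dictatorship(agent_order: List[str],
                        agent_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Stable matching when all arms share one known ranking of the agents.

    agent_order: agents from highest to lowest rank (the arms' common ranking).
    agent_prefs: for each agent, arms from most to least preferred.
    """
    taken = set()
    matching: Dict[str, str] = {}
    for agent in agent_order:
        # Each agent picks its favourite arm not yet claimed by a higher-ranked agent.
        for arm in agent_prefs[agent]:
            if arm not in taken:
                matching[agent] = arm
                taken.add(arm)
                break
    return matching


# Toy example: every agent likes arm "x" best, but only a1 can get it.
prefs = {"a1": ["x", "y", "z"], "a2": ["x", "z", "y"], "a3": ["x", "y", "z"]}
result = serial_dictatorship(["a1", "a2", "a3"], prefs)
```

Here `result` is `{"a1": "x", "a2": "z", "a3": "y"}`: agent a3 never gets to try arm "x", which is the kind of restricted, forced exploration the abstract refers to.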

Submitted 31 May, 2022; originally announced June 2022.

arXiv:2203.06297

Instance-Dependent Regret Analysis of Kernelized Bandits

Authors: Shubhanshu Shekhar, Tara Javidi

Abstract: We study the kernelized bandit problem, which involves designing an adaptive strategy for querying a noisy zeroth-order oracle to efficiently learn about the optimizer of an unknown function $f$ with a norm bounded by $M<\infty$ in a Reproducing Kernel Hilbert Space (RKHS) associated with a positive definite kernel $K$. Prior results, working in a minimax framework, have characterized the worst-case (over all functions in the problem class) limits on regret achievable by any algorithm, and have constructed algorithms with matching (modulo polylogarithmic factors) worst-case performance for the Matérn family of kernels. These results suffer from two drawbacks. First, the minimax lower bound gives no information about the limits of regret achievable by the commonly used algorithms on specific problem instances. Second, due to their worst-case nature, the existing upper bound analysis fails to adapt to easier problem instances within the function class. Our work takes steps to address both these issues. First, we derive instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing normalized cumulative regret. Our result, valid for all the practically relevant kernelized bandit algorithms, such as GP-UCB, GP-TS and SupKernelUCB, identifies a fundamental complexity measure associated with every problem instance. We then address the second issue by proposing a new minimax near-optimal algorithm which also adapts to easier problem instances.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 26 pages, 1 figure

arXiv:2110.15458

Open Problem: Tight Online Confidence Intervals for RKHS Elements

Authors: Sattar Vakili, Jonathan Scarlett, Tara Javidi

Abstract: Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel-based bandit and reinforcement learning problems utilizes confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points. We formalize the question of online confidence intervals in the RKHS setting and overview the existing results.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2107.07211

Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo

Authors: Vyacheslav Kungurtsev, Adam Cobb, Tara Javidi, Brian Jalaian

Abstract: Federated learning performed by a decentralized network of agents is becoming increasingly important with the prevalence of embedded software on autonomous devices. Bayesian approaches to learning benefit from offering more information as to the uncertainty of a random quantity, and Langevin and Hamiltonian methods are effective at realizing sampling from an uncertain distribution with large parameter dimensions. Such methods have only recently appeared in the decentralized setting, and either exclusively use stochastic gradient Langevin and Hamiltonian Monte Carlo approaches, which require a diminishing step size to asymptotically sample from the posterior and are known in practice to characterize uncertainty less faithfully than constant step-size methods with a Metropolis adjustment, or assume strong convexity properties of the potential function. We present the first approach to incorporating constant step-size Metropolis-adjusted HMC in the decentralized sampling framework, show theoretical guarantees for consensus and probability distance to the posterior stationary distribution, and demonstrate its effectiveness numerically on standard real-world problems, including decentralized learning of neural networks, which is known to be highly non-convex.
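For readers unfamiliar with the building block, the following is a minimal single-machine sketch of constant step-size, Metropolis-adjusted HMC in one dimension. It is not the decentralized method of the paper (no consensus or communication step is shown); the function name `hmc_sample`, the standard-normal target, the step size and the number of leapfrog steps are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List


def hmc_sample(log_prob: Callable[[float], float],
               grad_log_prob: Callable[[float], float],
               x0: float,
               step_size: float = 0.2,
               n_leapfrog: int = 10,
               n_samples: int = 2000,
               seed: int = 0) -> List[float]:
    """Metropolis-adjusted HMC with a constant step size (1-D toy version)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum drawn from N(0, 1)
        x_new, p = x, p0
        # Leapfrog: half momentum step, alternating full steps, final half step.
        p += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p
            if i < n_leapfrog - 1:
                p += step_size * grad_log_prob(x_new)
        p += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction removes the bias of the discretised dynamics,
        # so the step size does not need to shrink over time.
        h_old = -log_prob(x) + 0.5 * p0 * p0
        h_new = -log_prob(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Toy target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The accept/reject step is what distinguishes this from stochastic-gradient HMC, which instead relies on a diminishing step size.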

Submitted 15 July, 2021; originally announced July 2021.

arXiv:2012.04137

Adaptive Sampling for Estimating Distributions: A Bayesian Upper Confidence Bound Approach

Authors: Dhruva Kartik, Neeraj Sood, Urbashi Mitra, Tara Javidi

Abstract: The problem of adaptive sampling for estimating probability mass functions (pmf) uniformly well is considered. Performance of the sampling strategy is measured in terms of the worst-case mean squared error. A Bayesian variant of the existing upper confidence bound (UCB) based approaches is proposed. It is shown analytically that the performance of this Bayesian variant is no worse than the existing approaches. The posterior distribution on the pmfs in the Bayesian setting allows for a tighter computation of upper confidence bounds which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence in various groups such as location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles county.

Submitted 7 December, 2020; originally announced December 2020.

arXiv:2012.00077

On Error Exponents of Almost-Fixed-Length Channel Codes and Hypothesis Tests

Authors: Anusha Lalitha, Tara Javidi

Abstract: We examine a new class of channel coding strategies and hypothesis tests, referred to as almost-fixed-length strategies, that have little flexibility in the stopping time over fixed-length strategies. The stopping time of these strategies is allowed to be slightly larger only on a rare set of sample paths with an exponentially small probability. We show that almost-fixed-length channel coding strategies can achieve Burnashev's optimal error exponent. Similarly, almost-fixed-length hypothesis tests are shown to bridge the gap between hypothesis testing with fixed sample size and sequential hypothesis testing and to improve the trade-off between type-I and type-II error exponents.
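For reference, Burnashev's exponent for a discrete memoryless channel with noiseless feedback and variable-length coding is usually stated as below, where $C$ is the channel capacity, $R$ is the rate measured per expected channel use, and $C_1$ is the largest Kullback-Leibler divergence between the output distributions of two input letters. This is the standard textbook form, given only as background, not a result derived in the paper above.

```latex
E(R) = C_1\left(1 - \frac{R}{C}\right), \qquad 0 \le R \le C,
\qquad
C_1 = \max_{x,\,x'} D\!\left(P_{Y \mid X}(\cdot \mid x) \,\middle\|\, P_{Y \mid X}(\cdot \mid x')\right).
```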

Submitted 30 November, 2020; originally announced December 2020.

arXiv:2009.02326

doi 10.1145/3400302.3415671

CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

Authors: Mojan Javaheripi, Mohammad Samragh, Gregory Fields, Tara Javidi, Farinaz Koushanfar

Abstract: We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger. What differentiates CLEANN from the prior work is its lightweight methodology which recovers the ground-truth class of Trojan samples without the need for labeled data, model retraining, or prior assumptions on the trigger or the attack. We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers. CLEANN is devised based on algorithm/hardware co-design and is equipped with specialized hardware to enable efficient real-time execution on resource-constrained embedded platforms. Proof of concept evaluations on CLEANN for the state-of-the-art Neural Trojan attacks on visual benchmarks demonstrate its competitive advantage in terms of attack resiliency and execution overhead.

Submitted 4 September, 2020; originally announced September 2020.

arXiv:2005.07814

Low Complexity Sequential Search with Size-Dependent Measurement Noise

Authors: Sung-En Chiu, Tara Javidi

Abstract: This paper considers a target localization problem where at any given time an agent can choose a region to query for the presence of the target in that region. The measurement noise is assumed to be increasing with the size of the query region the agent chooses. Motivated by practical applications such as initial beam alignment in array processing, heavy hitter detection in networking, and visual search in robotics, we consider practically important complexity constraints/metrics: time complexity, computational and memory complexity, and the complexity of possible query sets in terms of geometry and cardinality. Two novel search strategies, $dyaPM$ and $hiePM$, are proposed. Pertinent to the practicality of our solutions, $dyaPM$ and $hiePM$ have a connected query geometry (i.e., the query set is always a connected set) and are implemented with low computational and memory complexity. Additionally, $hiePM$ has a hierarchical structure and hence a further reduction in the cardinality of possible query sets, making $hiePM$ practically suitable for applications such as beamforming in array processing, where memory limitations favor a smaller codebook size. Through a unified analysis with Extrinsic Jensen-Shannon (EJS) Divergence, $dyaPM$ is shown to be asymptotically optimal in search time complexity (asymptotic in both resolution (rate) and error (reliability)). On the other hand, $hiePM$ is shown to be near-optimal in rate. In addition, both $hiePM$ and $dyaPM$ are shown to outperform prior work in the non-asymptotic regime.
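Posterior-based search strategies of this kind maintain a belief over where the target is and update it after each noisy answer. The sketch below shows only that Bayes update, under an assumed noise model: the target lies in one of `n` discrete cells, the answer to "is the target in region S?" is flipped with a probability `flip_prob(|S|)` that grows with the region size, and `flip_prob` stays below 1/2. The function name and the linear noise model are hypothetical; the query-selection rules of $dyaPM$ and $hiePM$ are not reproduced here.

```python
from typing import Callable, List, Set


def bayes_update(posterior: List[float], query: Set[int], answer: int,
                 flip_prob: Callable[[int], float]) -> List[float]:
    """Update a posterior over target cells after a noisy yes/no region query."""
    eps = flip_prob(len(query))  # size-dependent crossover probability
    updated = []
    for cell, p in enumerate(posterior):
        truth = 1 if cell in query else 0  # noiseless answer if the target were here
        likelihood = 1.0 - eps if answer == truth else eps
        updated.append(p * likelihood)
    total = sum(updated)
    return [u / total for u in updated]


# Hypothetical noise model: larger regions give noisier answers.
noise = lambda size: 0.05 * size
post = bayes_update([0.25] * 4, query={0, 1}, answer=1, flip_prob=noise)
```

With a uniform prior and a query of two cells (crossover probability 0.1), a "yes" answer moves the posterior to approximately `[0.45, 0.45, 0.05, 0.05]`.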

Submitted 1 September, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: submitted revision to IEEE Transactions on Information Theory

arXiv:2005.04832

Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

Authors: Shubhanshu Shekhar, Tara Javidi

Abstract: We aim to optimize a black-box function $f:\mathcal{X} \mapsto \mathbb{R}$ under the assumption that $f$ is Hölder smooth and has bounded norm in the RKHS associated with a given kernel $K$. This problem is known to have an agnostic Gaussian Process (GP) bandit interpretation in which an appropriately constructed GP surrogate model with kernel $K$ is used to obtain an upper confidence bound (UCB) algorithm. In this paper, we propose a new algorithm (LP-GP-UCB) where the usual GP surrogate model is augmented with Local Polynomial (LP) estimators of the Hölder smooth function $f$ to construct a multi-scale UCB guiding the search for the optimizer. We analyze this algorithm and derive high probability bounds on its simple and cumulative regret. We then prove that the elements of many common RKHS are Hölder smooth and obtain the corresponding Hölder smoothness parameters, and hence, specialize our regret bounds for several commonly used kernels. When specialized to the Squared Exponential (SE) kernel, LP-GP-UCB matches the optimal performance, while for the case of Matérn kernels $(K_ν)_{ν>0}$, it results in uniformly tighter regret bounds for all values of the smoothness parameter $ν>0$. Most notably, for certain ranges of $ν$, the algorithm achieves near-optimal bounds on simple and cumulative regrets, matching the algorithm-independent lower bounds up to polylog factors, and thus closing the large gap between the existing upper and lower bounds for these values of $ν$.
Additionally, our analysis provides the first explicit regret bounds, in terms of the budget $n$, for the Rational-Quadratic (RQ) and Gamma-Exponential (GE) kernels. Finally, experiments with synthetic functions as well as a CNN hyperparameter tuning task demonstrate the practical benefits of our multi-scale partitioning approach over some existing algorithms.
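To make the kernel families above concrete, the sketch below evaluates the Matérn covariance for the three half-integer values of $ν$ that have simple closed forms, together with the SE kernel, as functions of the distance `r` and an assumed length-scale. Larger $ν$ gives smoother functions, and the SE kernel is the $ν \to \infty$ limit. General $ν$ requires a modified Bessel function and is deliberately left out; the function names are hypothetical.

```python
import math


def matern(r: float, nu: float, lengthscale: float = 1.0) -> float:
    """Matérn covariance k(r) for the closed-form cases nu in {0.5, 1.5, 2.5}."""
    s = r / lengthscale
    if nu == 0.5:
        return math.exp(-s)  # exponential (Ornstein-Uhlenbeck) kernel
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return (1.0 + a + a * a / 3.0) * math.exp(-a)
    raise ValueError("only nu in {0.5, 1.5, 2.5} is implemented in this sketch")


def squared_exponential(r: float, lengthscale: float = 1.0) -> float:
    """SE kernel, the nu -> infinity limit of the Matérn family."""
    return math.exp(-0.5 * (r / lengthscale) ** 2)
```

At `r = 1` the values increase with `nu` (about 0.368, 0.483 and 0.524) towards the SE value of about 0.607, reflecting the growing smoothness.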

Submitted 10 May, 2020; originally announced May 2020.

Comments: 20 pages, 2 figures. Preliminary version -- feedback welcome

arXiv:2004.04249

doi 10.1145/3377930.3390226

GeneCAI: Genetic Evolution for Acquiring Compact AI

Authors: Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar

Abstract: In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving towards more complex architectures to achieve higher inference accuracy. Model compression techniques can be leveraged to efficiently deploy such compute-intensive architectures on resource-limited mobile devices. Such methods comprise various hyper-parameters that require per-layer customization to ensure high accuracy. Choosing such hyper-parameters is cumbersome as the pertinent search space grows exponentially with the number of model layers. This paper introduces GeneCAI, a novel optimization method that automatically learns how to tune per-layer compression hyper-parameters. We devise a bijective translation scheme that encodes compressed DNNs to the genotype space. The optimality of each genotype is measured using a multi-objective score based on accuracy and number of floating point operations. We develop customized genetic operations to iteratively evolve the non-dominated solutions towards the optimal Pareto front, thus capturing the optimal trade-off between model accuracy and complexity. The GeneCAI optimization method is highly scalable and can achieve a near-linear performance boost on distributed multi-GPU platforms. Our extensive evaluations demonstrate that GeneCAI outperforms existing rule-based and reinforcement learning methods in DNN compression by finding models that lie on a better accuracy-complexity Pareto curve.
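The notion of "non-dominated solutions" used above can be illustrated with a small filter over candidate compressed models, each scored by accuracy (higher is better) and FLOPs (lower is better). This is a generic Pareto-front sketch, not GeneCAI's genetic operators or scoring function; the candidate names and numbers are made up.

```python
from typing import List, Tuple

# (name, accuracy, flops): accuracy is maximised, flops is minimised.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]


models = [("A", 0.90, 100.0), ("B", 0.92, 150.0),
          ("C", 0.89, 120.0), ("D", 0.95, 300.0)]
front = pareto_front(models)
```

Model "C" is dropped because "A" is both more accurate and cheaper; "A", "B" and "D" each represent a different accuracy/complexity trade-off and remain on the front.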

Submitted 14 April, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

arXiv:1912.12738

Sequential Learning of CSI for MmWave Initial Alignment

Authors: Nancy Ronquillo, Sung-En Chiu, Tara Javidi

Abstract: MmWave communications aim to meet the demand for higher data rates by using highly directional beams with access to larger bandwidth. An inherent challenge is acquiring channel state information (CSI) necessary for mmWave transmission. We consider the problem of adaptive and sequential learning of the CSI during the mmWave initial alignment phase of communication. We focus on the single-user, single-dominant-path scenario, where the problem is equivalent to acquiring an optimal beamforming vector such that, ideally, the resulting beams point in the direction of the angle of arrival with the desired resolution. We extend our prior work by proposing two algorithms for adaptively and sequentially selecting beamforming vectors for learning the CSI, which formulate a Bayesian update to account for the time-varying fading model. Numerically, we analyze the outage probability and expected spectral efficiency of our proposed algorithms and demonstrate improvements over strategies that utilize a practical hierarchical codebook.

Submitted 29 December, 2019; originally announced December 2019.

Comments: To be published in the 53rd Asilomar Conference on Signals, Systems and Computers 2019

arXiv:1912.04977

Advances and Open Problems in Federated Learning

Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson , et al. (34 additional authors not shown)

Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

arXiv:1911.06471

ASCAI: Adaptive Sampling for acquiring Compact AI

Authors: Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar

Abstract: This paper introduces ASCAI, a novel adaptive sampling methodology that can learn how to effectively compress Deep Neural Networks (DNNs) for accelerated inference on resource-constrained platforms. Modern DNN compression techniques comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing such hyperparameters is cumbersome as the pertinent search space grows exponentially with the number of model layers. To effectively traverse this large space, we devise an intelligent sampling mechanism that adapts the sampling strategy using customized operations inspired by genetic algorithms. As a special case, we consider the space of model compression as a vector space. The adaptively selected samples enable ASCAI to automatically learn how to tune per-layer compression hyperparameters to optimize the accuracy/model-size trade-off. Our extensive evaluations show that ASCAI outperforms rule-based and reinforcement learning methods in terms of compression rate and/or accuracy.

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1910.12406

Adaptive Sampling for Estimating Multiple Probability Distributions

Authors: Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh

Abstract: We consider the problem of allocating samples to a finite set of discrete distributions in order to learn them uniformly well in terms of four common distance measures: $\ell_2^2$, $\ell_1$, $f$-divergence, and separation distance. To present a unified treatment of these distances, we first propose a general optimistic tracking algorithm and analyze its sample allocation performance with respect to an oracle. We then instantiate this algorithm for the four distance measures and derive bounds on the regret of their resulting allocation schemes. We verify our theoretical findings through some experiments. Finally, we show that the techniques developed in the paper can be easily extended to the related setting of minimizing the average error (in terms of the four distances) in learning a set of distributions.

Submitted 6 December, 2019; v1 submitted 27 October, 2019; originally announced October 2019.

Comments: 40 pages, 3 figures

arXiv:1906.00303

Active Learning for Binary Classification with Abstention

Authors: Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi

Abstract: We construct and analyze active learning algorithms for the problem of binary classification with abstention. We consider three abstention settings: fixed-cost and two variants of bounded-rate abstention, and for each of them propose an active learning algorithm. All the proposed algorithms can work in the most commonly used active learning models, i.e., membership-query, pool-based, and stream-based sampling. We obtain upper bounds on the excess risk of our algorithms in a general non-parametric framework and establish their minimax near-optimality by deriving matching lower bounds. Since our algorithms rely on the knowledge of some smoothness parameters of the regression function, we then describe a new strategy to adapt to these unknown parameters in a data-driven manner. Since the worst-case computational complexity of our proposed algorithms increases exponentially with the dimension of the input space, we conclude the paper with a computationally efficient variant of our algorithm whose computational complexity has a polynomial dependence on the dimension over a smaller but rich class of learning problems.

Submitted 1 June, 2019; originally announced June 2019.

Comments: 42 pages, 1 figure

arXiv:1905.12791

The Label Complexity of Active Learning from Observational Data

Authors: Songbai Yan, Kamalika Chaudhuri, Tara Javidi

Abstract: Counterfactual learning from observational data involves learning a classifier on an entire population based on data that is observed conditioned on a selection policy. This work considers this problem in an active setting, where the learner additionally has access to unlabeled examples and can choose to get a subset of these labeled by an oracle. Prior work on this problem uses disagreement-based active learning, along with an importance weighted loss estimator to account for counterfactuals, which leads to a high label complexity. We show how to instead incorporate a more efficient counterfactual risk minimizer into the active learning algorithm. This requires us to modify both the counterfactual risk to make it amenable to active learning, as well as the active learning process to make it amenable to the risk. We provably demonstrate that the result of this is an algorithm which is statistically consistent as well as more label-efficient than prior work.
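The importance weighted loss estimator mentioned above (the prior-work baseline, not the paper's improved risk minimizer) reweights each observed example by the inverse of the probability that the selection policy observed it. Because that probability can be small, the weights can be large, which is one source of the high variance and label complexity the abstract refers to. The function name and toy numbers below are hypothetical.

```python
from typing import Sequence


def importance_weighted_risk(losses: Sequence[float],
                             observed: Sequence[bool],
                             propensities: Sequence[float]) -> float:
    """Unbiased estimate of the population risk from selectively observed data.

    losses[i]:       loss of the classifier on example i (only used if observed)
    observed[i]:     whether the selection policy revealed example i's label
    propensities[i]: probability that example i was observed (assumed > 0)
    """
    total = 0.0
    for loss, obs, q in zip(losses, observed, propensities):
        if obs:
            total += loss / q  # upweight examples that were unlikely to be observed
    return total / len(losses)


risk = importance_weighted_risk(losses=[1.0, 0.0, 1.0, 1.0],
                                observed=[True, True, False, True],
                                propensities=[0.5, 1.0, 0.25, 0.5])
```

Here the two observed mistakes with propensity 0.5 each contribute a weight of 2, giving `risk == 1.0` over four examples.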

Submitted 27 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019

arXiv:1905.10466

Decentralized Bayesian Learning over Graphs

Authors: Anusha Lalitha, Xinghan Wang, Osman Kilinc, Yongxi Lu, Tara Javidi, Farinaz Koushanfar

Abstract: We propose a decentralized learning algorithm over a general social network. The algorithm leaves the training data distributed on the mobile devices while utilizing a peer-to-peer model aggregation method. The proposed algorithm allows agents with local data to learn a shared model explaining the global training data in a decentralized fashion. The proposed algorithm can be viewed as a Bayesian and peer-to-peer variant of federated learning in which each agent keeps a "posterior probability distribution" over the global model parameters. Each agent updates its "posterior" based on 1) the local training data and 2) the asynchronous communication and model aggregation with its 1-hop neighbors. This Bayesian formulation allows for a systematic treatment of model aggregation over any arbitrary connected graph. Furthermore, it provides strong analytic guarantees on convergence in the realizable case as well as a closed-form characterization of the rate of convergence. We also show that our methodology can be combined with efficient Bayesian inference techniques to train Bayesian neural networks in a decentralized manner. Through empirical studies, we show that our theoretical analysis can guide the design of network/social interactions and data partitioning to achieve convergence.
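The two-stage update described above (local Bayesian update, then aggregation with 1-hop neighbours) can be sketched for a finite set of hypotheses. The aggregation operator is an assumption here: the abstract does not specify it, and this sketch uses a weighted geometric average (log-linear pooling), a common choice in the social-learning literature. It is also synchronous rather than asynchronous, and the function name and numbers are hypothetical.

```python
import math
from typing import List


def social_learning_step(beliefs: List[List[float]],
                         likelihoods: List[List[float]],
                         weights: List[List[float]]) -> List[List[float]]:
    """One synchronous round over a finite hypothesis set.

    beliefs[i][h]:     agent i's current posterior on hypothesis h (all > 0)
    likelihoods[i][h]: likelihood of agent i's newest local sample under h
    weights[i][j]:     weight agent i gives neighbour j (rows sum to 1, 0 if not adjacent)
    """
    n, m = len(beliefs), len(beliefs[0])
    # 1) Local Bayesian update with each agent's own data.
    local = []
    for i in range(n):
        unnorm = [beliefs[i][h] * likelihoods[i][h] for h in range(m)]
        z = sum(unnorm)
        local.append([u / z for u in unnorm])
    # 2) Aggregate neighbours' posteriors by a weighted geometric average.
    updated = []
    for i in range(n):
        logs = [sum(weights[i][j] * math.log(local[j][h])
                    for j in range(n) if weights[i][j] > 0)
                for h in range(m)]
        top = max(logs)  # subtract the max before exponentiating for stability
        unnorm = [math.exp(v - top) for v in logs]
        z = sum(unnorm)
        updated.append([u / z for u in unnorm])
    return updated


new_beliefs = social_learning_step(beliefs=[[0.5, 0.5], [0.5, 0.5]],
                                   likelihoods=[[0.8, 0.2], [0.6, 0.4]],
                                   weights=[[0.5, 0.5], [0.5, 0.5]])
```

With two fully connected agents whose data both favour the first hypothesis, both end up with a belief of about 0.71 on it, more than either would from averaging the raw likelihood ratios alone.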

Submitted 24 May, 2019; originally announced May 2019.

arXiv:1905.09561

Binary Classification with Bounded Abstention Rate

Authors: Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi

Abstract: We consider the problem of binary classification with abstention in the relatively less studied bounded-rate setting. We begin by obtaining a characterization of the Bayes optimal classifier for an arbitrary input-label distribution $P_{XY}$. Our result generalizes and provides an alternative proof for the result first obtained by \cite{chow1957optimum}, and then re-derived by \citet{denis2015consistency}, under a continuity assumption on $P_{XY}$. We then propose a plug-in classifier that employs unlabeled samples to decide the region of abstention and derive an upper bound on the excess risk of our classifier under standard Hölder smoothness and margin assumptions. Unlike the plug-in rule of \citet{denis2015consistency}, our constructed classifier satisfies the abstention constraint with high probability and can also deal with discontinuities in the empirical cdf. We also derive lower bounds that demonstrate the minimax near-optimality of our proposed algorithm. To address the excessive complexity of the plug-in classifier in high dimensions, we propose a computationally efficient algorithm that builds upon prior work on convex loss surrogates, and obtain bounds on its excess risk in the realizable case. We empirically compare the performance of the proposed algorithm with a baseline on a number of UCI benchmark datasets.
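A plug-in rule of the kind described above abstains on the points where an estimate $\hat\eta(x)$ of $P(Y=1 \mid X=x)$ is closest to 1/2, and uses unlabeled samples to choose how close is "too close" so that the abstention rate stays within a budget $\delta$. The sketch below is a simplified illustration under assumptions: `eta_hat` is taken as given, the threshold is an empirical quantile of the margins $|\hat\eta(x) - 1/2|$, and ties are resolved towards predicting so the empirical rate never exceeds $\delta$. It does not include the paper's high-probability correction; the function names and toy estimate are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def fit_abstention_threshold(eta_hat: Callable[[float], float],
                             unlabeled: Sequence[float],
                             delta: float) -> float:
    """Margin threshold such that at most a delta-fraction of unlabeled points fall below it."""
    margins = sorted(abs(eta_hat(x) - 0.5) for x in unlabeled)
    k = math.floor(delta * len(margins))
    if k >= len(margins):
        return math.inf  # the budget allows abstaining everywhere
    # At most k margins are strictly smaller than margins[k].
    return margins[k]


def predict(eta_hat: Callable[[float], float], x: float,
            threshold: float) -> Optional[int]:
    margin = abs(eta_hat(x) - 0.5)
    if margin < threshold:
        return None  # abstain: too close to the decision boundary
    return 1 if eta_hat(x) >= 0.5 else 0


eta_hat = lambda x: x  # hypothetical regression estimate on [0, 1]
unlabeled = [i / 10 for i in range(11)]
t = fit_abstention_threshold(eta_hat, unlabeled, delta=0.2)
abstained = [x for x in unlabeled if predict(eta_hat, x, t) is None]
```

Here only the point at 0.5 is rejected, well within the budget of 20% of the 11 unlabeled points.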

Submitted 23 May, 2019; originally announced May 2019.

Comments: 35 pages, 4 figures
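
As a rough illustration of a bounded-rate abstention rule, here is a minimal Python sketch of a generic plug-in classifier. It assumes an already-fitted estimate of the regression function eta(x) = P(Y = 1 | X = x) and sets the abstention threshold from an empirical quantile of the confidence score |eta(x) - 1/2| over unlabeled points, so that at most a fraction delta of those points is rejected. The function names are hypothetical, and this is not the authors' classifier, which additionally guarantees the constraint with high probability and handles discontinuities in the empirical cdf.

```python
import math
import random

def fit_abstention_threshold(eta_unlabeled, delta):
    """Return t such that at most floor(delta * n) of the n unlabeled points
    have confidence |eta - 1/2| strictly below t (those points are abstained on)."""
    scores = sorted(abs(e - 0.5) for e in eta_unlabeled)
    k = math.floor(delta * len(scores))  # abstention budget on the unlabeled sample
    if k >= len(scores):
        return math.inf  # budget covers every point, so abstaining everywhere is allowed
    # At most k scores can lie strictly below the (k+1)-th smallest score, even with ties.
    return scores[k]

def predict_with_abstention(eta, t):
    """Predict 1 if eta >= 1/2, else 0; None marks an abstention."""
    return [None if abs(e - 0.5) < t else int(e >= 0.5) for e in eta]

# Usage on synthetic scores: uniform draws stand in for a fitted eta-hat.
rng = random.Random(0)
eta_unlabeled = [rng.random() for _ in range(1000)]
t = fit_abstention_threshold(eta_unlabeled, delta=0.2)
preds = predict_with_abstention(eta_unlabeled, t)
```

Choosing the threshold as an order statistic of the scores, rather than a fixed cutoff, is what keeps the empirical abstention rate within the budget regardless of how the scores are distributed.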

arXiv:1902.09682 [pdf, ps, other]

Multiscale Gaussian Process Level Set Estimation

Authors: Shubhanshu Shekhar, Tara Javidi

Abstract: In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered. A new algorithm for this problem in the Bayesian framework with a Gaussian Process (GP) prior is proposed. The proposed algorithm employs a hierarchical sequence of partitions to explore different regions of the search space at varying levels of detail depending upon their proximity to the level set boundary. It is shown that this approach results in the algorithm having a low-complexity implementation whose computational cost is significantly smaller than that of existing algorithms for higher-dimensional search spaces $\mathcal{X}$. Furthermore, high-probability bounds on a measure of discrepancy between the estimated level set and the true level set for the proposed algorithm are obtained, which are shown to be strictly better than the existing guarantees for a large class of GPs. In the process, a tighter characterization of the information gain of the proposed algorithm is obtained, which takes into account the structured nature of the evaluation points. This approach improves upon the existing technique of bounding the information gain by the maximum information gain.

Submitted 25 February, 2019; originally announced February 2019.

Comments: 15 pages

Journal ref: AISTATS 2019

arXiv:1901.11173 [pdf, other]

Peer-to-peer Federated Learning on Graphs

Authors: Anusha Lalitha, Osman Cihan Kilinc, Tara Javidi, Farinaz Koushanfar

Abstract: We consider the problem of training a machine learning model over a network of nodes in a fully decentralized framework. The nodes take a Bayesian-like approach via the introduction of a belief over the model parameter space. We propose a distributed learning algorithm in which nodes update their belief by aggregating information from their one-hop neighbors to learn a model that best fits the observations over the entire network. In addition, we obtain sufficient conditions to ensure that the probability of error is small for every node in the network. We discuss approximations required for applying this algorithm to train Deep Neural Networks (DNNs). Experiments on training a linear regression model and on training a DNN show that the proposed learning rule provides a significant improvement in accuracy compared to the case where nodes learn without cooperation.

Submitted 30 January, 2019; originally announced January 2019.
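
The belief-aggregation step can be illustrated for a finite set of candidate parameters. The Python sketch below assumes the log-linear rule common in the social-learning literature: each node performs a local Bayesian update with its own likelihoods and then takes a weighted geometric average of its own and its one-hop neighbors' beliefs. The weight matrix W, the function names, and the toy three-node line graph are illustrative assumptions; the paper's exact update and its approximations for DNNs may differ.

```python
import math

def normalize_log(log_b):
    """Renormalize a log-belief vector so that its exponentials sum to 1."""
    m = max(log_b)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_b))
    return [v - log_z for v in log_b]

def social_learning_round(log_beliefs, log_liks, W):
    """One synchronous round over all nodes.
    log_beliefs[i][k]: node i's log belief in candidate parameter k.
    log_liks[i][k]: log-likelihood of node i's newest local observation under k.
    W[i][j] > 0 only if j == i or j is a one-hop neighbor of i; each row sums to 1."""
    n = len(log_beliefs)
    # Step 1: local Bayesian update at every node.
    local = [normalize_log([b + l for b, l in zip(log_beliefs[i], log_liks[i])])
             for i in range(n)]
    # Step 2: weighted geometric average of neighbors' beliefs (a weighted sum of logs).
    new = []
    for i in range(n):
        agg = [sum(W[i][j] * local[j][k] for j in range(n) if W[i][j] > 0)
               for k in range(len(local[i]))]
        new.append(normalize_log(agg))
    return new

# Toy line graph 0 - 1 - 2; only node 0 sees data informative about candidate 0.
# Reusing the same likelihoods every round stands in for a stream of observations.
W = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.5, 0.5]]
log_l = [[math.log(0.8), math.log(0.2)], [math.log(0.5)] * 2, [math.log(0.5)] * 2]
log_b = [[math.log(0.5)] * 2 for _ in range(3)]
for _ in range(30):
    log_b = social_learning_round(log_b, log_l, W)
beliefs = [[math.exp(v) for v in row] for row in log_b]
```

Nodes 1 and 2 never observe informative data themselves, yet their beliefs concentrate on candidate 0 because information propagates through the aggregation step. Working in the log domain avoids underflow as beliefs concentrate.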

arXiv:1812.07722 [pdf, other]

doi 10.1109/JSAC.2019.2933967

Active Learning and CSI Acquisition for mmWave Initial Alignment

Authors: Sung-En Chiu, Nancy Ronquillo, Tara Javidi

Abstract: Millimeter wave (mmWave) communication with large antenna arrays is a promising technique to enable extremely high data rates due to the large available bandwidth in mmWave frequency bands. In addition, given the knowledge of an optimal directional beamforming vector, large antenna arrays have been shown to overcome both the severe signal attenuation in mmWave and the interference problem. However, fundamental limits on the achievable learning rate of an optimal beamforming vector remain. This paper considers the problem of adaptive and sequential optimization of the beamforming vectors during the initial access phase of communication. With a single-path channel model, the problem is reduced to actively learning the Angle-of-Arrival (AoA) of the signal sent from the user to the Base Station (BS). Drawing on recent results in the design of a hierarchical beamforming codebook [1], sequential measurement-dependent noisy search strategies [2], and active learning from an imperfect labeler [3], an adaptive and sequential alignment algorithm is proposed. An upper bound on the expected search time of the proposed algorithm is derived via the Extrinsic Jensen-Shannon Divergence, which demonstrates that the search time of the proposed algorithm asymptotically matches the performance of the noiseless bisection search up to a constant factor. Furthermore, the upper bound shows that the acquired AoA error probability decays exponentially fast with the search time, with an exponent that is a decreasing function of the acquisition rate.
Numerically, the proposed algorithm is compared with prior work, and a significant improvement of the system communication rate is observed. Most notably, in the relevant regime of low (-10 dB to 5 dB) raw SNR, this establishes the first practically viable solution for initial access and, hence, the first demonstration of stand-alone mmWave communication.

Submitted 3 September, 2019; v1 submitted 18 December, 2018; originally announced December 2018.

Comments: This paper appears in: IEEE Journal on Selected Areas in Communications, pp. 1-16. Print ISSN: 0733-8716; Online ISSN: 1558-0008.
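
For reference, the noiseless bisection baseline mentioned in the abstract can be sketched in a few lines of Python. It assumes the angular range is quantized into num_bins bins (a hypothetical discretization) and that every query of the form "is the AoA in the lower half of the current interval?" is answered without error, so at most ceil(log2(num_bins)) queries are needed. This is only the baseline, not the proposed noisy alignment algorithm.

```python
def bisection_search(num_bins, target):
    """Locate target in [0, num_bins) with error-free half-interval queries.
    Returns (estimated_bin, number_of_queries)."""
    if not 0 <= target < num_bins:
        raise ValueError("target must lie in [0, num_bins)")
    lo, hi = 0, num_bins  # invariant: target lies in [lo, hi)
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if target < mid:  # noiseless answer to "is the AoA in [lo, mid)?"
            hi = mid
        else:
            lo = mid
    return lo, queries

def max_queries(num_bins):
    """ceil(log2(num_bins)) for num_bins >= 1, computed exactly with integers."""
    return (num_bins - 1).bit_length()

est, q = bisection_search(64, 37)
```

With 64 bins every target is found with exactly 6 queries; the abstract's result is that the proposed noisy algorithm matches this scaling up to a constant factor.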

arXiv:1811.09834 [pdf, other]

Efficient Video Understanding via Layered Multi Frame-Rate Analysis

Authors: Ziyao Tang, Yongxi Lu, Tara Javidi

Abstract: One of the greatest challenges in the design of a real-time perception system for autonomous driving vehicles and drones is the conflicting requirements of safety (high prediction accuracy) and efficiency. Traditional approaches use a single frame rate for the entire system. Motivated by the observation that the lack of robustness against environmental factors is the major weakness of compact ConvNet architectures, we propose a dual frame-rate system that brings in the best of both worlds: a modulator stream that executes an expensive model robust to environmental factors at a low frame rate to extract slowly changing features describing the environment, and a prediction stream that executes a lightweight model in real time to extract transient signals that describe particularities of the current frame. The advantage of our design is validated by our extensive empirical study, showing that our solution leads to consistent improvements across a variety of backbone architectures and input resolutions. These findings suggest multiple frame-rate systems as a promising direction in designing efficient perception for autonomous agents.

Submitted 24 November, 2018; originally announced November 2018.

Comments: under review

arXiv:1809.06023 [pdf, other]

Learning-based attacks in cyber-physical systems

Authors: Mohammad Javad Khojasteh, Anatoly Khina, Massimo Franceschetti, Tara Javidi

Abstract: We introduce the problem of learning-based attacks in a simple abstraction of cyber-physical systems: the case of a discrete-time, linear, time-invariant plant that may be subject to an attack that overrides the sensor readings and the controller actions. The attacker attempts to learn the dynamics of the plant and subsequently override the controller's actuation signal to destroy the plant without being detected. The attacker can feed fictitious sensor readings to the controller using its estimate of the plant dynamics and mimic the legitimate plant operation. The controller, on the other hand, is constantly on the lookout for an attack; once the controller detects an attack, it immediately shuts the plant off. In the case of scalar plants, we derive an upper bound on the attacker's deception probability for any measurable control policy when the attacker uses an arbitrary learning algorithm to estimate the system dynamics. We then derive lower bounds on the attacker's deception probability for both scalar and vector plants by assuming a specific authentication test that inspects the empirical variance of the system disturbance. We also show how the controller can improve the security of the system by superimposing a carefully crafted privacy-enhancing signal on top of the "nominal control policy." Finally, for nonlinear scalar dynamics that belong to a Reproducing Kernel Hilbert Space (RKHS), we investigate the performance of attacks based on nonlinear Gaussian process (GP) learning algorithms.

Submitted 27 June, 2020; v1 submitted 17 September, 2018; originally announced September 2018.
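
To make the variance-based authentication idea concrete, here is a minimal Python sketch for a scalar plant x[t+1] = a*x[t] + u[t] + w[t] with i.i.d. disturbance w[t] ~ N(0, sigma^2). The controller, which knows a and sigma^2, recomputes the disturbance from the reported readings and declares an attack when the empirical variance differs from sigma^2 by more than a tolerance tol. The tolerance, the feedback gain, and the function names are hypothetical choices for illustration, not the test statistic or thresholds analyzed in the paper.

```python
import random

def simulate(a_gen, T, sigma, gain, seed=0):
    """Readings x[0..T] and inputs u[0..T-1] for x[t+1] = a_gen*x[t] + u[t] + w[t],
    with feedback u[t] = -gain * x[t] computed from the readings themselves."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(T):
        u = -gain * xs[-1]
        us.append(u)
        xs.append(a_gen * xs[-1] + u + rng.gauss(0.0, sigma))
    return xs, us

def empirical_disturbance_variance(xs, us, a):
    """Mean squared residual x[t+1] - a*x[t] - u[t]; the disturbance is zero-mean."""
    res = [xs[t + 1] - a * xs[t] - us[t] for t in range(len(us))]
    return sum(r * r for r in res) / len(res)

def variance_test(xs, us, a, sigma2, tol):
    """True means 'attack declared': empirical variance is outside sigma2 +/- tol."""
    return abs(empirical_disturbance_variance(xs, us, a) - sigma2) > tol

# Legitimate operation: readings come from the true plant (a = 1.2).
xs, us = simulate(a_gen=1.2, T=2000, sigma=1.0, gain=1.0)
legit_flagged = variance_test(xs, us, a=1.2, sigma2=1.0, tol=0.2)
# Hypothetical attacker: fictitious readings synthesized from a poor estimate a_hat = 0.2.
xs_fake, us_fake = simulate(a_gen=0.2, T=2000, sigma=1.0, gain=1.0, seed=1)
fake_flagged = variance_test(xs_fake, us_fake, a=1.2, sigma2=1.0, tol=0.2)
```

For the fictitious readings the recomputed residual is (0.2 - 1.2)*x[t] + w[t], whose variance is well above sigma^2 = 1, so the test fires, while the legitimate residual is exactly w[t]. An attacker with an accurate estimate of a would produce residuals close to w[t] and largely pass this test.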

arXiv:1802.09069 [pdf, other]

Active Learning with Logged Data

Authors: Songbai Yan, Kamalika Chaudhuri, Tara Javidi

Abstract: We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy. Prior work addresses this problem either when only logged data is available, or purely in a controlled random experimentation setting where the logged data is ignored. In this work, we combine both approaches to provide an algorithm that uses logged data to bootstrap and inform experimentation, thus achieving the best of both worlds. Our work is inspired by a connection between controlled random experimentation and active learning, and modifies existing disagreement-based active learning algorithms to exploit logged data.

Submitted 13 June, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

Comments: ICML 2018

arXiv:1712.05865 [pdf, other]

doi 10.1109/JSTSP.2018.2850751

Improved Target Acquisition Rates with Feedback Codes

Authors: Anusha Lalitha, Nancy Ronquillo, Tara Javidi

Abstract: This paper considers the problem of acquiring an unknown target location (among a finite number of locations) via a sequence of measurements, where each measurement consists of simultaneously probing a group of locations. The resulting observation consists of the sum of an indicator of the target's presence in the probed region and a zero-mean Gaussian noise term whose variance is a function of the measurement vector. An equivalence between the target acquisition problem and channel coding over a binary-input additive white Gaussian noise (BAWGN) channel with state and feedback is established. Utilizing this information-theoretic perspective, a two-stage adaptive target search strategy based on the sorted Posterior Matching channel coding strategy is proposed. Furthermore, using information-theoretic converses, the fundamental limits on the target acquisition rate for adaptive and non-adaptive strategies are characterized. As a corollary to the non-asymptotic upper bound on the expected number of measurements under the proposed two-stage strategy, and to the non-asymptotic lower bound on the expected number of measurements for the optimal non-adaptive search strategy, a lower bound on the adaptivity gain is obtained. The adaptivity gain is further investigated in different asymptotic regimes of interest.

Submitted 15 December, 2017; originally announced December 2017.
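
The measurement model in this abstract lends itself to a simple Bayesian simulation. The Python sketch below assumes N candidate locations, an observation y = 1{target in S} + Z with Z ~ N(0, sigma^2(|S|)), and, hypothetically, sigma^2(|S|) = sigma0^2 * |S|; it updates the posterior over locations in the log domain and probes the most likely locations until their posterior mass reaches 1/2. The probing rule is only a simplified heuristic in the spirit of sorted posterior matching, not the paper's two-stage strategy, and all names and constants are illustrative.

```python
import math
import random

def noise_var(set_size, sigma0_sq=0.1):
    # Hypothetical choice: noise variance grows linearly with the number of probed locations.
    return sigma0_sq * set_size

def measure(target, probe_set, rng):
    """Observation y = 1{target in S} + Z with Z ~ N(0, noise_var(|S|))."""
    signal = 1.0 if target in probe_set else 0.0
    return signal + rng.gauss(0.0, math.sqrt(noise_var(len(probe_set))))

def update_log_posterior(log_post, probe_set, y):
    """Bayes rule with the Gaussian likelihood; terms constant in i cancel on normalization."""
    var = noise_var(len(probe_set))
    new = [lp - (y - (1.0 if i in probe_set else 0.0)) ** 2 / (2 * var)
           for i, lp in enumerate(log_post)]
    m = max(new)
    log_z = m + math.log(sum(math.exp(v - m) for v in new))
    return [v - log_z for v in new]

def choose_probe_set(log_post):
    """Heuristic: probe the most likely locations until their posterior mass reaches 1/2."""
    order = sorted(range(len(log_post)), key=lambda i: -log_post[i])
    chosen, mass = set(), 0.0
    for i in order:
        chosen.add(i)
        mass += math.exp(log_post[i])
        if mass >= 0.5:
            break
    return chosen

def search(num_locations, target, steps, seed=0):
    """Run `steps` adaptive measurements from a uniform prior; return the log posterior."""
    rng = random.Random(seed)
    log_post = [-math.log(num_locations)] * num_locations
    for _ in range(steps):
        s = choose_probe_set(log_post)
        log_post = update_log_posterior(log_post, s, measure(target, s, rng))
    return log_post

log_post = search(num_locations=16, target=5, steps=100)
map_estimate = max(range(16), key=lambda i: log_post[i])
```

Once one location holds more than half the mass it is probed alone, at the lowest noise level; a wrong leader then quickly loses mass, which makes this heuristic self-correcting in simulation.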

arXiv:1712.01447 [pdf, other]

Gaussian Process bandits with adaptive discretization

Authors: Shubhanshu Shekhar, Tara Javidi

Abstract: In this paper, the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process (GP) prior. In particular, a new algorithm for this problem is proposed, and high-probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$, which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high-dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high-probability bounds on the contextual regret are presented.

Submitted 5 January, 2018; v1 submitted 4 December, 2017; originally announced December 2017.

Comments: 34 pages, 2 figures

arXiv:1709.02538 [pdf, other]

DeepFense: Online Accelerated Defense Against Adversarial Deep Learning

Authors: Bita Darvish Rouhani, Mohammad Samragh, Mojan Javaheripi, Tara Javidi, Farinaz Koushanfar

Abstract: Recent advances in adversarial Deep Learning (DL) have opened up a largely unexplored surface for malicious attacks jeopardizing the integrity of autonomous DL systems. With the widespread usage of DL in critical and time-sensitive applications, including unmanned vehicles, drones, and video surveillance systems, online detection of malicious inputs is of utmost importance. We propose DeepFense, the first end-to-end automated framework that simultaneously enables efficient and safe execution of DL models. DeepFense formalizes the goal of thwarting adversarial attacks as an optimization problem that minimizes the rarely observed regions in the latent feature space spanned by a DL network. To solve the aforementioned minimization problem, a set of complementary but disjoint modular redundancies are trained to validate the legitimacy of the input samples in parallel with the victim DL model. DeepFense leverages hardware/software/algorithm co-design and customized acceleration to achieve just-in-time performance in resource-constrained settings. The proposed countermeasure is unsupervised, meaning that no adversarial samples are used to train the modular redundancies. We further provide an accompanying API to reduce the non-recurring engineering cost and ensure automated adaptation to various platforms. Extensive evaluations on FPGAs and GPUs demonstrate up to two orders of magnitude performance improvement while enabling online adversarial sample detection.

Submitted 20 August, 2018; v1 submitted 8 September, 2017; originally announced September 2017.

Comments: Adding hardware acceleration for real-time execution of defender modules

arXiv:1610.09730 [pdf, ps, other]

Active Learning from Imperfect Labelers

Authors: Songbai Yan, Kamalika Chaudhuri, Tara Javidi

Abstract: We study active learning where the labeler can not only return incorrect labels but also abstain from labeling. We consider different noise and abstention conditions of the labeler. We propose an algorithm which utilizes abstention responses, and analyze its statistical consistency and query complexity under fairly natural assumptions on the noise and abstention rate of the labeler. This algorithm is adaptive in the sense that it can automatically request fewer queries with a more informed or less noisy labeler. We couple our algorithm with lower bounds to show that under some technical conditions, it achieves nearly optimal query complexity.

Submitted 30 October, 2016; originally announced October 2016.

Comments: To appear in NIPS 2016

Showing 1–28 of 28 results for author: Javidi, T