
Optimal Top-Two Method for Best Arm Identification and Fluid Analysis

Authors: Agniv Bandyopadhyay, Sandeep Juneja, Shubhada Agrawal

Abstract: Top-$2$ methods have become popular in solving the best arm identification (BAI) problem. The best arm, or the arm with the largest mean amongst finitely many, is identified through an algorithm that at any sequential step independently pulls the empirical best arm, with a fixed probability $β$, and pulls the best challenger arm otherwise. The probability of incorrect selection is guaranteed to lie below a specified $δ>0$. Information theoretic lower bounds on sample complexity are well known for the BAI problem and are matched asymptotically as $δ\rightarrow 0$ by computationally demanding plug-in methods. The above top-2 algorithm for any $β\in (0,1)$ has sample complexity within a constant of the lower bound. However, determining the optimal $β$ that matches the lower bound has proven difficult. In this paper, we address this and propose an optimal top-2 type algorithm. We consider a function of allocations anchored at a threshold. If it exceeds the threshold, then the algorithm samples the empirical best arm. Otherwise, it samples the challenger arm. We show that the proposed algorithm is optimal as $δ\rightarrow 0$. Our analysis relies on identifying a limiting fluid dynamics of allocations that satisfy a series of ordinary differential equations pasted together and that describe the asymptotic path followed by our algorithm. We rely on the implicit function theorem to show existence and uniqueness of these fluid ODEs and to show that the proposed algorithm remains close to the ODE solution.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.07851

Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NCEP-NWP forecasts

Authors: Apoorva Narula, Aastha Jain, Jatin Batra, Sandeep Juneja

Abstract: In this draft we consider the problem of forecasting rainfall across India during the four monsoon months, one day as well as three days in advance. We train neural networks using historical daily gridded precipitation data for India obtained from IMD for the time period $1901$-$2022$, at a spatial resolution of $1^{\circ} \times 1^{\circ}$. This is compared with the numerical weather prediction (NWP) forecasts obtained from NCEP (National Centers for Environmental Prediction) available for the period 2011-2022. We conduct a detailed country-wide analysis and separately analyze some of the most populated cities in India. Our conclusion is that forecasts obtained by applying deep learning to historical rainfall data are more accurate compared to NWP forecasts as well as predictions based on persistence. On average, compared to our predictions, forecasts from the NCEP-NWP model have about 34% higher error for a single-day prediction, and over 68% higher error for a three-day prediction. Similarly, persistence estimates report a 29% higher error in a single-day forecast, and over 54% higher error in a three-day forecast. We further observe that data up to 20 days in the past is useful in reducing errors of one- and three-day forecasts when a transformer-based learning architecture is used and, to a lesser extent, when an LSTM is used. A key conclusion suggested by our preliminary analysis is that NWP forecasts can be substantially improved upon through more and diverse data relevant to monsoon prediction combined with carefully selected neural network architecture.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2310.18149

Game of arrivals at a two queue network with heterogeneous customer routes

Authors: Agniv Bandyopadhyay, Sandeep Juneja

Abstract: We consider a queuing network that opens at a specified time, where customers are non-atomic and belong to different classes. Each class has its own route, and as is typical in the literature, the costs are a linear function of waiting and service completion time. We restrict ourselves to a two-class, two-queue network: this simplification is well motivated as the diversity in solution structure as a function of problem parameters is substantial even in this simple setting (e.g., a specific routing structure involves eight different regimes), suggesting a combinatorial blow up as the number of queues, routes and customer classes increases. We identify the unique Nash equilibrium customer arrival profile when the customer linear cost preferences are different. This profile is a function of problem parameters including the size of each class, service rates at each queue, and customer cost preferences. When customer cost preferences match, under certain parametric settings, the equilibrium arrival profiles may not be unique and may lie in a convex set. We further make a surprising observation that in some parametric settings, customers in one class may arrive in disjoint intervals. Further, the two classes may arrive in contiguous intervals or in overlapping intervals, and at varying rates within an interval, depending upon the problem parameters.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2306.09048

Optimal Best-Arm Identification in Bandits with Access to Offline Data

Authors: Shubhada Agrawal, Sandeep Juneja, Karthikeyan Shanmugam, Arun Sai Suggala

Abstract: Learning paradigms based purely on offline data as well as those based solely on sequential online learning have been well-studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic $K$-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence $1-δ$. We conduct a lower bound analysis on policies that provide such $1-δ$ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when $δ$ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of $\tilde{O}(K)$, and rely on a careful characterization of the optimality conditions of the lower bound problem.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 45 pages, 5 figures

arXiv:2303.07627

Best arm identification in rare events

Authors: Anirban Bhattacharjee, Sushant Vijayan, Sandeep K Juneja

Abstract: We consider the best arm identification problem in the stochastic multi-armed bandit framework where each arm has a tiny probability of realizing large rewards while with overwhelming probability the reward is zero. A key application of this framework is in online advertising where click rates of advertisements could be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimise sample complexity while providing statistical guarantees on the correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity. We analyze the problem in an asymptotic regime as rarity of reward occurrence reduces to zero, and reward amounts increase to infinity. This helps illustrate the benefits of the proposed algorithm. It also sheds light on the underlying structure of the optimal BAI algorithms in the rare event setting.

Submitted 14 March, 2023; originally announced March 2023.

Comments: 32 pages

arXiv:2208.07934

Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions

Authors: Povilas Daniušis, Shubham Juneja, Lukas Kuzma, Virginijus Marcinkevičius

Abstract: In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure, based on the maximum-norm of the difference between joint and product-marginal characteristic functions. The proposed measure can detect arbitrary statistical dependence between two random vectors of possibly different dimensions, is differentiable, and easily integrable into modern machine learning and deep learning pipelines. We also conduct experiments both with simulated and real data. Our simulations show that the proposed method can measure statistical dependencies in high-dimensional, non-linear data, and is less affected by the curse of dimensionality, compared to the previous work in this line of research. The experiments with real data demonstrate the potential applicability of our statistical measure for two different empirical inference scenarios, showing statistically significant improvement in the performance characteristics when applied for supervised feature extraction and deep neural network regularization. In addition, we provide a link to the accompanying open-source repository https://bit.ly/3d4ch5I.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2102.03734

Regret Minimization in Heavy-Tailed Bandits

Authors: Shubhada Agrawal, Sandeep Juneja, Wouter M. Koolen

Abstract: We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order $(1+ε)$ are uniformly bounded by a known constant $B$, for some given $ε> 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.

Submitted 7 February, 2021; originally announced February 2021.

Comments: 35 pages, 2 figures

arXiv:2008.07606

Optimal Best-Arm Identification Methods for Tail-Risk Measures

Authors: Shubhada Agrawal, Wouter M. Koolen, Sandeep Juneja

Abstract: Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely many that has the smallest CVaR, VaR, or weighted sum of CVaR and mean. The latter captures the risk-return trade-off common in finance. Our main contribution is an optimal $δ$-correct algorithm that acts on general arms, including heavy-tailed distributions, and matches the lower bound on the expected number of samples needed, asymptotically (as $δ$ approaches $0$). The algorithm requires solving a non-convex optimization problem in the space of probability measures, which requires delicate analysis. En route, we develop new non-asymptotic empirical likelihood-based concentration inequalities for tail-risk measures which are tighter than those for popular truncation-based empirical estimators.

Submitted 21 June, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: 55 pages, 4 figures

arXiv:2008.04849

doi 10.1007/s41745-020-00211-3

City-Scale Agent-Based Simulators for the Study of Non-Pharmaceutical Interventions in the Context of the COVID-19 Epidemic

Authors: Shubhada Agrawal, Siddharth Bhandari, Anirban Bhattacharjee, Anand Deo, Narendra M. Dixit, Prahladh Harsha, Sandeep Juneja, Poonam Kesarwani, Aditya Krishna Swamy, Preetam Patil, Nihesh Rathod, Ramprasad Saptharishi, Sharad Shriram, Piyush Srivastava, Rajesh Sundaresan, Nidhin Koshy Vaidhiyan, Sarath Yasodharan

Abstract: We highlight the usefulness of city-scale agent-based simulators in studying various non-pharmaceutical interventions to manage an evolving pandemic. We ground our studies in the context of the COVID-19 pandemic and demonstrate the power of the simulator via several exploratory case studies in two metropolises, Bengaluru and Mumbai. Such tools become common-place in any city administration's tool kit in our march towards digital health.

Submitted 11 August, 2020; originally announced August 2020.

Comments: 56 pages

Journal ref: Journal of the Indian Institute of Science, volume 100, pages 809-847, 2020

arXiv:2006.03375

COVID-19 Epidemic Study II: Phased Emergence From the Lockdown in Mumbai

Authors: Prahladh Harsha, Sandeep Juneja, Preetam Patil, Nihesh Rathod, Ramprasad Saptharishi, A. Y. Sarath, Sharad Sriram, Piyush Srivastava, Rajesh Sundaresan, Nidhin Koshy Vaidhiyan

Abstract: The nation-wide lockdown starting 25 March 2020, aimed at suppressing the spread of the COVID-19 disease, was extended until 31 May 2020 in three subsequent orders by the Government of India. The extended lockdown has had significant social and economic consequences and 'lockdown fatigue' has likely set in. Phased reopening began from 01 June 2020 onwards. Mumbai, one of the most crowded cities in the world, has witnessed both the largest number of cases and deaths among all the cities in India (41986 positive cases and 1368 deaths as of 02 June 2020). Many tough decisions are going to be made on re-opening in the next few days. In an earlier IISc-TIFR Report, we presented an agent-based city-scale simulator (ABCS) to model the progression and spread of the infection in large metropolises like Mumbai and Bengaluru. As discussed in IISc-TIFR Report 1, ABCS is a useful tool to model interactions of city residents at an individual level and to capture the impact of non-pharmaceutical interventions on the infection spread. In this report we focus on Mumbai. Using our simulator, we consider some plausible scenarios for phased emergence of Mumbai from the lockdown, 01 June 2020 onwards. These include phased and gradual opening of the industry, partial opening of public transportation (modelling of infection spread in suburban trains), impact of containment zones on controlling infections, and the role of compliance with respect to various intervention measures including use of masks, case isolation, home quarantine, etc.
The main takeaway of our simulation results is that a phased opening of workplaces, say at a conservative attendance level of 20 to 33%, is a good way to restart economic activity while ensuring that the city's medical care capacity remains adequate to handle the possible rise in the number of COVID-19 patients in June and July.

Submitted 5 June, 2020; originally announced June 2020.

Comments: 34 pages

arXiv:2004.05442

Discriminative Learning via Adaptive Questioning

Authors: Achal Bassamboo, Vikas Deep, Sandeep Juneja, Assaf Zeevi

Abstract: We consider the problem of designing an adaptive sequence of questions that optimally classifies a candidate's ability into one of several categories or discriminative grades. A candidate's ability is modeled as an unknown parameter, which, together with the difficulty of the question asked, determines the likelihood with which s/he is able to answer a question correctly. The learning algorithm is only able to observe these noisy responses to its queries. We consider this problem from a fixed confidence-based $δ$-correct framework, that in our setting seeks to arrive at the correct ability discrimination at the fastest possible rate while guaranteeing that the probability of error is less than a pre-specified and small $δ$. In this setting we develop lower bounds on any sequential questioning strategy and develop geometrical insights into the problem structure both from primal and dual formulation. In addition, we arrive at algorithms that essentially match these lower bounds. Our key conclusions are that, asymptotically, any candidate needs to be asked questions at most at two (candidate ability-specific) levels, although, in a reasonably general framework, questions need to be asked only at a single level. Further, and interestingly, the problem structure facilitates endogenous exploration, so there is no need for a separately designed exploration stage in the algorithm.

Submitted 11 April, 2020; originally announced April 2020.

Comments: 3 figures

arXiv:1910.06658

doi 10.1007/s10514-021-09980-x

Topological Navigation Graph Framework

Authors: Povilas Daniusis, Shubham Juneja, Lukas Valatka, Linas Petkevicius

Abstract: We focus on the utilisation of reactive trajectory imitation controllers for goal-directed mobile robot navigation. We propose a topological navigation graph (TNG) - an imitation-learning-based framework for navigating through environments with intersecting trajectories. The TNG framework represents the environment as a directed graph composed of deep neural networks. Each vertex of the graph corresponds to a trajectory and is represented by a trajectory identification classifier and a trajectory imitation controller. For trajectory following, we propose the novel use of neural object detection architectures. The edges of TNG correspond to intersections between trajectories and are all represented by a classifier. We provide empirical evaluation of the proposed navigation framework and its components in simulated and real-world environments, demonstrating that TNG allows us to utilise non-goal-directed, imitation-learning methods for goal-directed autonomous navigation.

Submitted 13 May, 2021; v1 submitted 15 October, 2019; originally announced October 2019.

ACM Class: I.2.9

arXiv:1908.09094

Optimal $δ$-Correct Best-Arm Selection for Heavy-Tailed Distributions

Authors: Shubhada Agrawal, Sandeep Juneja, Peter Glynn

Abstract: Given a finite set of unknown distributions or arms that can be sampled, we consider the problem of identifying the one with the maximum mean using a $δ$-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified $δ$) that has minimum sample complexity. Lower bounds for $δ$-correct algorithms are well known. $δ$-correct algorithms that match the lower bound asymptotically as $δ$ reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise, under a $δ$-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a $δ$-correct algorithm that matches the lower bound as $δ$ reduces to zero under the mild restriction that a known bound on the expectation of the $(1+ε)^{th}$ moment of the underlying random variables exists, for $ε> 0$. We also propose batch processing and identify near-optimal batch sizes to speed up the proposed algorithm substantially. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well-studied classic problem in the simulation community.

Submitted 24 November, 2023; v1 submitted 24 August, 2019; originally announced August 2019.

Comments: Updated version of work that appeared in ALT 2020

MSC Class: 65C05; 60-08

arXiv:1811.05654

Sample complexity of partition identification using multi-armed bandits

Authors: Sandeep Juneja, Subhashini Krishnasamy

Abstract: Given a vector of probability distributions, or arms, each of which can be sampled independently, we consider the problem of identifying the partition to which this vector belongs from a finitely partitioned universe of such vectors of distributions. We study this as a pure exploration problem in multi-armed bandit settings and develop sample complexity bounds on the total mean number of samples required for identifying the correct partition with high probability. This framework subsumes well studied problems such as finding the best arm or the best few arms. We consider distributions belonging to the single parameter exponential family and primarily consider partitions where the vector of means of arms lies either in a given set or its complement. The sets considered correspond to distributions where there exists a mean above a specified threshold, where the set is a half space and where either the set or its complement is a polytope, or more generally, a convex set. In these settings, we characterize the lower bounds on mean number of samples for each arm highlighting their dependence on the problem geometry. Further, inspired by the lower bounds, we propose algorithms that can match these bounds asymptotically with decreasing probability of error. Applications of this framework may be diverse. We briefly discuss one associated with finance.

Submitted 5 February, 2019; v1 submitted 14 November, 2018; originally announced November 2018.

Showing 1–14 of 14 results for author: Juneja, S