
Adaptive Experimentation When You Can't Experiment

Authors: Yao Zhao, Kwang-Sung Jun, Tanner Fiez, Lalit Jain

Abstract: This paper introduces the \emph{confounded pure exploration transductive linear bandit} (\texttt{CPET-LB}) problem. As a motivating example, often online services cannot directly assign users to specific control or treatment experiences either for business or practical reasons. In these settings, naively comparing treatment and control groups that may result from self-selection can lead to biased estimates of underlying treatment effects. Instead, online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment. Our methodology provides online services with an adaptive experimental design approach for learning the best-performing treatment for such \textit{encouragement designs}. We consider a more general underlying model captured by a linear structural equation and formulate pure exploration linear bandits in this setting. Though pure exploration has been extensively studied in standard adaptive experimental design settings, we believe this is the first work considering a setting where noise is confounded. Elimination-style algorithms using experimental design methods in combination with a novel finite-time confidence interval on an instrumental variable style estimator are presented with sample complexity upper bounds nearly matching a minimax lower bound. Finally, experiments are conducted that demonstrate the efficacy of our approach.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2402.10870 [pdf, other]

Best of Three Worlds: Adaptive Experimentation for Digital Marketing in Practice

Authors: Tanner Fiez, Houssam Nassif, Yu-Cheng Chen, Sergio Gamez, Lalit Jain

Abstract: Adaptive experimental design (AED) methods are increasingly being used in industry as a tool to boost testing throughput or reduce experimentation cost relative to traditional A/B/N testing methods. However, the behavior and guarantees of such methods are not well-understood beyond idealized stationary settings. This paper shares lessons learned regarding the challenges of naively using AED systems in industrial settings where non-stationarity is prevalent, while also providing perspectives on the proper objectives and system specifications in such settings. We developed an AED framework for counterfactual inference based on these experiences, and tested it in a commercial environment.

Submitted 26 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Journal ref: The Web Conference (WWW), Singapore, 2024

arXiv:2310.04390 [pdf, other]

Experimental Designs for Heteroskedastic Variance

Authors: Justin Weltz, Tanner Fiez, Alexander Volfovsky, Eric Laber, Blake Mason, Houssam Nassif, Lalit Jain

Abstract: Most linear experimental design problems assume homogeneous variance although heteroskedastic noise is present in many realistic settings. Let a learner have access to a finite set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$ that can be probed to receive noisy linear responses of the form $y=x^{\top}θ^{\ast}+η$. Here $θ^{\ast}\in \mathbb{R}^d$ is an unknown parameter vector, and $η$ is independent mean-zero $σ_x^2$-sub-Gaussian noise defined by a flexible heteroskedastic variance model, $σ_x^2 = x^{\top}Σ^{\ast}x$. Assuming that $Σ^{\ast}\in \mathbb{R}^{d\times d}$ is an unknown matrix, we propose, analyze and empirically evaluate a novel design for uniformly bounding estimation error of the variance parameters, $σ_x^2$. We demonstrate the benefits of this method with two adaptive experimental design problems under heteroskedastic noise, fixed confidence transductive best-arm identification and level-set identification, and prove the first instance-dependent lower bounds in these settings. Lastly, we construct near-optimal algorithms and empirically demonstrate the large improvements in sample complexity gained from accounting for heteroskedastic variance in these designs.

Submitted 6 October, 2023; originally announced October 2023.

Journal ref: Conference on Neural Information Processing Systems (NeurIPS'23), New Orleans, 2023
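
The measurement model in the abstract above can be simulated in a few lines. This is only an illustrative sketch, not the paper's design or algorithm: it assumes Gaussian noise (one particular sub-Gaussian choice), and the values of `theta`, `Sigma`, and `x` as well as the helper names `variance_at` and `measure` are hypothetical.

```python
import random


def variance_at(x, Sigma):
    """Heteroskedastic variance sigma_x^2 = x^T Sigma x for a measurement vector x."""
    d = len(x)
    return sum(x[i] * Sigma[i][j] * x[j] for i in range(d) for j in range(d))


def measure(x, theta, Sigma, rng):
    """One noisy response y = x^T theta + eta, with Var(eta) = x^T Sigma x.

    Gaussian noise is an assumption made for this sketch.
    """
    mean = sum(xi * ti for xi, ti in zip(x, theta))
    return mean + rng.gauss(0.0, variance_at(x, Sigma) ** 0.5)


# Hypothetical problem instance (d = 2); Sigma is positive definite.
rng = random.Random(0)
theta = [1.0, -0.5]
Sigma = [[1.0, 0.2], [0.2, 0.1]]
x = [1.0, 2.0]

# Probe the same measurement vector repeatedly and compare the
# empirical variance of the responses with x^T Sigma x.
samples = [measure(x, theta, Sigma, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
emp_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)

print("model variance    :", variance_at(x, Sigma))
print("empirical variance:", emp_var)
```

Repeating this for several vectors in $\mathcal{X}$ shows how the noise level changes with the direction probed, which is the effect a homogeneous-variance design ignores.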

arXiv:2302.01416 [pdf, other]

doi 10.1145/3580305.3599875

Neural Insights for Digital Marketing Content Design

Authors: Fanjie Kong, Yuan Li, Houssam Nassif, Tanner Fiez, Ricardo Henao, Shreya Chakrabarti

Abstract: In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement. However, creating successful marketing content is a manual and time-consuming process that lacks clear guiding principles. This paper seeks to close the loop between content creation and online experimentation by offering marketers AI-driven actionable insights based on historical data to improve their creative process. We present a neural-network-based system that scores and extracts insights from a marketing content design, namely, a multimodal neural network predicts the attractiveness of marketing contents, and a post-hoc attribution method generates actionable insights for marketers to improve their content in specific marketing locations. Our insights not only point out the advantages and drawbacks of a given current content, but also provide design recommendations based on historical data. We show that our scoring model and insights work well both quantitatively and qualitatively.

Submitted 7 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Journal ref: International Conference on Knowledge Discovery and Data Mining (KDD'23), Long Beach, CA, pp. 4320-4332, 2023

arXiv:2210.14369 [pdf, other]

Adaptive Experimental Design and Counterfactual Inference

Authors: Tanner Fiez, Sergio Gamez, Arick Chen, Houssam Nassif, Lalit Jain

Abstract: Adaptive experimental design methods are increasingly being used in industry as a tool to boost testing throughput or reduce experimentation cost relative to traditional A/B/N testing methods. This paper shares lessons learned regarding the challenges and pitfalls of naively using adaptive experimentation systems in industrial settings where non-stationarity is prevalent, while also providing perspectives on the proper objectives and system specifications in these settings. We developed an adaptive experimental design framework for counterfactual inference based on these experiences, and tested it in a commercial environment.

Submitted 25 October, 2022; originally announced October 2022.

Comments: In Workshops of the Conference on Recommender Systems (RecSys), 2022

arXiv:2111.03377 [pdf, other]

Online Learning in Periodic Zero-Sum Games

Authors: Tanner Fiez, Ryann Sim, Stratis Skoulakis, Georgios Piliouras, Lillian Ratliff

Abstract: A seminal result in game theory is von Neumann's minmax theorem, which states that zero-sum games admit an essentially unique equilibrium solution. Classical learning results build on this theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games. In the past several years, a key research direction has focused on characterizing the day-to-day behavior of such dynamics. General results in this direction show that broad classes of online learning dynamics are cyclic, and formally Poincaré recurrent, in zero-sum games. We analyze the robustness of these online learning behaviors in the case of periodic zero-sum games with a time-invariant equilibrium. This model generalizes the usual repeated game formulation while also being a realistic and natural model of a repeated competition between players that depends on exogenous environmental variations such as time-of-day effects, week-to-week trends, and seasonality. Interestingly, time-average convergence may fail even in the simplest such settings, in spite of the equilibrium being fixed. In contrast, using novel analysis methods, we show that Poincaré recurrence provably generalizes despite the complex, non-autonomous nature of these dynamical systems.

Submitted 5 November, 2021; originally announced November 2021.

Comments: To appear at NeurIPS 2021

arXiv:2109.12286 [pdf, other]

Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

Authors: Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, Lillian J. Ratliff

Abstract: The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient theorem for the refined update and provide a local convergence guarantee for the Stackelberg actor-critic algorithms to a local Stackelberg equilibrium. From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, extensive experiments on OpenAI gym environments show that Stackelberg actor-critic algorithms always perform at least as well and often significantly outperform the standard actor-critic algorithm counterparts.

Submitted 25 September, 2021; originally announced September 2021.
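
The "total derivative" that the leader follows in the abstract above can be written out with the implicit function theorem. The notation here is an assumption made for illustration and does not come from the listing: the leader controls $x$ with cost $f_1(x, y)$, the follower controls $y$ with cost $f_2(x, y)$, $r(x)$ is the follower's local best response, and $\nabla_{yy} f_2$ is assumed invertible.

```latex
% The follower's best response r(x) satisfies the first-order condition
\nabla_y f_2\bigl(x, r(x)\bigr) = 0 .

% Differentiating this identity in x gives the sensitivity of the best response:
\nabla_{yx} f_2 + \nabla_{yy} f_2 \, Dr(x) = 0
\quad\Longrightarrow\quad
Dr(x) = -\bigl(\nabla_{yy} f_2\bigr)^{-1} \nabla_{yx} f_2 .

% Chain rule: total derivative of the leader's cost along the best response.
\frac{d}{dx} f_1\bigl(x, r(x)\bigr)
  = \nabla_x f_1 + Dr(x)^{\top} \nabla_y f_1
  = \nabla_x f_1 - \bigl(\nabla_{yx} f_2\bigr)^{\top} \bigl(\nabla_{yy} f_2\bigr)^{-1} \nabla_y f_1 .
```

The individual gradient keeps only the first term, $\nabla_x f_1$; the second term accounts for how the follower reacts to a change in the leader's action.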

arXiv:2106.01488 [pdf, other]

Minimax Optimization with Smooth Algorithmic Adversaries

Authors: Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff

Abstract: This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$. Though such optimization problems arise in many machine learning paradigms including training generative adversarial networks (GANs) and adversarially robust models, many fundamental issues remain in theory, such as the absence of efficiently computable optimality notions, and cyclic or diverging behavior of existing algorithms. Our framework sprouts from the practical consideration that under a computational budget, the max-player cannot fully maximize $f(x,\cdot)$ since nonconcave maximization is NP-hard in general. So, we propose a new algorithm for the min-player to play against smooth algorithms deployed by the adversary (i.e., the max-player) instead of against full maximization. Our algorithm is guaranteed to make monotonic progress (thus having no limit cycles), and to find an appropriate "stationary point" in a polynomial number of iterations. Our framework covers practical settings where the smooth algorithms deployed by the adversary are multi-step stochastic gradient ascent, and its accelerated version. We further provide complementing experiments that confirm our theoretical findings and demonstrate the effectiveness of the proposed approach in practice.

Submitted 2 June, 2021; originally announced June 2021.

arXiv:2012.08382 [pdf, other]

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Authors: Stratis Skoulakis, Tanner Fiez, Ryann Sim, Georgios Piliouras, Lillian Ratliff

Abstract: The predominant paradigm in evolutionary game theory and more generally online learning in games is based on a clear distinction between a population of dynamic agents that interact given a fixed, static game. In this paper, we move away from the artificial divide between dynamic agents and static games, to introduce and analyze a large class of competitive settings where both the agents and the games they play evolve strategically over time. We focus on arguably the most archetypal game-theoretic setting -- zero-sum games (as well as network generalizations) -- and the most studied evolutionary learning dynamic -- replicator, the continuous-time analogue of multiplicative weights. Populations of agents compete against each other in a zero-sum competition that itself evolves adversarially to the current population mixture. Remarkably, despite the chaotic coevolution of agents and games, we prove that the system exhibits a number of regularities. First, the system has conservation laws of an information-theoretic flavor that couple the behavior of all agents and games. Secondly, the system is Poincaré recurrent, with effectively all possible initializations of agents and games lying on recurrent orbits that come arbitrarily close to their initial conditions infinitely often. Thirdly, the time-average agent behavior and utility converge to the Nash equilibrium values of the time-average game. Finally, we provide a polynomial time algorithm to efficiently predict this time-average behavior for any such coevolving network game.

Submitted 15 December, 2020; originally announced December 2020.

Comments: To appear in AAAI 2021

arXiv:2009.14820 [pdf, other]

Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

Authors: Tanner Fiez, Lillian Ratliff

Abstract: We study the role that a finite timescale separation parameter $τ$ has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games where the learning rate of player 1 is denoted by $γ_1$ and the learning rate of player 2 is defined to be $γ_2=τγ_1$. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate ($τ=1$) and the maximizing player approximately converging between each update of the minimizing player ($τ\rightarrow \infty$). For the parameter choice of $τ=1$, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as $τ\rightarrow\infty$. In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $τ^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $τ\in (τ^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing $τ^{\ast}$ along with corresponding convergence rates and results under deterministic and stochastic gradient feedback.
The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, there exists a finite timescale separation $τ_0$ such that $x^{\ast}$ is unstable for all $τ\in (τ_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.

Submitted 30 September, 2020; originally announced September 2020.
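
The effect of the timescale separation $τ$ described in the abstract above can be seen on a toy problem. The sketch below is an assumption-laden illustration, not the paper's construction of $τ^{\ast}$: the quadratic game $f(x, y) = -x^2/2 + 2xy - y^2/2$, the step size, and the function name `gda` are all hypothetical choices. For this $f$ the origin is a strict local minmax point ($\nabla_{yy} f = -1 < 0$ and the Schur complement $-1 + 4 = 3 > 0$), and a direct eigenvalue computation for the linear dynamics shows the origin is stable only when $τ > 1$.

```python
import math


def gda(tau, gamma=0.01, steps=20000, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent on the toy game
    f(x, y) = -x^2/2 + 2xy - y^2/2.

    Player 1 (x, minimizer) uses learning rate gamma;
    player 2 (y, maximizer) uses learning rate tau * gamma.
    Returns the distance of the final iterate from the origin.
    """
    for _ in range(steps):
        grad_x = -x + 2.0 * y  # df/dx
        grad_y = 2.0 * x - y   # df/dy
        # Both gradients are evaluated at the old point (simultaneous update).
        x, y = x - gamma * grad_x, y + tau * gamma * grad_y
    return math.hypot(x, y)


for tau in (1.0, 4.0):
    print(f"tau = {tau}: distance from equilibrium = {gda(tau):.3e}")
```

With a shared learning rate ($τ=1$) the iterates spiral slowly away from the equilibrium, while with $τ=4$ they contract to it, even though the equilibrium itself is the same in both runs.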

arXiv:2007.07079 [pdf, other]

A SUPER* Algorithm to Optimize Paper Bidding in Peer Review

Authors: Tanner Fiez, Nihar B. Shah, Lillian Ratliff

Abstract: A number of applications involve sequential arrival of users, and require showing each user an ordering of items. A prime example (which forms the focus of this paper) is the bidding process in conference peer review where reviewers enter the system sequentially, each reviewer needs to be shown the list of submitted papers, and the reviewer then "bids" to review some papers. The order of the papers shown has a significant impact on the bids due to primacy effects. In deciding on the ordering of papers to show, there are two competing goals: (i) obtaining sufficiently many bids for each paper, and (ii) satisfying reviewers by showing them relevant items. In this paper, we begin by developing a framework to study this problem in a principled manner. We present an algorithm called SUPER*, inspired by the A* algorithm, for this goal. Theoretically, we show a local optimality guarantee of our algorithm and prove that popular baselines are considerably suboptimal. Moreover, under a community model for the similarities, we prove that SUPER* is near-optimal whereas the popular baselines are considerably suboptimal. In experiments on real data from ICLR 2018 and synthetic data, we find that SUPER* considerably outperforms baselines deployed in existing systems, consistently reducing the number of papers with fewer than requisite bids by 50-75% or more, and is also robust to various real world complexities.

Submitted 31 July, 2020; v1 submitted 27 June, 2020; originally announced July 2020.

arXiv:1906.08399 [pdf, other]

Sequential Experimental Design for Transductive Linear Bandits

Authors: Tanner Fiez, Lalit Jain, Kevin Jamieson, Lillian Ratliff

Abstract: In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $δ$, and an unknown vector $θ^{\ast}\in \mathbb{R}^d$, the goal is to infer $\text{argmax}_{z\in \mathcal{Z}} z^\topθ^\ast$ with probability $1-δ$ by making as few sequentially chosen noisy measurements of the form $x^\topθ^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation.
In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information theoretic lower bound.

Submitted 19 June, 2019; originally announced June 2019.

arXiv:1906.01217 [pdf, other]

Convergence of Learning Dynamics in Stackelberg Games

Authors: Tanner Fiez, Benjamin Chasnov, Lillian J. Ratliff

Abstract: This paper investigates the convergence of learning dynamics in Stackelberg games. In the class of games we consider, there is a hierarchical game being played between a leader and a follower with continuous action spaces. We establish a number of connections between the Nash and Stackelberg equilibrium concepts and characterize conditions under which attracting critical points of simultaneous gradient descent are Stackelberg equilibria in zero-sum games. Moreover, we show that the only stable critical points of the Stackelberg gradient dynamics are Stackelberg equilibria in zero-sum games. Using this insight, we develop a gradient-based update for the leader while the follower employs a best response strategy for which each stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games. As a result, the learning rule provably converges to a Stackelberg equilibrium given an initialization in the region of attraction of a stable critical point. We then consider a follower employing a gradient-play update rule instead of a best response strategy and propose a two-timescale algorithm with similar asymptotic convergence guarantees. For this algorithm, we also provide finite-time high probability bounds for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games.
Finally, we present extensive numerical results that validate our theory, provide insights into the optimization landscape of generative adversarial networks, and demonstrate that the learning dynamics we propose can effectively train generative adversarial networks.

Submitted 6 November, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: This version includes numerical results training generative adversarial networks

MSC Class: math.OC

arXiv:1807.02297 [pdf, other]

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

Authors: Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

Abstract: The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolving dynamically in time, in a resource constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.

Submitted 6 July, 2018; originally announced July 2018.

Comments: Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018

arXiv:1806.05749 [pdf, other]

Adaptive Incentive Design

Authors: Lillian J. Ratliff, Tanner Fiez

Abstract: We apply control theoretic and optimization techniques to adaptively design incentives. In particular, we consider the problem of a planner with an objective that depends on data from strategic decision makers. The planner does not know the process by which the strategic agents make decisions. Under the assumption that the agents are utility maximizers, we model their interactions as a non-cooperative game and utilize the Nash equilibrium concept as well as myopic update rules to model the selection of their decision. By parameterizing the agents' utility functions and the incentives offered, we develop an algorithm that the planner can employ to learn the agents' decision-making processes while simultaneously designing incentives to change their response to a more desirable response from the planner's perspective. We provide convergence results for this algorithm both in the noise-free and noisy cases and present illustrative examples.

Submitted 14 June, 2018; originally announced June 2018.

arXiv:1803.04008 [pdf, other]

Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback

Authors: Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff

Abstract: We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Unlike much of the work on related problems, in our formulation a learning algorithm does not have access to either a priori information or observations of the state of the Markov chain and only observes smoothed reward feedback following time intervals we refer to as epochs. We demonstrate that existing methods such as UCB and $\varepsilon$-greedy can suffer linear regret in such an environment. Employing mixing-time bounds on Markov chains, we develop algorithms called EpochUCB and EpochGreedy that draw inspiration from the aforementioned methods, yet which admit sublinear regret guarantees for the problem formulation. Our proposed algorithms proceed in epochs in which an arm is played repeatedly for a number of iterations that grows linearly as a function of the number of times an arm has been played in the past. We analyze these algorithms under two types of smoothed reward feedback at the end of each epoch: a reward that is the discount-average of the discounted rewards within an epoch, and a reward that is the time-average of the rewards within an epoch.

Submitted 1 March, 2019; v1 submitted 11 March, 2018; originally announced March 2018.

Comments: Significant revision of prior version including deeper discussion of related work, gap-independent regret bounds, and regret bounds for discounted rewards

arXiv:1712.01263 [pdf, other]

Data-Driven Spatio-Temporal Analysis of Curbside Parking Demand: A Case-Study in Seattle

Authors: Tanner Fiez, Lillian Ratliff

Abstract: Due to rapid expansion of urban areas in recent years, management of curbside parking has become increasingly important. To mitigate congestion, while meeting a city's diverse needs, performance-based pricing schemes have received a significant amount of attention. However, several recent studies suggest location, time-of-day, and awareness of policies are the primary factors that drive parking decisions. In light of this, we provide an extensive data-driven study of the spatio-temporal characteristics of curbside parking. This work advances the understanding of where and when to set pricing policies, as well as where to target information and incentives to drivers looking to park. Harnessing data provided by the Seattle Department of Transportation, we develop a Gaussian mixture model based technique to identify zones with similar spatial parking demand as quantified by spatial autocorrelation. In support of this technique, we introduce a metric based on the repeatability of our Gaussian mixture model to investigate temporal consistency.

Submitted 2 December, 2017; originally announced December 2017.

Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems

arXiv:1703.07802 [pdf, other]

Optimizing Curbside Parking Resources Subject to Congestion Constraints

Authors: Chase Dowling, Tanner Fiez, Lillian Ratliff, Baosen Zhang

Abstract: To gain theoretical insight into the relationship between parking scarcity and congestion, we describe block-faces of curbside parking as a network of queues. Due to the nature of this network, canonical queueing network results are not available to us. We present a new kind of queueing network subject to customer rejection due to the lack of available servers. We provide conditions for such networks to be stable, a computationally tractable "single node" view of such a network, and show that maximizing the occupancy through price control of such queues, and subject to constraints on the allowable congestion between queues searching for an available server, is a convex optimization problem. We demonstrate an application of this method in the Mission District of San Francisco; our results suggest congestion due to drivers searching for parking stems from an inefficient spatial utilization of parking resources.

Submitted 22 March, 2017; originally announced March 2017.

Comments: Submitted to IEEE CDC, 2017. 17 pages, 9 figures

arXiv:1702.06156 [pdf, other]

How Much Urban Traffic is Searching for Parking? Simulating Curbside Parking as a Network of Finite Capacity Queues

Authors: Chase Dowling, Tanner Fiez, Lillian Ratliff, Baosen Zhang

Abstract: With the increasing availability of transaction data collected by digital parking meters, paid curbside parking can be advantageously modeled as a network of interdependent queues. In this article we introduce methods for analyzing a special class of networks of finite capacity queues, where tasks arrive from an exogenous source, join the queue if there is an available server or are rejected and move to another queue in search of service according to the network topology. Such networks can be useful for modeling curbside parking since queues in the network perform the same function and drivers searching for an available server are under combinatorial constraints and jockeying is not instantaneous. Further, we provide a motivating example for such networks of finite capacity queues in the context of drivers searching for parking in the neighborhood of Belltown in Seattle, Washington, USA. Lastly, since the stationary distribution of such networks used to model parking are difficult to satisfactorily characterize, we also introduce a simulation tool for the purpose of testing the assumptions made to estimate interesting performance metrics. Our results suggest that a Markovian relaxation of the problem when solving for the mean rate metrics is comparable to deterministic service times reflective of a driver's tendency to park for the maximum allowable time.

Submitted 11 May, 2018; v1 submitted 20 February, 2017; originally announced February 2017.

Comments: Updated May 11, 2018 (fixed formatting errors)

Showing 1–19 of 19 results for author: Fiez, T