-
Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective
Authors:
Ningsheng Zhao,
Jia Yuan Yu,
Krzysztof Dzieciolowski,
Trang Bui
Abstract:
Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method that quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods for implementing SVAs have drawbacks, resulting in biased or unreliable explanations that fail to capture the true intrinsic relationships between features and model outputs. Moreover, the mechanisms and consequences of these drawbacks have not been discussed systematically. In this paper, we propose a novel theoretical error analysis framework in which the explanation errors of SVAs are decomposed into two components: observation bias and structural bias. We further clarify the underlying causes of these two biases and demonstrate that there is a trade-off between them. Based on this error analysis framework, we develop two novel concepts: over-informative and under-informative explanations. We demonstrate how these concepts can be used to understand potential errors of existing SVA methods. In particular, for the widely deployed assumption-based SVAs, we find that they can easily be under-informative due to the distribution drift caused by distributional assumptions. We propose a measurement tool to quantify such distribution drift. Finally, our experiments illustrate how different existing SVA methods can be over- or under-informative. Our work sheds light on how errors arise in the estimation of SVAs and encourages the development of new, less error-prone methods.
Submitted 29 May, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
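To make concrete the quantity that the abstract above analyzes, the following minimal sketch computes exact Shapley values for a tiny, hypothetical value function over three features. The function `value`, the feature names, and the numbers are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not model how a value function is estimated when features are removed, which is where the biases discussed in the abstract arise.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def value(coalition):
    """Hypothetical value function: additive effects plus one interaction."""
    base = {"x1": 1.0, "x2": 2.0, "x3": 0.5}
    v = sum(base[f] for f in coalition)
    if {"x1", "x2"} <= coalition:
        v += 1.0  # interaction term, split equally between x1 and x2
    return v


phi = shapley_values(["x1", "x2", "x3"], value)
```

In this toy setting the attributions sum to the value of the full feature set minus the value of the empty set (the efficiency property), and the interaction term is shared equally between `x1` and `x2`.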
-
Reward Modeling for Mitigating Toxicity in Transformer-based Language Models
Authors:
Farshid Faal,
Ketra Schmitt,
Jia Yuan Yu
Abstract:
Transformer-based language models are able to generate fluent text and can be efficiently adapted to various natural language generation tasks. However, language models that are pretrained on large unlabeled web text corpora have been shown to degenerate into toxic content and socially biased behavior, which hinders their safe deployment. Various detoxification methods have been proposed to mitigate language model toxicity; however, these methods struggle to detoxify language models when conditioned on prompts that contain specific social identities related to gender, race, or religion. In this study, we propose Reinforce-Detoxify, a reinforcement learning-based method for mitigating toxicity in language models. We address the challenge of safety in language models and propose a new reward model that is able to detect toxic content and mitigate unintended bias towards social identities in toxicity prediction. The experiments demonstrate that the Reinforce-Detoxify method outperforms existing detoxification approaches on automatic evaluation metrics, indicating that our approach is effective at language model detoxification and is less prone to unintended bias toward social identities in generated content.
Submitted 27 July, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
Multi-resource allocation for federated settings: A non-homogeneous Markov chain model
Authors:
Syed Eqbal Alam,
Fabian Wirth,
Jia Yuan Yu
Abstract:
In a federated setting, agents coordinate with a central agent or a server to solve an optimization problem in which agents do not share their information with each other. Wirth and his co-authors, in a recent paper, describe how the basic additive-increase multiplicative-decrease (AIMD) algorithm can be modified in a straightforward manner to solve a class of optimization problems for federated settings with a single shared resource and no inter-agent communication. The AIMD algorithm is one of the most successful distributed resource allocation algorithms currently deployed in practice. It is best known as the backbone of the Internet and is also widely explored in other application areas. We extend the single-resource algorithm to multiple heterogeneous shared resources that emerge in smart cities, the sharing economy, and many other applications. Our main results show the convergence of the average allocations to the optimal values. We model the system as a non-homogeneous Markov chain with place-dependent probabilities. Furthermore, simulation results are presented to demonstrate the efficacy of the algorithms and to highlight the main features of our analysis.
Submitted 24 May, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Generating Embroidery Patterns Using Image-to-Image Translation
Authors:
Mohammad Akif Beg,
Jia Yuan Yu
Abstract:
In many scenarios in computer vision, machine learning, and computer graphics, there is a need to learn the mapping from an image of one domain to an image of another domain, called image-to-image translation. Examples include style transfer, object transfiguration, visually altering the appearance of weather conditions in an image, changing a daytime image into a nighttime image or vice versa, and photo enhancement. In this paper, we propose two machine learning techniques to solve the embroidery image-to-image translation problem. Our goal is to generate a preview image that looks similar to an embroidered image from a user-uploaded image. Our techniques are modifications of two existing techniques: neural style transfer and the cycle-consistent generative adversarial network. Neural style transfer renders the semantic content of an image from one domain in the style of a different image in another domain, whereas a cycle-consistent generative adversarial network learns the mapping from an input image to an output image without any paired training data, and also learns a loss function to train this mapping. Furthermore, the techniques we propose are independent of any embroidery attributes, such as the elevation of the image, the light source, the start and end points of a stitch, the type of stitch used, and the fabric type. Given the user image, our techniques can generate a preview image that looks similar to an embroidered image. We train and test our proposed techniques on an embroidery dataset that consists of simple 2D images. To do so, we prepare an unpaired embroidery dataset with more than 8000 user-uploaded images along with embroidered images. Empirical results show that these techniques successfully generate an approximate preview of an embroidered version of a user image, which can help users in decision making.
Submitted 5 March, 2020;
originally announced March 2020.
-
Risk-Averse Action Selection Using Extreme Value Theory Estimates of the CVaR
Authors:
Dylan Troop,
Frédéric Godin,
Jia Yuan Yu
Abstract:
In a wide variety of sequential decision making problems, it can be important to estimate the impact of rare events in order to minimize risk exposure. A popular risk measure is the conditional value-at-risk (CVaR), which is commonly estimated by averaging observations that occur beyond a quantile at a given confidence level. When this confidence level is very high, this estimation method can exhibit high variance due to the limited number of samples above the corresponding quantile. To mitigate this problem, extreme value theory can be used to derive an estimator for the CVaR that uses extrapolation beyond available samples. This estimator requires the selection of a threshold parameter to work well, which is a difficult challenge that has been widely studied in the extreme value theory literature. In this paper, we present an estimation procedure for the CVaR that combines extreme value theory and a recently introduced method of automated threshold selection by Bader et al. (2018). Under appropriate conditions, we estimate the tail risk using a generalized Pareto distribution. We empirically compare this estimation procedure with the commonly used method of sample averaging, and show an improvement in performance for some distributions. Finally, we show how the estimation procedure can be used in reinforcement learning by applying our method to the multi-armed bandit problem where the goal is to avoid catastrophic risk.
Submitted 10 December, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
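The sample-averaging baseline mentioned in the abstract above can be sketched as follows. This is only the commonly used baseline, not the extreme value theory estimator the paper proposes. The function name `empirical_cvar`, the convention that larger values are worse, the choice to include the quantile itself in the tail, and the sample data are all assumptions made for illustration.

```python
import math


def empirical_cvar(losses, alpha):
    """Average of the losses at or beyond the empirical alpha-quantile.

    Assumes larger values are worse (losses rather than rewards).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    # Position of the empirical alpha-quantile (value-at-risk) in the sorted sample.
    var_index = min(n - 1, math.ceil(alpha * n) - 1)
    tail = ordered[var_index:]
    return sum(tail) / len(tail)


losses = list(range(1, 101))  # hypothetical sample: losses 1, 2, ..., 100
cvar_95 = empirical_cvar(losses, 0.95)
cvar_99 = empirical_cvar(losses, 0.99)
```

With only 100 observations, the tail at the 0.99 level contains just a couple of samples, which is the source of the high variance that motivates extrapolating beyond the available data.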
-
A Price-Based Iterative Double Auction for Charger Sharing Markets
Authors:
Jie Gao,
Terrence Wong,
Chun Wang,
Jia Yuan Yu
Abstract:
The unprecedented growth of demand for charging electric vehicles (EVs) calls for novel expansion solutions for today's charging networks. Riding the wave of the sharing economy, Airbnb-like charger sharing markets open the opportunity to expand existing charging networks without requiring costly and time-consuming infrastructure investments, yet the successful design of such markets relies on innovations at the interface between game theory, mechanism design, and large-scale optimization. In this paper, we propose a price-based iterative double auction for charger sharing markets where charger owners rent out their under-utilized chargers to EV drivers who need to charge. Charger owners and EV drivers form a two-sided market that is cleared by a price-based double auction. Chargers' locations, availability, and time unit costs, as well as EV drivers' time and distance constraints and preferences, are considered in the allocation and scheduling process. The goal is to compute social welfare maximizing allocations that benefit both charger owners and EV drivers and, in turn, ensure the continuous growth of the market. We prove that the proposed double auction is budget balanced and individually rational, and that it is a weakly dominant strategy for EV drivers and charger owners to truthfully report their charging time constraints. In addition, results from our computational study show that the double auction achieves on average 94% efficiency compared with the optimal solutions and scales well to larger problem instances.
Submitted 30 September, 2019;
originally announced October 2019.
-
Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping
Authors:
Shuai Ma,
Jia Yuan Yu
Abstract:
Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To relieve the enlarged state space, a novel definition of isotopic states is proposed for state lumping, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations.
Submitted 9 July, 2019;
originally announced July 2019.
-
A Scheme for Dynamic Risk-Sensitive Sequential Decision Making
Authors:
Shuai Ma,
Jia Yuan Yu,
Ahmet Satir
Abstract:
We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from the parameter space to the space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that: (i) most risk measures can be estimated using return variance; (ii) by virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and (iii) the proposed scheme is validated by a numerical experiment.
Submitted 9 July, 2019;
originally announced July 2019.
-
Derandomized Distributed Multi-resource Allocation with Little Communication Overhead
Authors:
Syed Eqbal Alam,
Robert Shorten,
Fabian Wirth,
Jia Yuan Yu
Abstract:
We study a class of distributed optimization problems for multiple shared resource allocation in Internet-connected devices. We propose a derandomized version of an existing stochastic additive-increase and multiplicative-decrease (AIMD) algorithm. The proposed solution uses a one-bit feedback signal for each resource between the system and the Internet-connected devices and does not require inter-device communication. Additionally, the Internet-connected devices do not compromise their privacy, and the solution does not depend on the number of participating devices. In the system, each Internet-connected device has private cost functions that are strictly convex, twice continuously differentiable, and increasing. We show empirically that the long-term average allocations of multiple shared resources converge to optimal allocations and the system achieves minimum social cost. Furthermore, we show that the proposed derandomized AIMD algorithm converges faster than the stochastic AIMD algorithm and that both approaches provide approximately the same solutions.
Submitted 21 December, 2018;
originally announced December 2018.
-
Distributed Algorithms for Internet-of-Things-enabled Prosumer Markets: A Control Theoretic Perspective
Authors:
Syed Eqbal Alam,
Robert Shorten,
Fabian Wirth,
Jia Yuan Yu
Abstract:
The Internet of Things (IoT) enables the development of sharing economy applications. In many sharing economy scenarios, agents both produce and consume a resource; we call them prosumers. A community of prosumers agrees to sell excess resource to another community in a prosumer market. In this chapter, we propose a control theoretic approach to regulate the number of prosumers in a prosumer community, where each prosumer has a cost function that is coupled through its time-averaged production and consumption of the resource. Furthermore, each prosumer runs its own distributed algorithm and takes only binary decisions in a probabilistic way: whether or not to produce one unit of the resource, and whether or not to consume one unit of the resource. In the proposed approach, prosumers do not explicitly exchange information with each other for privacy reasons; only a small amount of information exchange is required for the feedback signals broadcast by a central agency. With this approach, prosumers achieve the optimal values asymptotically. Furthermore, the approach is suitable for implementation in an IoT context with minimal demands on infrastructure. We describe two use cases: community-based car sharing and collaborative energy storage for prosumer markets. We also present simulation results to check the efficacy of the algorithms.
Submitted 25 March, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Distributed and Efficient Resource Balancing Among Many Suppliers and Consumers
Authors:
Kamal Chaturvedi,
Jia Yuan Yu,
Shrisha Rao
Abstract:
Achieving a balance of supply and demand in a multi-agent system with many individual self-interested and rational agents that act as suppliers and consumers is a natural problem in a variety of real-life domains, such as smart power grids and data centers. In this paper, we address the profit-maximization problem for a group of distributed supplier and consumer agents, with no inter-agent communication. We simulate a market with $S$ suppliers and $C$ consumers such that, at every instant, each supplier agent supplies a certain quantity and, simultaneously, each consumer agent consumes a certain quantity. Information about the total amounts supplied and consumed is kept only at the center. The proposed algorithm combines the classical additive-increase multiplicative-decrease (AIMD) algorithm with a probabilistic rule for the agents to respond to a capacity signal. This leads to a nonhomogeneous Markov chain, and we show almost sure convergence of this chain to the social optimum for our market of distributed supplier and consumer agents. Employing this AIMD-type algorithm, the center sends a feedback message to the supplier agents if there is excess supply, or to the consumer agents if there is excess consumption. Each agent has a concave utility function whose derivative tends to 0 when an optimum quantity is supplied or consumed. Hence, once convergence to the social optimum is reached, each agent supplies or consumes a quantity that leads to its individual maximum profit, without communicating with any other agent. Our simulations show the efficacy of this approach.
Submitted 14 September, 2018;
originally announced September 2018.
-
Efficient Single-Shot Multibox Detector for Construction Site Monitoring
Authors:
Viral Thakar,
Himani Saini,
Walid Ahmed,
Mohammad M Soltani,
Ahmed Aly,
Jia Yuan Yu
Abstract:
Asset monitoring on construction sites is an intricate, manually intensive task that can benefit greatly from automated solutions engineered using deep neural networks. We use the Single-Shot Multibox Detector (SSD), chosen for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on construction sites and automate the monitoring tasks, enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of using a greedy approach such as non-maximum suppression. We do so using Affinity Propagation Clustering (APC) to cluster the predicted boxes based on a similarity index computed from the spatial features as well as the locations of the predicted boxes. In our experiments, we improve the mean average precision of SSD by 3.77% on a custom dataset consisting of images from construction sites and by 1.67% on the PASCAL VOC Challenge.
Submitted 19 August, 2018; v1 submitted 16 August, 2018;
originally announced August 2018.
-
Ensemble-based Adaptive Single-shot Multi-box Detector
Authors:
Viral Thakar,
Walid Ahmed,
Mohammad M Soltani,
Jia Yuan Yu
Abstract:
We propose two improvements to the single-shot multibox detector (SSD). First, we propose an adaptive approach for default box selection in SSD. This uses data to reduce the uncertainty in the selection of the best aspect ratios for the default boxes and improves the performance of SSD on datasets containing small and complex objects (e.g., equipment at construction sites). We do so by finding the distribution of aspect ratios in the given training dataset and then choosing representative values. Second, we propose an ensemble algorithm, using SSDs as components, which improves the performance of SSD, especially for small training datasets. Compared to the conventional SSD algorithm, adaptive box selection improves mean average precision by 3%, while ensemble-based SSD improves it by 8%.
Submitted 16 August, 2018;
originally announced August 2018.
-
Distributed, Private, and Derandomized Allocation Algorithm for EV Charging
Authors:
Hamid Nabati,
Jia Yuan Yu
Abstract:
Efficient resource allocation is challenging when the privacy of users is important. Distributed approaches have recently been used extensively to solve such problems. In this work, the efficiency of a distributed AIMD algorithm for the allocation of subsidized goods is studied. First, a suitable utility function is assigned to each user, describing the satisfaction that the user derives from the allocated resource. Then the resource allocation is defined as a total utilitarianism problem, that is, an optimization problem over the sum of the users' utility functions subject to a capacity constraint. Recently, a stochastic state-dependent variant of the AIMD algorithm was used for the allocation of common goods among users with strictly increasing and concave utility functions. Here, the stochastic AIMD algorithm is derandomized and its efficiency is compared with that of the stochastic version. Moreover, the algorithm is improved to allocate subsidized goods to users with concave and non-monotone utility functions as well as users with sigmoidal utility functions. To illustrate the effectiveness of the proposed solutions, simulation results are presented for a public renewable-energy powered charging station in which electric vehicles (EVs) compete to be recharged.
Submitted 16 April, 2018;
originally announced April 2018.
-
State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning
Authors:
Shuai Ma,
Jia Yuan Yu
Abstract:
In the framework of Markov decision processes (MDPs), although the general reward function takes three arguments (current state, action, and successor state), it is often simplified to a function of two arguments (current state and action). The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected cumulative reward, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We present state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. First, for risk-sensitive scenarios, we prove that for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function such that, for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP under which both Markov reward processes share the same reward sequence. Second, we illustrate two situations in an inventory control problem that require the proposed SATs. One is using Q-learning (or other learning methods) on MDPs with transition-based reward functions; the other is applying methods designed for Markov processes with deterministic state-based reward functions to Markov processes with general reward functions. We show the advantage of the SATs by considering Value-at-Risk as an example, which is a risk measure on the reward distribution itself rather than on summary measures of the distribution (such as mean and variance). We illustrate the error in the reward distribution estimation from the direct use of Q-learning, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
Submitted 29 November, 2018; v1 submitted 16 April, 2018;
originally announced April 2018.
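The claim in the abstract above, that replacing a transition-based reward with its expectation preserves the mean but not the reward distribution, can be checked on a one-step toy example. The states `"u"` and `"v"`, the probabilities, and the reward values below are hypothetical and chosen only to make the arithmetic easy to follow; this is not the paper's inventory control example.

```python
# Hypothetical one-step example: from a state s under a fixed action a,
# the successor state is "u" or "v" with equal probability.
transition_probs = {"u": 0.5, "v": 0.5}
transition_reward = {"u": 0.0, "v": 2.0}  # transition-based reward r(s, a, s')

# State-based simplification: r(s, a) = E[r(s, a, s')].
state_reward = sum(p * transition_reward[s2] for s2, p in transition_probs.items())


def mean_and_variance(outcomes):
    """Mean and variance of a discrete distribution given as (prob, value) pairs."""
    mean = sum(p * v for p, v in outcomes)
    var = sum(p * (v - mean) ** 2 for p, v in outcomes)
    return mean, var


true_dist = [(p, transition_reward[s2]) for s2, p in transition_probs.items()]
simplified_dist = [(1.0, state_reward)]  # the simplified reward is deterministic

true_mean, true_var = mean_and_variance(true_dist)
simp_mean, simp_var = mean_and_variance(simplified_dist)
```

Both reward models have the same mean, but the simplified one has zero variance, so any objective that depends on the shape of the reward distribution (variance, Value-at-Risk) is evaluated incorrectly under the simplification.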
-
The Merits of Sharing a Ride
Authors:
Pooyan Ehsani,
Jia Yuan Yu
Abstract:
The culture of sharing instead of ownership is growing sharply in individuals' behavior. In transportation in particular, the concept of sharing a ride, through either carpooling or ridesharing, has recently been adopted. An efficient optimization approach to match passengers in real time is the core of any ridesharing system. In this paper, we model ridesharing as an online matching problem on general graphs in which passengers do not drive private cars and use shared taxis. We propose an optimization algorithm to solve it. The outlined algorithm calculates the optimal waiting time when a passenger arrives. This leads to a matching with minimal overall overheads while maximizing the number of partnerships. To evaluate the behavior of our algorithm, we used a real-life NYC taxi data set. The results show a substantial reduction in overall overheads.
Submitted 19 December, 2017;
originally announced December 2017.
-
Distributed Multi-resource Allocation with Little Communication Overhead
Authors:
Syed Eqbal Alam,
Robert Shorten,
Fabian Wirth,
Jia Yuan Yu
Abstract:
We propose a distributed algorithm to solve a special distributed multi-resource allocation problem with no direct inter-agent communication. We do so by extending a recently introduced additive-increase multiplicative-decrease (AIMD) algorithm, which uses very little communication between the system and agents. Namely, a control unit broadcasts a one-bit signal to agents whenever one of the allocated resources exceeds capacity. Agents then respond to this signal in a probabilistic manner. In the proposed algorithm, each agent is unaware of the resource allocation of other agents. We also propose a version of the AIMD algorithm for multiple binary resources (e.g., parking spaces). Binary resources are indivisible unit-demand resources, and each agent is allocated either one unit of the resource or none. In empirical results, we observe that in both cases, the average allocations converge over time to optimal allocations.
Submitted 6 November, 2017;
originally announced November 2017.
-
Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk
Authors:
Shuai Ma,
Jia Yuan Yu
Abstract:
In reinforcement learning, a reward function defined on the current state and action is widely used. When the objective concerns only the expectation of the (discounted) total reward, it works perfectly. However, if the objective involves the total reward distribution, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions that share the same expectations. First, we show that with a VaR objective, when the real reward function is transition-based (with respect to action and both current and next states), the simplified (state-based, with respect to action and current state only) reward function will change the VaR. Second, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Third, since the estimation method is for a Markov reward process with the reward function on the current state only, we present a transformation algorithm for the Markov reward process with the reward function on current and next states, in order to estimate the VaR function with an intact total reward distribution.
Submitted 29 November, 2018; v1 submitted 6 December, 2016;
originally announced December 2016.
-
Pricing Vehicle Sharing with Proximity Information
Authors:
Jakub Marecek,
Robert Shorten,
Jia Yuan Yu
Abstract:
For vehicle sharing schemes, where drop-off positions are not fixed, we propose a pricing scheme, where the price depends in part on the distance between where a vehicle is being dropped off and where the closest shared vehicle is parked. Under certain restrictive assumptions, we show that this pricing leads to a socially optimal spread of the vehicles within a region.
Submitted 25 January, 2016;
originally announced January 2016.
-
Two Phase $Q$-learning for Bidding-based Vehicle Sharing
Authors:
Yinlam Chow,
Jia Yuan Yu,
Marco Pavone
Abstract:
We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars, and quality of service, by pricing rentals appropriately. We propose a bidding approach that is inspired by auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether to rent or not. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a constrained Markov decision problem (CMDP) and propose and rigorously analyze a novel two-phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.
Submitted 19 October, 2015; v1 submitted 29 September, 2015;
originally announced September 2015.
-
Signalling and obfuscation for congestion control
Authors:
Jakub Marecek,
Robert Shorten,
Jia Yuan Yu
Abstract:
We aim to reduce the social cost of congestion in many smart city applications. In our model of congestion, agents interact over limited resources after receiving signals from a central agent that observes the state of congestion in real time. Under natural models of agent populations, we develop new signalling schemes and show that by introducing a non-trivial amount of uncertainty in the signals, we reduce the social cost of congestion, i.e., improve social welfare. The signalling schemes are efficient in terms of both communication and computation, and are consistent with past observations of the congestion. Moreover, the resulting population dynamics converge under reasonable assumptions.
Submitted 4 May, 2016; v1 submitted 30 June, 2014;
originally announced June 2014.
-
Functional Bandits
Authors:
Long Tran-Thanh,
Jia Yuan Yu
Abstract:
We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings, such as maximum entropy methods in natural language processing and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach that combines functional estimation and arm elimination to tackle this problem. This method achieves provably efficient performance guarantees. In addition, we illustrate this method on a number of important functionals in risk management and information theory, and refine our generic theoretical results in those cases.
Submitted 10 May, 2014;
originally announced May 2014.
-
r-Extreme Signalling for Congestion Control
Authors:
Jakub Marecek,
Robert Shorten,
Jia Yuan Yu
Abstract:
In many "smart city" applications, congestion arises in part due to the nature of signals received by individuals from a central authority. In the model of Marecek et al. [arXiv:1406.7639, Int. J. Control 88(10), 2015], each agent uses one out of multiple resources at each time instant. The per-use cost of a resource depends on the number of concurrent users. A central authority has up-to-date knowledge of the congestion across all resources and uses randomisation to provide a scalar or an interval for each resource at each time. In this paper, the interval to broadcast per resource is obtained by taking the minima and maxima of costs observed within a time window of length r, rather than by randomisation. We show that the resulting distribution of agents across resources also converges in distribution, under plausible assumptions about the evolution of the population over time.
Submitted 31 March, 2016; v1 submitted 9 April, 2014;
originally announced April 2014.
-
Adaptive and optimal online linear regression on $\ell^1$-balls
Authors:
Sébastien Gerchinovitz,
Jia Yuan Yu
Abstract:
We consider the problem of online linear regression on individual sequences. The goal in this paper is for the forecaster to output sequential predictions which are, after $T$ time rounds, almost as good as the ones output by the best linear predictor in a given $\ell^1$-ball in $\mathbb{R}^d$. We consider both the cases where the dimension $d$ is small and large relative to the time horizon $T$. We first present regret bounds with optimal dependencies on $d$, $T$, and on the sizes $U$, $X$ and $Y$ of the $\ell^1$-ball, the input data and the observations. The minimax regret is shown to exhibit a regime transition around the point $d = \sqrt{T} U X / (2 Y)$. Furthermore, we present efficient algorithms that are adaptive, i.e., that do not require the knowledge of $U$, $X$, $Y$, and $T$, but still achieve nearly optimal regret bounds.
Submitted 16 January, 2019; v1 submitted 20 May, 2011;
originally announced May 2011.