-
Predicting Grades
Authors:
Yannick Meier,
Jie Xu,
Onur Atan,
Mihaela van der Schaar
Abstract:
To increase efficacy in traditional classroom courses as well as in Massive Open Online Courses (MOOCs), automated systems supporting the instructor are needed. One important problem is to automatically detect students that are going to do poorly in a course early enough to be able to take remedial actions. Existing grade prediction systems focus on maximizing the accuracy of the prediction while overlooking the importance of issuing timely and personalized predictions. This paper proposes an algorithm that predicts the final grade of each student in a class. It issues a prediction for each student individually, when the expected accuracy of the prediction is sufficient. The algorithm learns online the optimal prediction and the optimal time to issue it, based on the past history of students' performance in a course. We derive a confidence estimate for the prediction accuracy and demonstrate the performance of our algorithm on a dataset obtained based on the performance of approximately 700 UCLA undergraduate students who have taken an introductory digital signal processing course over the past 7 years. We demonstrate that for 85% of the students we can predict with 76% accuracy whether they are going to do well or poorly in the class after the 4th course week. Using data obtained from a pilot course, our methodology suggests that it is effective to perform early in-class assessments such as quizzes, which result in timely performance prediction for each student, thereby enabling timely interventions by the instructor (at the student or class level) when necessary.
Submitted 18 March, 2016; v1 submitted 16 August, 2015;
originally announced August 2015.
-
Episodic Multi-armed Bandits
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
We introduce a new class of reinforcement learning methods referred to as {\em episodic multi-armed bandits} (eMAB). In eMAB the learner proceeds in {\em episodes}, each composed of several {\em steps}, in which it chooses an action and observes a feedback signal. Moreover, in each step, it can take a special action, called the $stop$ action, that ends the current episode. After the $stop$ action is taken, the learner collects a terminal reward, and observes the costs and terminal rewards associated with each step of the episode. The goal of the learner is to maximize its cumulative gain (i.e., the terminal reward minus costs) over all episodes by learning to choose the best sequence of actions based on the feedback. First, we define an {\em oracle} benchmark, which sequentially selects the actions that maximize the expected immediate gain. Then, we propose our online learning algorithm, named {\em FeedBack Adaptive Learning} (FeedBAL), and prove that its regret with respect to the benchmark is bounded with high probability and increases logarithmically in expectation. Moreover, the regret only has polynomial dependence on the number of steps, actions and states. eMAB can be used to model applications that involve humans in the loop, ranging from personalized medical screening to personalized web-based education, where sequences of actions are taken in each episode, and optimal behavior requires adapting the chosen actions based on the feedback.
Submitted 11 March, 2018; v1 submitted 3 August, 2015;
originally announced August 2015.
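As a hedged illustration of the gain described in the Episodic Multi-armed Bandits abstract above (terminal reward minus costs, accumulated over all episodes), the quantity can be written as follows. The symbols $r_e$ (terminal reward of episode $e$), $c_{e,t}$ (cost of step $t$ in episode $e$), $\tau_e$ (step at which the $stop$ action is taken) and $E$ (number of episodes) are notation assumed here for illustration, not taken from the paper.

```latex
G_e = r_e - \sum_{t=1}^{\tau_e} c_{e,t}, \qquad
G_{\mathrm{total}} = \sum_{e=1}^{E} G_e
```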
-
Evolution of Social Networks: A Microfounded Model
Authors:
Ahmed M. Alaa,
Kartik Ahuja,
Mihaela van der Schaar
Abstract:
Many societies are organized in networks that are formed by people who meet and interact over time. In this paper, we present a first model to capture the micro-foundations of social networks evolution, where boundedly rational agents of different types join the network; meet other agents stochastically over time; and consequently decide to form social ties. A basic premise of our model is that in real-world networks, agents form links by reasoning about the benefits that agents they meet over time can bestow. We study the evolution of the emerging networks in terms of friendship and popularity acquisition given the following exogenous parameters: structural opportunism, type distribution, homophily, and social gregariousness. We show that the time needed for an agent to find "friends" is influenced by the exogenous parameters: agents who are more gregarious, more homophilic, less opportunistic, or belong to a type "minority" spend a longer time on average searching for friendships. Moreover, we show that preferential attachment is a consequence of an emerging doubly preferential meeting process: a process that guides agents of a certain type to meet more popular similar-type agents with a higher probability, thereby creating asymmetries in the popularity evolution of different types of agents.
Submitted 14 August, 2015; v1 submitted 2 August, 2015;
originally announced August 2015.
-
Reputational Learning and Network Dynamics
Authors:
Simpson Zhang,
Mihaela van der Schaar
Abstract:
In many real-world networks, agents are initially unsure of each other's qualities and must learn about each other over time via repeated interactions. This paper is the first to provide a methodology for studying the dynamics of such networks, taking into account that agents differ from each other, that they begin with incomplete information, and that they must learn through past experiences which connections/links to form and which to break. The network dynamics in our model vary drastically from the dynamics in models of complete information. With incomplete information and learning, agents who provide high benefits will develop high reputations and remain in the network, while agents who provide low benefits will drop in reputation and become ostracized. We show, among many other things, that the information to which agents have access and the speed at which they learn and act can have a tremendous impact on the resulting network dynamics. Using our model, we can also compute the ex ante social welfare given an arbitrary initial network, which allows us to characterize the socially optimal network structures for different sets of agents. Importantly, we show through examples that the optimal network structure depends sharply on both the initial beliefs of the agents and the rate of learning by the agents. Due to the potential negative consequences of ostracism, it may be necessary to place agents with lower initial reputations at less central positions within the network.
Submitted 8 June, 2016; v1 submitted 14 July, 2015;
originally announced July 2015.
-
Efficient Interference Management Policies for Femtocell Networks
Authors:
Kartik Ahuja,
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
Managing interference in a network of macrocells underlaid with femtocells presents an important, yet challenging problem. A majority of spatial (frequency/time) reuse based approaches partition the users based on coloring the interference graph, which is shown to be suboptimal. Some spatial time reuse based approaches schedule the maximal independent sets (MISs) in a cyclic, (weighted) round-robin fashion, which is inefficient for delay-sensitive applications. Our proposed policies schedule the MISs in a non-cyclic fashion and aim to optimize any given network performance criterion for delay-sensitive applications while fulfilling minimum throughput requirements of the users. Importantly, we do not take the interference graph as given, as in existing works; we propose an optimal construction of the interference graph. We prove that under certain conditions, the proposed policy achieves the optimal network performance. For large networks, we propose a low-complexity algorithm for computing the proposed policy. We show that the policy computed achieves a constant competitive ratio (with respect to the optimal network performance), which is independent of the network size, under a wide range of deployment scenarios. The policy can be implemented in a decentralized manner by the users. Compared to the existing policies, our proposed policies can achieve an improvement of up to 130% in large-scale deployments.
Submitted 27 April, 2015;
originally announced April 2015.
-
Global Bandits
Authors:
Onur Atan,
Cem Tekin,
Mihaela van der Schaar
Abstract:
Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the reward distributions of each arm are independent. But in a wide variety of decision problems -- from drug dosage to dynamic pricing -- the expected rewards of different arms are correlated, so that selecting one arm provides information about the expected rewards of other arms as well. We propose and analyze a class of models of such decision problems, which we call {\em global bandits}. In the case in which rewards of all arms are deterministic functions of a single unknown parameter, we construct a greedy policy that achieves {\em bounded regret}, with a bound that depends on the single true parameter of the problem. Hence, this policy selects suboptimal arms only finitely many times with probability one. For this case we also obtain a bound on regret that is {\em independent of the true parameter}; this bound is sub-linear, with an exponent that depends on the informativeness of the arms. We also propose a variant of the greedy policy that achieves $\tilde{\mathcal{O}}(\sqrt{T})$ worst-case and $\mathcal{O}(1)$ parameter dependent regret. Finally, we perform experiments on dynamic pricing and show that the proposed algorithms achieve significant gains with respect to the well-known benchmarks.
Submitted 21 March, 2018; v1 submitted 28 March, 2015;
originally announced March 2015.
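The greedy idea in the Global Bandits abstract above (all arms depend on a single unknown parameter, so playing any arm is informative about every arm) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the reward functions `theta` and `1 - theta`, their inverses, the Gaussian noise, the initial guess of 0.5, and the count-weighted averaging of per-arm estimates are all hypothetical choices made for illustration.

```python
import random

def clip01(x):
    """Keep a parameter estimate inside the assumed range [0, 1]."""
    return min(1.0, max(0.0, x))

def greedy_global_bandit(reward_fns, inverse_fns, theta_true, horizon,
                         noise_std=0.1, seed=0):
    """Illustrative greedy policy for arms that share one unknown parameter."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    means = [0.0] * n_arms      # running mean of observed rewards per arm
    theta_hat = 0.5             # arbitrary initial guess (assumption)
    choices = []
    for _ in range(horizon):
        # Greedy step: play the arm whose estimated expected reward is highest.
        arm = max(range(n_arms), key=lambda k: reward_fns[k](theta_hat))
        reward = reward_fns[arm](theta_true) + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        # Every played arm yields its own estimate of theta by inverting its
        # mean reward; combine these estimates, weighted by play counts.
        played = [k for k in range(n_arms) if counts[k] > 0]
        theta_hat = clip01(
            sum(counts[k] * inverse_fns[k](means[k]) for k in played)
            / sum(counts[k] for k in played)
        )
        choices.append(arm)
    return theta_hat, choices

# Two hypothetical arms: expected rewards theta and 1 - theta.
reward_fns = [lambda th: th, lambda th: 1.0 - th]
inverse_fns = [lambda m: m, lambda m: 1.0 - m]
theta_hat, choices = greedy_global_bandit(reward_fns, inverse_fns,
                                          theta_true=0.8, horizon=200)
```

Because both arms are invertible functions of the same parameter in this toy setup, even a pull of the worse arm sharpens the estimate of `theta`, which is the intuition behind the finitely-many-mistakes claim in the abstract.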
-
Self-organizing Networks of Information Gathering Cognitive Agents
Authors:
Ahmed M. Alaa,
Kartik Ahuja,
Mihaela van der Schaar
Abstract:
In many scenarios, networks emerge endogenously as cognitive agents establish links in order to exchange information. Network formation has been widely studied in economics, but only on the basis of simplistic models that assume that the value of each additional piece of information is constant. In this paper we present a first model and associated analysis for network formation under the much more realistic assumption that the value of each additional piece of information depends on the type of that piece of information and on the information already possessed: information may be complementary or redundant. We model the formation of a network as a non-cooperative game in which the actions are the formation of links and the benefit of forming a link is the value of the information exchanged minus the cost of forming the link. We characterize the topologies of the networks emerging at a Nash equilibrium (NE) of this game and compare the efficiency of equilibrium networks with the efficiency of centrally designed networks. To quantify the impact of information redundancy and linking cost on social information loss, we provide estimates for the Price of Anarchy (PoA); to quantify the impact on individual information loss we introduce and provide estimates for a measure we call Maximum Information Loss (MIL). Finally, we consider the setting in which agents are not endowed with information, but must produce it. We show that the validity of the well-known "law of the few" depends on how information aggregates; in particular, the "law of the few" fails when information displays complementarities.
Submitted 12 August, 2015; v1 submitted 16 March, 2015;
originally announced March 2015.
-
Contextual Online Learning for Multimedia Content Aggregation
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
The last decade has witnessed a tremendous growth in the volume as well as the diversity of multimedia content generated by a multitude of sources (news agencies, social media, etc.). Faced with a variety of content choices, consumers are exhibiting diverse preferences for content; their preferences often depend on the context in which they consume content as well as various exogenous events. To satisfy the consumers' demand for such diverse content, multimedia content aggregators (CAs) have emerged which gather content from numerous multimedia sources. A key challenge for such systems is to accurately predict what type of content each of its consumers prefers in a certain context, and adapt these predictions to the evolving consumers' preferences, contexts and content characteristics. We propose a novel, distributed, online multimedia content aggregation framework, which gathers content generated by multiple heterogeneous producers to fulfill its consumers' demand for content. Since both the multimedia content characteristics and the consumers' preferences and contexts are unknown, the optimal content aggregation strategy is unknown a priori. Our proposed content aggregation algorithm is able to learn online what content to gather and how to match content and users by exploiting similarities between consumer types. We prove bounds for our proposed learning algorithms that guarantee both the accuracy of the predictions as well as the learning speed. Importantly, our algorithms operate efficiently even when feedback from consumers is missing or content and preferences evolve over time. Illustrative results highlight the merits of the proposed content aggregation system in a variety of settings.
Submitted 24 March, 2015; v1 submitted 7 February, 2015;
originally announced February 2015.
-
RELEAF: An Algorithm for Learning and Exploiting Relevance
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
Recommender systems, medical diagnosis, network security, etc., require on-going learning and decision-making in real time. These -- and many others -- represent perfect examples of the opportunities and difficulties presented by Big Data: the available information often arrives from a variety of sources and has diverse features so that learning from all the sources may be valuable but integrating what is learned is subject to the curse of dimensionality. This paper develops and analyzes algorithms that allow efficient learning and decision-making while avoiding the curse of dimensionality. We formalize the information available to the learner/decision-maker at a particular time as a context vector which the learner should consider when taking actions. In general the context vector is very high dimensional, but in many settings, the most relevant information is embedded into only a few relevant dimensions. If these relevant dimensions were known in advance, the problem would be simple -- but they are not. Moreover, the relevant dimensions may be different for different actions. Our algorithm learns the relevant dimensions for each action, and makes decisions based on what it has learned. Formally, we build on the structure of a contextual multi-armed bandit by adding and exploiting a relevance relation. We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions, which in the special case where the relevance relation is single-valued (a function), reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$; in the absence of a relevance relation, the best known contextual bandit algorithms achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of the context vector.
Submitted 7 February, 2015; v1 submitted 4 February, 2015;
originally announced February 2015.
-
Information-Sharing over Adaptive Networks with Self-interested Agents
Authors:
Chung-Kai Yu,
Mihaela van der Schaar,
Ali H. Sayed
Abstract:
We examine the behavior of multi-agent networks where information-sharing is subject to a positive communications cost over the edges linking the agents. We consider a general mean-square-error formulation where all agents are interested in estimating the same target vector. We first show that, in the absence of any incentives to cooperate, the optimal strategy for the agents is to behave in a selfish manner with each agent seeking the optimal solution independently of the other agents. Pareto inefficiency arises because agents are not using historical data to predict the behavior of their neighbors and to know whether they will reciprocate and participate in sharing information. Motivated by this observation, we develop a reputation protocol to summarize the opponent's past actions into a reputation score, which can then be used to form a belief about the opponent's subsequent actions. The reputation protocol entices agents to cooperate and turns their optimal strategy into an action-choosing strategy that enhances the overall social benefit of the network. In particular, we show that when the communications cost becomes large, the expected social benefit of the proposed protocol outperforms the social benefit that is obtained by cooperative agents that always share data. We perform a detailed mean-square-error analysis of the evolution of the network over three domains: far-field, near-field, and middle-field, and show that the network behavior is stable for sufficiently small step-sizes. The various theoretical results are illustrated by numerical simulations.
Submitted 18 June, 2015; v1 submitted 3 December, 2014;
originally announced December 2014.
-
Towards a Theory of Societal Co-Evolution: Individualism versus Collectivism
Authors:
Kartik Ahuja,
Simpson Zhang,
Mihaela van der Schaar
Abstract:
Substantial empirical research has shown that the level of individualism vs. collectivism is one of the most critical and important determinants of societal traits, such as economic growth, economic institutions and health conditions. But the exact nature of this impact has thus far not been well understood in an analytical setting. In this work, we develop one of the first theoretical models that analytically studies the impact of individualism-collectivism on the society. We model the growth of an individual's welfare (wealth, resources and health) as depending not only on himself, but also on the level of collectivism, i.e. the level of dependence on the rest of the individuals in the society, which leads to a co-evolutionary setting. Based on our model, we are able to predict the impact of individualism-collectivism on various societal metrics, such as average welfare, average life-time, total population, cumulative welfare and average inequality. We analytically show that individualism has a positive impact on average welfare and cumulative welfare, but comes with the drawbacks of lower average life-time, lower total population and higher average inequality.
Submitted 18 November, 2014;
originally announced November 2014.
-
Distributed Interference Management Policies for Heterogeneous Small Cell Networks
Authors:
Kartik Ahuja,
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
We study the problem of interference management in large-scale small cell networks, where each user equipment (UE) needs to determine in a distributed manner when and at what power level it should transmit to its serving small cell base station (SBS) such that a given network performance criterion is maximized subject to minimum quality of service (QoS) requirements by the UEs. We first propose a distributed algorithm for the UE-SBS pairs to find a subset of weakly interfering UE-SBS pairs, namely the maximal independent sets (MISs) of the interference graph in logarithmic time (with respect to the number of UEs). Then we propose a novel problem formulation which enables UE-SBS pairs to determine the optimal fractions of time occupied by each MIS in a distributed manner. We analytically bound the performance of our distributed policy in terms of the competitive ratio with respect to the optimal network performance, which is obtained in a centralized manner with NP (non-deterministic polynomial time) complexity. Remarkably, the competitive ratio is independent of the network size, which guarantees scalability in terms of performance for arbitrarily large networks. Through simulations, we show that our proposed policies achieve significant performance improvements (from 150% to 700%) over the existing policies.
Submitted 7 March, 2015; v1 submitted 18 November, 2014;
originally announced November 2014.
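The abstract above relies on maximal independent sets (MISs) of the interference graph, i.e. sets of UE-SBS pairs in which no two members interfere and to which no further pair can be added. The following is a minimal centralized sketch of that concept using a standard sequential greedy construction; it is not the distributed logarithmic-time algorithm proposed in the paper, and the five-node path graph is a hypothetical example.

```python
def greedy_maximal_independent_set(adjacency):
    """Return one maximal independent set of an undirected graph.

    `adjacency` maps each node to an iterable of its neighbours.
    """
    chosen = set()
    blocked = set()
    for node in sorted(adjacency):
        if node in blocked:
            continue
        # The node has no chosen neighbour, so it can join the set;
        # its neighbours are then excluded.
        chosen.add(node)
        blocked.add(node)
        blocked.update(adjacency[node])
    return chosen

def is_maximal_independent(adjacency, candidate):
    """Check independence (no internal edges) and maximality."""
    independent = all(v not in candidate
                      for u in candidate for v in adjacency[u])
    maximal = all(u in candidate or any(v in candidate for v in adjacency[u])
                  for u in adjacency)
    return independent and maximal

# Hypothetical interference graph: a path 0 - 1 - 2 - 3 - 4.
interference_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mis = greedy_maximal_independent_set(interference_graph)
```

In a scheduling setting, each such set is a group of pairs that could transmit in the same time slot; a different node ordering generally yields a different maximal set.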
-
Jamming Bandits
Authors:
SaiDhiraj Amuru,
Cem Tekin,
Mihaela van der Schaar,
R. Michael Buehrer
Abstract:
Can an intelligent jammer learn and adapt to unknown environments in an electronic warfare-type scenario? In this paper, we answer this question in the positive, by developing a cognitive jammer that adaptively and optimally disrupts the communication between a victim transmitter-receiver pair. We formalize the problem using a novel multi-armed bandit framework where the jammer can choose various physical layer parameters such as the signaling scheme, power level and the on-off/pulsing duration in an attempt to obtain power efficient jamming strategies. We first present novel online learning algorithms to maximize the jamming efficacy against static transmitter-receiver pairs and prove that our learning algorithm converges to the optimal (in terms of the error rate inflicted at the victim and the energy used) jamming strategy. Even more importantly, we prove that the rate of convergence to the optimal jamming strategy is sub-linear, i.e. the learning is fast in comparison to existing reinforcement learning algorithms, which is particularly important in dynamically changing wireless environments. Also, we characterize the performance of the proposed bandit-based learning algorithm against multiple static and adaptive transmitter-receiver pairs.
Submitted 13 November, 2014;
originally announced November 2014.
-
Incentive Design in Peer Review: Rating and Repeated Endogenous Matching
Authors:
Yuanzhang Xiao,
Florian Dörfler,
Mihaela van der Schaar
Abstract:
Peer review (e.g., grading assignments in Massive Open Online Courses (MOOCs), academic paper review) is an effective and scalable method to evaluate the products (e.g., assignments, papers) of a large number of agents when the number of dedicated reviewing experts (e.g., teaching assistants, editors) is limited. Peer review poses two key challenges: 1) identifying the reviewers' intrinsic capabilities (i.e., adverse selection) and 2) incentivizing the reviewers to exert high effort (i.e., moral hazard). Some works in mechanism design address pure adverse selection using one-shot matching rules, and pure moral hazard was addressed in repeated games with exogenously given and fixed matching rules. However, in peer review systems exhibiting both adverse selection and moral hazard, one-shot or exogenous matching rules do not link agents' current behavior with future matches and future payoffs, and as we prove, will induce myopic behavior (i.e., exerting the lowest effort) resulting in the lowest review quality.
In this paper, we propose for the first time a solution that simultaneously solves adverse selection and moral hazard. Our solution exploits the repeated interactions of agents, utilizes ratings to summarize agents' past review quality, and designs matching rules that endogenously depend on agents' ratings. Our proposed matching rules are easy to implement and require no knowledge about agents' private information (e.g., their benefit and cost functions). Yet, they are effective in guiding the system to an equilibrium where the agents are incentivized to exert high effort and receive ratings that precisely reflect their review quality. Using several illustrative examples, we quantify the significant performance gains obtained by our proposed mechanism as compared to existing one-shot or exogenous matching rules.
Submitted 8 November, 2014;
originally announced November 2014.
-
A Dynamic Network Formation Model for Understanding Bacterial Self-Organization into Micro-Colonies
Authors:
Luca Canzian,
Kun Zhao,
Gerard C. L. Wong,
Mihaela van der Schaar
Abstract:
We propose a general parametrizable model to capture the dynamic interaction among bacteria in the formation of micro-colonies. Micro-colonies represent the first social step towards the formation of structured multicellular communities known as bacterial biofilms, which protect the bacteria against antimicrobials. In our model, bacteria can form links in the form of intercellular adhesins (such as polysaccharides) to collaborate in the production of resources that are fundamental to protect them against antimicrobials. Since maintaining a link can be costly, we assume that each bacterium forms and maintains a link only if the benefit received from the link is larger than the cost, and we formalize the interaction among bacteria as a dynamic network formation game. We rigorously characterize some of the key properties of the network evolution depending on the parameters of the system. In particular, we derive the parameters under which it is guaranteed that all bacteria will join micro-colonies and the parameters under which it is guaranteed that some bacteria will not join micro-colonies. Importantly, our study not only characterizes the properties of networks emerging in equilibrium, but also provides important insights on how the network dynamically evolves and on how the formation history impacts the emerging networks in equilibrium. This analysis can be used to develop methods to influence on-the-fly the evolution of the network, and such methods can be useful to treat or prevent biofilm-related diseases.
Submitted 23 October, 2014;
originally announced October 2014.
-
Global Bandits with Holder Continuity
Authors:
Onur Atan,
Cem Tekin,
Mihaela van der Schaar
Abstract:
Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we introduce and formalize the Global MAB (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time steps with probability one. In addition, we also study how the informativeness of the arms about each other's rewards affects the speed of learning. Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms. We also prove a sublinear in time Bayesian risk bound for the GMAB which reduces to the well-known Bayesian risk bound for linearly parameterized bandits when the arms are fully informative. GMABs have applications ranging from drug and treatment discovery to dynamic pricing.
Submitted 29 October, 2014;
originally announced October 2014.
-
eTutor: Online Learning for Personalized Education
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
Given recent advances in information technology and artificial intelligence, web-based education systems have become complementary and, in some cases, viable alternatives to traditional classroom teaching. The popularity of these systems stems from their ability to make education available to a large demographic (see MOOCs). However, existing systems do not take advantage of the personalization which becomes possible when web-based education is offered: they continue to be one-size-fits-all. In this paper, we aim to provide a first systematic method for designing a personalized web-based education system. Personalizing education is challenging: (i) students need to be provided personalized teaching and training depending on their contexts (e.g. classes already taken, methods of learning preferred, etc.), (ii) for each specific context, the best teaching and training method (e.g. type and order of teaching materials to be shown) must be learned, (iii) teaching and training should be adapted online, based on the scores/feedback (e.g. tests, quizzes, final exam, likes/dislikes etc.) of the students. Our personalized online system, eTutor, is able to address these challenges by learning how to adapt the teaching methodology (in this case what sequence of teaching material to present to a student) to maximize her performance in the final exam, while minimizing the time spent by the students to learn the course (and possibly dropouts). We illustrate the efficiency of the proposed method on a real-world eTutor platform which is used for remedial training for a Digital Signal Processing (DSP) course.
Submitted 14 October, 2014;
originally announced October 2014.
-
Adaptive Prioritized Random Linear Coding and Scheduling for Layered Data Delivery from Multiple Servers
Authors:
Nikolaos Thomos,
Eymen Kurdoglu,
Pascal Frossard,
Mihaela Van der Schaar
Abstract:
In this paper, we deal with the problem of jointly determining the optimal coding strategy and the scheduling decisions when receivers obtain layered data from multiple servers. The layered data is encoded by means of Prioritized Random Linear Coding (PRLC) in order to be resilient to channel loss while respecting the unequal levels of importance in the data, and data blocks are transmitted simultaneously in order to reduce decoding delays and improve the delivery performance. We formulate the optimal coding and scheduling decisions problem in our novel framework with the help of Markov Decision Processes (MDP), which are effective tools for modeling adaptive streaming systems. Reinforcement learning approaches are then proposed to derive reduced computational complexity solutions to the adaptive coding and scheduling problems. The novel reinforcement learning approaches and the MDP solution are examined in an illustrative example for scalable video transmission. Our methods offer large performance gains over competing methods that deliver the data blocks sequentially. The experimental evaluation also shows that our novel algorithms offer continuous playback and guarantee small quality variations, which is not the case for baseline solutions. Finally, our work highlights the advantages of reinforcement learning algorithms for forecasting the temporal evolution of data demands and determining the optimal coding and scheduling decisions.
Submitted 30 September, 2014;
originally announced September 2014.
-
Forecasting Popularity of Videos using Social Media
Authors:
Jie Xu,
Mihaela van der Schaar,
Jiangchuan Liu,
Haitao Li
Abstract:
This paper presents a systematic online prediction method (Social-Forecast) that is capable of accurately forecasting the popularity of videos promoted by social media. Social-Forecast explicitly considers the dynamically changing and evolving propagation patterns of videos in social media when making popularity forecasts, thereby being situation and context aware. Social-Forecast aims to maximize the forecast reward, which is defined as a tradeoff between the popularity prediction accuracy and the timeliness with which a prediction is issued. The forecasting is performed online and requires no training phase or a priori knowledge. We analytically bound the prediction performance loss of Social-Forecast as compared to that obtained by an omniscient oracle and prove that the bound is sublinear in the number of video arrivals, thereby guaranteeing its short-term performance as well as its asymptotic convergence to the optimal performance. In addition, we conduct extensive experiments using real-world data traces collected from the videos shared in RenRen, one of the largest online social networks in China. These experiments show that our proposed method outperforms existing view-based approaches for popularity prediction (which are not context-aware) by more than 30% in terms of prediction rewards.
Submitted 21 March, 2014;
originally announced March 2014.
-
Foresighted Demand Side Management
Authors:
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
We consider a smart grid with an independent system operator (ISO), and distributed aggregators who have energy storage and purchase energy from the ISO to serve their customers. All the entities in the system are foresighted: each aggregator seeks to minimize its own long-term payments for energy purchase and operational costs of energy storage by deciding how much energy to buy from the ISO, and the ISO seeks to minimize the long-term total cost of the system (e.g. energy generation costs and the aggregators' costs) by dispatching the energy production among the generators. The decision making of the entities is complicated for two reasons. First, the information is decentralized: the ISO does not know the aggregators' states (i.e. their energy consumption requests from customers and the amount of energy in their storage), and each aggregator does not know the other aggregators' states or the ISO's state (i.e. the energy generation costs and the status of the transmission lines). Second, the coupling among the aggregators is unknown to them. Specifically, each aggregator's energy purchase affects the price, and hence the payments of the other aggregators. However, none of them knows how its decision influences the price because the price is determined by the ISO based on its state. We propose a design framework in which the ISO provides each aggregator with a conjectured future price, and each aggregator distributively minimizes its own long-term cost based on its conjectured price as well as its local information. The proposed framework can achieve the social optimum despite being decentralized and involving complex coupling among the various entities.
Submitted 9 January, 2014;
originally announced January 2014.
-
Non-stationary Resource Allocation Policies for Delay-constrained Video Streaming: Application to Video over Internet-of-Things-enabled Networks
Authors:
Jie Xu,
Yiannis Andreopoulos,
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
Due to the high bandwidth requirements and stringent delay constraints of multi-user wireless video transmission applications, ensuring that all video senders have sufficient transmission opportunities to use before their delay deadlines expire is a longstanding research problem. We propose a novel solution that addresses this problem without assuming detailed packet-level knowledge, which is unavailable at resource allocation time. Instead, we translate the transmission delay deadlines of each sender's video packets into a monotonically-decreasing weight distribution within the considered time horizon. Higher weights are assigned to the slots that have higher probability for deadline-abiding delivery. Given the sets of weights of the senders' video streams, we propose the low-complexity Delay-Aware Resource Allocation (DARA) approach to compute the optimal slot allocation policy that maximizes the deadline-abiding delivery of all senders. A unique characteristic of the DARA approach is that it yields a non-stationary slot allocation policy that depends on the allocation of previous slots. We prove that the DARA approach is optimal for weight distributions that are exponentially decreasing in time. We further implement our framework for real-time video streaming in wireless personal area networks that are gaining significant traction within the new Internet-of-Things (IoT) paradigm. For multiple surveillance videos encoded with H.264/AVC and streamed via the 6tisch framework that simulates the IoT-oriented IEEE 802.15.4e TSCH medium access control, our solution is shown to be the only one that ensures all video bitstreams are delivered with acceptable quality in a deadline-abiding manner.
Submitted 4 January, 2014;
originally announced January 2014.
-
Optimal Foresighted Multi-User Wireless Video
Authors:
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
Recent years have seen an explosion in wireless video communication systems. Optimization in such systems is crucial - but most existing methods intended to optimize the performance of multi-user wireless video transmission are inefficient. Some works (e.g. Network Utility Maximization (NUM)) are myopic: they choose actions to maximize instantaneous video quality while ignoring the future impact of these actions. Such myopic solutions are known to be inferior to foresighted solutions that optimize the long-term video quality. Alternatively, foresighted solutions such as rate-distortion optimized packet scheduling focus on single-user wireless video transmission, while ignoring the resource allocation among the users.
In this paper, we propose an optimal solution for performing joint foresighted resource allocation and packet scheduling among multiple users transmitting video over a shared wireless network. A key challenge in developing foresighted solutions for multiple video users is that the users' decisions are coupled. To decouple the users' decisions, we adopt a novel dual decomposition approach, which differs from the conventional optimization solutions such as NUM, and determines foresighted policies. Specifically, we propose an informationally-decentralized algorithm in which the network manager updates resource "prices" (i.e. the dual variables associated with the resource constraints), and the users make individual video packet scheduling decisions based on these prices. Because a priori knowledge of the system dynamics is almost never available at run-time, the proposed solution can learn online, concurrently with performing the foresighted optimization. Simulation results show 7 dB and 3 dB improvements in Peak Signal-to-Noise Ratio (PSNR) over myopic solutions and existing foresighted solutions, respectively.
Submitted 17 November, 2013;
originally announced November 2013.
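The price-based decoupling described in the abstract above can be illustrated with a generic dual (subgradient) price update. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the step size, and the toy user response `demand_at_price` are all hypothetical, and the foresighted scheduling and online learning parts of the paper are not modeled.

```python
def update_price(price: float, total_demand: float, capacity: float,
                 step: float = 0.1) -> float:
    """Generic projected subgradient step on a resource price (dual variable).

    The price rises when demand exceeds capacity and falls otherwise;
    it is clipped at zero because dual variables are non-negative.
    """
    return max(0.0, price + step * (total_demand - capacity))


def demand_at_price(weight: float, price: float, max_demand: float) -> float:
    """Hypothetical user response: request less of the resource as the price grows."""
    if price <= 0.0:
        return max_demand
    return min(max_demand, weight / price)


def run_pricing(weights, capacity, max_demand, rounds=200, step=0.05):
    """Alternate between user decisions and the manager's price update."""
    price = 1.0
    for _ in range(rounds):
        total = sum(demand_at_price(w, price, max_demand) for w in weights)
        price = update_price(price, total, capacity, step)
    return price
```

Only the price is exchanged between the manager and the users in this sketch, which is the sense in which such schemes are informationally decentralized.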
-
Demand Side Management in Smart Grids using a Repeated Game Framework
Authors:
Linqi Song,
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
Demand side management (DSM) is a key solution for reducing the peak-time power consumption in smart grids. To provide incentives for consumers to shift their consumption to off-peak times, the utility company charges consumers differential pricing for using power at different times of the day. Consumers take into account these differential prices when deciding when and how much power to consume daily. Importantly, while consumers enjoy lower billing costs when shifting their power usage to off-peak times, they also incur discomfort costs due to the altering of their power consumption patterns. Existing works propose stationary strategies for the myopic consumers to minimize their short-term billing and discomfort costs. In contrast, we model the interaction emerging among self-interested, foresighted consumers as a repeated energy scheduling game and prove that the stationary strategies are suboptimal in terms of long-term total billing and discomfort costs. Subsequently, we propose a novel framework for determining optimal nonstationary DSM strategies, in which consumers can choose different daily power consumption patterns depending on their preferences, routines, and needs. As a direct consequence of the nonstationary DSM policy, different subsets of consumers are allowed to use power in peak times at a low price. The subset of consumers that are selected daily to have their joint discomfort and billing costs minimized is determined based on the consumers' power consumption preferences as well as on the past history of which consumers have shifted their usage previously. Importantly, we show that the proposed strategies are incentive-compatible. Simulations confirm that, given the same peak-to-average ratio, the proposed strategy can reduce the total cost (billing and discomfort costs) by up to 50% compared to existing DSM strategies.
Submitted 8 November, 2013;
originally announced November 2013.
-
Dynamic Network Formation with Incomplete Information
Authors:
Yangbo Song,
Mihaela van der Schaar
Abstract:
How do networks form and what is their ultimate topology? Most of the literature that addresses these questions assumes complete information: agents know in advance the value of linking to other agents, even with agents they have never met and with whom they have had no previous interaction (direct or indirect). This paper addresses the same questions under what seems to us to be the much more natural assumption of incomplete information: agents do not know in advance -- but must learn -- the value of linking to agents they have never met. We show that the assumption of incomplete information has profound implications for the process of network formation and the topology of networks that ultimately form. Under complete information, the networks that form and are stable typically have a star, wheel or core-periphery form, with high-value agents in the core. Under incomplete information, the presence of positive externalities (the value of indirect links) implies that a much wider collection of network topologies can emerge and be stable. Moreover, even when the topologies that emerge are the same, the locations of agents can be very different. For instance, when information is incomplete, it is possible for a hub-and-spokes network with a low-value agent in the center to form and endure permanently: an agent can achieve a central position purely as the result of chance rather than as the result of merit. Perhaps even more strikingly: when information is incomplete, a connected network could form and persist even if, when information were complete, no links would ever form, so that the final form would be a totally disconnected network. All of this can occur even in settings where agents eventually learn everything so that information, although initially incomplete, eventually becomes complete.
Submitted 19 March, 2014; v1 submitted 5 November, 2013;
originally announced November 2013.
-
Socially-Optimal Design of Service Exchange Platforms with Imperfect Monitoring
Authors:
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
In service exchange platforms, anonymous users exchange services with each other: clients request services and are matched to servers who provide services. Because providing good-quality services requires effort, in any single interaction a server will have no incentive to exert effort and will shirk. We show that if current servers will later become clients and want good-quality services, shirking can be eliminated by rating protocols, which maintain ratings for each user, prescribe behavior in each client-server interaction, and update ratings based on whether observed/reported behavior conforms with prescribed behavior. The rating protocols proposed are the first to achieve social optimum even when observation/reporting is imperfect (quality is incorrectly assessed/reported or reports are lost). The proposed protocols are remarkably simple, requiring only binary ratings and three possible prescribed behaviors. Key to the efficacy of the proposed protocols is that they are nonstationary, and tailor prescriptions to both current and past rating distributions.
Submitted 8 October, 2013;
originally announced October 2013.
-
Incentive Design for Direct Load Control Programs
Authors:
Mahnoosh Alizadeh,
Yuanzhang Xiao,
Anna Scaglione,
Mihaela van der Schaar
Abstract:
We study the problem of optimal incentive design for voluntary participation of electricity customers in a Direct Load Scheduling (DLS) program, a new form of Direct Load Control (DLC) based on a three-way communication protocol between customers, embedded controls in flexible appliances, and the central entity in charge of the program. Participation decisions are made in real-time on a per-event basis, with every customer that needs to use a flexible appliance considering whether to join the program given current incentives. Customers have different interpretations of the level of risk associated with committing to hand over control of the consumption schedule of their devices to an operator, and these risk levels are only privately known. The operator maximizes his expected profit of operating the DLS program by posting the right participation incentives for different appliance types, in a publicly available and dynamically updated table. Customers are then faced with the dynamic decision making problem of whether to take the incentives and participate or not. We define an optimization framework to determine the profit-maximizing incentives for the operator. In doing so, we also investigate the utility that the operator expects to gain from recruiting different types of devices. These utilities also provide an upper bound on the benefits that can be attained from any type of demand response program.
Submitted 1 October, 2013;
originally announced October 2013.
-
Distributed Online Learning in Social Recommender Systems
Authors:
Cem Tekin,
Simpson Zhang,
Mihaela van der Schaar
Abstract:
In this paper, we consider decentralized sequential decision making in distributed online recommender systems, where items are recommended to users based on their search query as well as their specific background including history of bought items, gender and age, all of which comprise the context information of the user. In contrast to centralized recommender systems, in which there is a single centralized seller who has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products and not the products and user information of other sellers, but can get commission if it sells an item of another seller. Therefore the sellers must distributedly find out for an incoming user which items to recommend (from the set of own items or items of another seller), in order to maximize the revenue from own sales and commissions. We formulate this problem as a cooperative contextual bandit problem, analytically bound the performance of the sellers compared to the best recommendation strategy given the complete realization of user arrivals and the inventory of items, as well as the context-dependent purchase probabilities of each item, and verify our results via numerical examples on a distributed data set adapted based on Amazon data. We evaluate the dependence of the performance of a seller on the inventory of items the seller has, the number of connections it has with the other sellers, and the commissions which the seller gets by selling items of other sellers to its users.
Submitted 21 January, 2014; v1 submitted 25 September, 2013;
originally announced September 2013.
-
Information Sharing in Networks of Strategic Agents
Authors:
Jie Xu,
Yangbo Song,
Mihaela van der Schaar
Abstract:
To ensure that social networks (e.g. opinion consensus, cooperative estimation, distributed learning and adaptation etc.) proliferate and efficiently operate, the participating agents need to collaborate with each other by repeatedly sharing information. However, sharing information is often costly for the agents while resulting in no direct immediate benefit for them. Hence, lacking incentives to collaborate, strategic agents who aim to maximize their own individual utilities will withhold rather than share information, leading to inefficient operation or even collapse of networks. In this paper, we develop a systematic framework for designing distributed rating protocols aimed at incentivizing the strategic agents to collaborate with each other by sharing information. The proposed incentive protocols exploit the ongoing nature of the agents' interactions to assign ratings and through them, determine future rewards and punishments: agents that have behaved as directed enjoy high ratings -- and hence greater future access to the information of others; agents that have not behaved as directed enjoy low ratings -- and hence less future access to the information of others. Unlike existing rating protocols, the proposed protocol operates in a distributed manner, online, and takes into consideration the underlying interconnectivity of agents as well as their heterogeneity. We prove that in many deployment scenarios the price of anarchy (PoA) obtained by adopting the proposed rating protocols is one. In settings in which the PoA is larger than one, we show that the proposed rating protocol still significantly outperforms existing incentive mechanisms such as Tit-for-Tat. Importantly, the proposed rating protocols can also operate efficiently in deployment scenarios where the strategic agents interact over time-varying network topologies where new agents join the network over time.
Submitted 16 September, 2013; v1 submitted 6 September, 2013;
originally announced September 2013.
-
Designing Efficient Resource Sharing For Impatient Players Using Limited Monitoring
Authors:
Mihaela van der Schaar,
Yuanzhang Xiao,
William Zame
Abstract:
The problem of efficient sharing of a resource is nearly ubiquitous. Except for pure public goods, each agent's use creates a negative externality; often the negative externality is so strong that efficient sharing is impossible in the short run. We show that, paradoxically, the impossibility of efficient sharing in the short run enhances the possibility of efficient sharing in the long run, even if outcomes depend stochastically on actions, monitoring is limited and users are not patient. We base our analysis on the familiar framework of repeated games with imperfect public monitoring, but we extend the framework to view the monitoring structure as chosen by a designer who balances the benefits and costs of more accurate observations and reports. Our conclusions are much stronger than in the usual folk theorems: we do not require a rich signal structure or patient users and provide an explicit online construction of equilibrium strategies.
Submitted 1 September, 2013;
originally announced September 2013.
-
Ensemble of Distributed Learners for Online Classification of Dynamic Data Streams
Authors:
Luca Canzian,
Yu Zhang,
Mihaela van der Schaar
Abstract:
We present an efficient distributed online learning scheme to classify data captured from distributed, heterogeneous, and dynamic data sources. Our scheme consists of multiple distributed local learners that analyze different streams of data that are correlated to a common event that needs to be classified. Each learner uses a local classifier to make a local prediction. The local predictions are then collected by each learner and combined using a weighted majority rule to output the final prediction. We propose a novel online ensemble learning algorithm to update the aggregation rule in order to adapt to the underlying data dynamics. We rigorously determine a bound for the worst case misclassification probability of our algorithm which depends on the misclassification probabilities of the best static aggregation rule, and of the best local classifier. Importantly, the worst case misclassification probability of our algorithm tends asymptotically to 0 if the misclassification probability of the best static aggregation rule or the misclassification probability of the best local classifier tends to 0. Then we extend our algorithm to address challenges specific to the distributed implementation and we prove new bounds that apply to these settings. Finally, we test our scheme by performing an evaluation study on several data sets. When applied to data sets widely used in the literature dealing with dynamic data streams and concept drift, our scheme exhibits performance gains ranging from 34% to 71% with respect to state of the art solutions.
Submitted 23 August, 2013;
originally announced August 2013.
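The weighted majority rule mentioned in the abstract above can be illustrated with a minimal sketch. It assumes binary labels in {-1, +1} and uses the classic multiplicative weight update (weights of wrong learners are multiplied by a factor `beta`); this is a textbook stand-in, not the update rule proposed in the paper, and the names `weighted_majority_vote`, `update_weights`, `run_stream`, and `beta` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def weighted_majority_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Combine local predictions in {-1, +1} by the sign of their weighted sum."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1  # ties are broken in favor of +1


def update_weights(predictions: Sequence[int], weights: Sequence[float],
                   label: int, beta: float = 0.5) -> List[float]:
    """Classic multiplicative update (assumption): shrink the weight of each wrong learner."""
    return [w * beta if p != label else w for w, p in zip(weights, predictions)]


def run_stream(stream: Iterable[Tuple[Sequence[int], int]],
               n_learners: int, beta: float = 0.5) -> Tuple[List[float], int]:
    """Process (local predictions, true label) pairs online; return final weights and mistake count."""
    weights = [1.0] * n_learners
    mistakes = 0
    for predictions, label in stream:
        if weighted_majority_vote(predictions, weights) != label:
            mistakes += 1
        weights = update_weights(predictions, weights, label, beta)
    return weights, mistakes


# Toy stream: learner 0 is always right, learners 1 and 2 are always wrong.
labels = [1, -1, 1, 1]
stream = [((y, -y, -y), y) for y in labels]
final_weights, n_mistakes = run_stream(stream, n_learners=3)
```

In this toy stream the ensemble initially follows the two wrong learners, but their weights decay quickly and the aggregate prediction then tracks the reliable learner.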
-
Distributed Online Learning via Cooperative Contextual Bandits
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
In this paper we propose a novel framework for decentralized, online learning by many learners. At each moment of time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others and the cost paid for these actions - taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds to compare the efficiency of these algorithms with the complete knowledge (oracle) benchmark (in which the expected reward of every action in every context is known by every learner). Our estimates show that regret - the loss incurred by the algorithm - is sublinear in time. Our theoretical framework can be used in many practical applications including Big Data mining, event detection in surveillance sensor networks and distributed online recommendation systems.
Submitted 23 March, 2015; v1 submitted 21 August, 2013;
originally announced August 2013.
-
Decentralized Online Big Data Classification - a Bandit Framework
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefits obtained from a better classification exceed the costs. We assume that the classification functions available to each processing element are fixed, but their prediction accuracies for various types of incoming data are unknown and can change dynamically over time, and thus they need to be learned online. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop distributed online learning algorithms for which we can prove that they have sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithms.
Submitted 25 August, 2013; v1 submitted 21 August, 2013;
originally announced August 2013.
-
To Relay or Not to Relay: Learning Device-to-Device Relaying Strategies in Cellular Networks
Authors:
Nicholas Mastronarde,
Viral Patel,
Jie Xu,
Lingjia Liu,
Mihaela van der Schaar
Abstract:
We consider a cellular network where mobile transceiver devices that are owned by self-interested users are incentivized to cooperate with each other using tokens, which they exchange electronically to "buy" and "sell" downlink relay services, thereby increasing the network's capacity compared to a network that only supports base station-to-device (B2D) communications. We investigate how an individual device in the network can learn its optimal cooperation policy online, which it uses to decide whether or not to provide downlink relay services for other devices in exchange for tokens. We propose a supervised learning algorithm that devices can deploy to learn their optimal cooperation strategies online given their experienced network environment. We then systematically evaluate the learning algorithm in various deployment scenarios. Our simulation results suggest that devices have the greatest incentive to cooperate when the network contains (i) many devices with high energy budgets for relaying, (ii) many highly mobile users (e.g., users in motor vehicles), and (iii) neither too few nor too many tokens. Additionally, within the token system, self-interested devices can effectively learn to cooperate online, and achieve over 20% higher throughput on average than with B2D communications alone, all while selfishly maximizing their own utilities.
Submitted 28 December, 2014; v1 submitted 4 June, 2013;
originally announced August 2013.
-
Distributed Online Big Data Classification Using Context Information
Authors:
Cem Tekin,
Mihaela van der Schaar
Abstract:
Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefits obtained from a better classification exceed the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
Submitted 2 July, 2013;
originally announced July 2013.
-
Energy-Efficient Nonstationary Spectrum Sharing
Authors:
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
We develop a novel design framework for energy-efficient spectrum sharing among autonomous users who aim to minimize their energy consumptions subject to minimum throughput requirements. Most existing works proposed stationary spectrum sharing policies, in which users transmit at fixed power levels. Since users transmit simultaneously under stationary policies, to fulfill minimum throughput requirements, they need to transmit at high power levels to overcome interference. To improve energy efficiency, we construct nonstationary spectrum sharing policies, in which the users transmit at time-varying power levels. Specifically, we focus on TDMA (time-division multiple access) policies in which one user transmits at each time (but not in a round-robin fashion). The proposed policy can be implemented by each user running a low-complexity algorithm in a decentralized manner. It achieves high energy efficiency even when the users have erroneous and binary feedback about their interference levels. Moreover, it can adapt to the dynamic entry and exit of users. The proposed policy is also deviation-proof, namely autonomous users will find it in their self-interests to follow it. Compared to existing policies, the proposed policy can achieve an energy saving of up to 90% when the number of users is high.
Submitted 9 January, 2014; v1 submitted 17 November, 2012;
originally announced November 2012.
-
Pricing and Intervention in Slotted-Aloha: Technical Report
Authors:
Luca Canzian,
Yuanzhang Xiao,
Michele Zorzi,
Mihaela van der Schaar
Abstract:
In many wireless communication networks a common channel is shared by multiple users who must compete to gain access to it. The operation of the network by self-interested and strategic users usually leads to the overuse of the channel resources and to substantial inefficiencies. Hence, incentive schemes are needed to overcome the inefficiencies of non-cooperative equilibrium. In this work we consider a slotted-Aloha-like random access protocol and two incentive schemes: pricing and intervention. We provide some criteria for the designer of the protocol to choose between the two schemes and to design the best policy for the selected scheme, depending on the system parameters. Our results show that intervention can achieve the maximum efficiency in the perfect monitoring scenario. In the imperfect monitoring scenario, instead, the performance of the system depends on the information held by the different entities and, in some cases, there exists a threshold for the number of users such that, for a number of users lower than the threshold, intervention outperforms pricing, whereas, for a number of users higher than the threshold, pricing outperforms intervention.
Submitted 15 November, 2012;
originally announced November 2012.
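The channel overuse described in the abstract above can be made concrete with the standard textbook model of slotted Aloha, sketched below. It assumes n symmetric users who each transmit independently with probability p in every slot, so a slot is successful only when exactly one user transmits; this simplified model and the function name `success_probability` are assumptions for illustration and do not reproduce the paper's pricing or intervention schemes.

```python
def success_probability(n_users: int, p: float) -> float:
    """Probability that exactly one of n_users transmits in a slot.

    Assumes each user transmits independently with probability p.
    """
    return n_users * p * (1.0 - p) ** (n_users - 1)


n = 10
# Cooperative choice: p = 1/n maximizes the per-slot success probability.
cooperative = success_probability(n, 1.0 / n)
# Aggressive choice: every user transmits half of the time and collisions dominate.
aggressive = success_probability(n, 0.5)
```

With ten users, the aggressive setting yields a far lower success probability than the cooperative one, which is the kind of inefficiency that incentive schemes aim to correct.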
-
Designing Rating Systems to Promote Mutual Security for Interconnected Networks
Authors:
Jie Xu,
Yu Zhang,
Mihaela van der Schaar
Abstract:
Interconnected autonomous systems often share security risks. However, an autonomous system lacks the incentive to make (sufficient) security investments if the cost exceeds its own benefit even though doing that would be socially beneficial. In this paper, we develop a systematic and rigorous framework for analyzing and significantly improving the mutual security of a collection of ASs that interact frequently over a long period of time. Using this framework, we show that simple incentive schemes based on rating systems can be designed to encourage the autonomous systems' security investments, thereby significantly improving their mutual security.
Submitted 9 November, 2012;
originally announced November 2012.
-
Designing Information Revelation and Intervention with an Application to Flow Control
Authors:
Luca Canzian,
Yuanzhang Xiao,
William Zame,
Michele Zorzi,
Mihaela van der Schaar
Abstract:
There are many familiar situations in which a manager seeks to design a system in which users share a resource, but outcomes depend on the information held and actions taken by users. If communication is possible, the manager can ask users to report their private information and then, using this information, instruct them on what actions they should take. If the users are compliant, this reduces the manager's optimization problem to a well-studied problem of optimal control. However, if the users are self-interested and not compliant, the problem is much more complicated: when asked to report their private information, the users might lie; upon receiving instructions, the users might disobey. Here we ask whether the manager can design the system to get around both of these difficulties. To do so, the manager must provide for the users the incentives to report truthfully and to follow the instructions, despite the fact that the users are self-interested. For a class of environments that includes many resource allocation games in communication networks, we provide tools for the manager to design an efficient system. In addition to reports and recommendations, the design we employ allows the manager to intervene in the system after the users take actions. In an abstracted environment, we find conditions under which the manager can achieve the same outcome it could if users were compliant, and conditions under which it does not. We then apply our framework and results to design a flow control management system.
Submitted 17 July, 2012;
originally announced July 2012.
-
Entry and Spectrum Sharing Scheme Selection in Femtocell Markets
Authors:
Shaolei Ren,
Jaeok Park,
Mihaela van der Schaar
Abstract:
Focusing on a femtocell communications market, we study the entrant network service provider's (NSP's) long-term decision: whether to enter the market and which spectrum sharing technology to select to maximize its profit. This long-term decision is closely related to the entrant's pricing strategy and the users' aggregate demand, which we model as medium-term and short-term decisions, respectively. We consider two markets, one with no incumbent and the other with one incumbent. For both markets, we show the existence and uniqueness of an equilibrium point in the user subscription dynamics, and provide a sufficient condition for the convergence of the dynamics. For the market with no incumbent, we derive upper and lower bounds on the optimal price and market share that maximize the entrant's revenue, based on which the entrant selects an available technology to maximize its long-term profit. For the market with one incumbent, we model competition between the two NSPs as a non-cooperative game, in which the incumbent and the entrant choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium. Finally, we formalize the problem of entry and spectrum sharing scheme selection for the entrant and provide numerical results to complement our analysis.
Submitted 19 April, 2012;
originally announced April 2012.
-
Dynamic Spectrum Sharing Among Repeatedly Interacting Selfish Users With Imperfect Monitoring
Authors:
Yuanzhang Xiao,
Mihaela van der Schaar
Abstract:
We develop a novel design framework for dynamic distributed spectrum sharing among secondary users (SUs) who adjust their power levels to compete for spectrum opportunities while satisfying the interference temperature (IT) constraints imposed by primary users. The considered interaction among the SUs is characterized by the following three features. First, since the SUs are decentralized, they are selfish and aim to maximize their own long-term payoffs from utilizing the network rather than obeying the prescribed allocation of a centralized controller. Second, the SUs interact with each other repeatedly and they can coexist in the system for a long time. Third, the SUs have limited and imperfect monitoring ability: they only observe whether the IT constraints are violated, and their observation is imperfect due to the erroneous measurements. To capture these features, we model the interaction of the SUs as a repeated game with imperfect monitoring. We first characterize the set of Pareto optimal payoffs that can be achieved by deviation-proof spectrum sharing policies, which are policies that the selfish users find it in their interest to comply with. Next, for any given payoff in this set, we show how to construct a deviation-proof policy to achieve it. The constructed deviation-proof policy is amenable to distributed implementation, and allows users to transmit in a time-division multiple-access (TDMA) fashion. In the presence of strong multi-user interference, our policy outperforms existing spectrum sharing policies that dictate users to transmit at constant power levels simultaneously. Moreover, our policy can achieve Pareto optimality even when the SUs have limited and imperfect monitoring ability, as opposed to existing solutions based on repeated games, which require perfect monitoring abilities.
Submitted 8 August, 2012; v1 submitted 16 January, 2012;
originally announced January 2012.
-
Social Norm Design for Information Exchange Systems with Limited Observations
Authors:
Jie Xu,
Mihaela van der Schaar
Abstract:
Information exchange systems differ in many ways, but all share a common vulnerability to selfish behavior and free-riding. In this paper, we build incentive schemes based on social norms. Social norms prescribe a social strategy for the users in the system to follow and deploy reputation schemes to reward or penalize users depending on their behaviors. Because users in these systems often have only limited capability to observe the global system information, e.g. the reputation distribution of the users participating in the system, their beliefs about the reputation distribution are heterogeneous and biased. Such belief heterogeneity causes a positive fraction of users to not follow the social strategy. In such practical scenarios, the standard equilibrium analysis deployed in the economics literature is no longer directly applicable and hence, the system design needs to consider these differences. To investigate how the system designs need to change when the participating users have only limited observations, we focus on a simple social norm with binary reputation labels but allow adjusting the punishment severity through randomization. First, we model the belief heterogeneity using a suitable Bayesian belief function. Next, we formalize the users' optimal decision problems and derive in which scenarios they follow the prescribed social strategy. With this result, we then study the system dynamics and formally define equilibrium in the sense that the system is stable when users strategically optimize their decisions. By rigorously studying two specific cases where users' belief distribution is constant or is linearly influenced by the true reputation distribution, we prove that the optimal reputation update rule is to choose the mildest possible punishment. This result is further confirmed for higher order beliefs in simulations.
Submitted 9 November, 2012; v1 submitted 13 January, 2012;
originally announced January 2012.
-
Markov Decision Process Based Energy-Efficient On-Line Scheduling for Slice-Parallel Video Decoders on Multicore Systems
Authors:
Nicholas Mastronarde,
Karim Kanoun,
David Atienza,
Pascal Frossard,
Mihaela van der Schaar
Abstract:
We consider the problem of energy-efficient on-line scheduling for slice-parallel video decoders on multicore systems. We assume that each of the processors is Dynamic Voltage Frequency Scaling (DVFS) enabled such that they can independently trade off performance for power, while taking the video decoding workload into account. In the past, scheduling and DVFS policies in multi-core systems have been formulated heuristically due to the inherent complexity of the on-line multicore scheduling problem. The key contribution of this report is that we rigorously formulate the problem as a Markov decision process (MDP), which simultaneously takes into account the on-line scheduling and per-core DVFS capabilities; the power consumption of the processor cores and caches; and the loss-tolerant and dynamic nature of the video decoder's traffic. In particular, we model the video traffic using a Directed Acyclic Graph (DAG) to capture the precedence constraints among frames in a Group of Pictures (GOP) structure, while also accounting for the fact that frames have different display/decoding deadlines and non-deterministic decoding complexities. The objective of the MDP is to minimize long-term power consumption subject to a minimum Quality of Service (QoS) constraint related to the decoder's throughput. Although MDPs notoriously suffer from the curse of dimensionality, we show that, with appropriate simplifications and approximations, the complexity of the MDP can be mitigated. We implement a slice-parallel version of H.264 on a multiprocessor ARM (MPARM) virtual platform simulator, which provides cycle-accurate and bus signal-accurate simulation for different processors. We use this platform to generate realistic video decoding traces with which we evaluate the proposed on-line scheduling algorithm in Matlab.
Submitted 24 May, 2012; v1 submitted 17 December, 2011;
originally announced December 2011.
-
Repeated Games With Intervention: Theory and Applications in Communications
Authors:
Yuanzhang Xiao,
Jaeok Park,
Mihaela van der Schaar
Abstract:
In communication systems where users share common resources, users' selfish behavior usually results in suboptimal resource utilization. There have been extensive works that model communication systems with selfish users as one-shot games and propose incentive schemes to achieve Pareto optimal action profiles as non-cooperative equilibria. However, in many communication systems, due to strong negative externalities among users, the sets of feasible payoffs in one-shot games are nonconvex. Thus, it is possible to expand the set of feasible payoffs by having users choose convex combinations of different payoffs. In this paper, we propose a repeated game model generalized by intervention. First, we use repeated games to convexify the set of feasible payoffs in one-shot games. Second, we combine conventional repeated games with intervention, originally proposed for one-shot games, to achieve a larger set of equilibrium payoffs and loosen requirements for users' patience to achieve it. We study the problem of maximizing a welfare function defined on users' equilibrium payoffs, subject to minimum payoff guarantees. Given the optimal equilibrium payoff, we derive the minimum intervention capability required and design corresponding equilibrium strategies. The proposed generalized repeated game model applies to various communication systems, such as power control and flow control.
Submitted 10 November, 2011;
originally announced November 2011.
-
Production and Network Formation Games with Content Heterogeneity
Authors:
Yu Zhang,
Jaeok Park,
Mihaela van der Schaar
Abstract:
Online social networks (e.g. Facebook, Twitter, YouTube) provide a popular, cost-effective and scalable framework for sharing user-generated contents. This paper addresses the intrinsic incentive problems residing in social networks using a game-theoretic model where individual users selfishly trade off the costs of forming links (i.e. whom they interact with) and producing contents personally against the potential rewards from doing so. Departing from the assumption that contents produced by different users are perfectly substitutable, we explicitly consider heterogeneity in user-generated contents and study how it influences users' behavior and the structure of social networks. Given content heterogeneity, we rigorously prove that when the population of a social network is sufficiently large, every (strict) non-cooperative equilibrium should consist of either a symmetric network topology where each user produces the same amount of content and has the same degree, or a two-level hierarchical topology with all users belonging to either of the two types: influencers who produce large amounts of contents and subscribers who produce small amounts of contents and get most of their contents from influencers. Meanwhile, the law of the few disappears in such networks. Moreover, we prove that the social optimum is always achieved by networks with symmetric topologies, where the sum of users' utilities is maximized. To provide users with incentives for producing and mutually sharing the socially optimal amount of contents, a pricing scheme is proposed, with which we show that the social optimum can be achieved as a non-cooperative equilibrium with the pricing of content acquisition and link formation.
Submitted 19 September, 2011;
originally announced September 2011.
-
Designing Practical Distributed Exchange for Online Communities
Authors:
Jie Xu,
Mihaela van der Schaar,
William Zame
Abstract:
In many online systems, individuals provide services for each other; the recipient of the service obtains a benefit but the provider of the service incurs a cost. If benefit exceeds cost, provision of the service increases social welfare and should therefore be encouraged, but the individuals providing the service gain no (immediate) benefit from providing the service and hence have an incentive to withhold service. Hence there is scope for designing a system that improves welfare by encouraging exchange. To operate successfully within the confines of the online environment, such a system should be distributed, practicable, and consistent with individual incentives. This paper proposes and analyzes a simple such system that relies on the exchange of tokens; the emphasis is on the design of a protocol (number of tokens and suggested strategies). We provide estimates for the efficiency of such protocols and show that choosing the right protocol will lead to almost full efficiency if agents are sufficiently patient. However, choosing the wrong protocols may lead to an enormous loss of efficiency.
Submitted 21 July, 2012; v1 submitted 30 August, 2011;
originally announced August 2011.
-
Strategic Learning and Robust Protocol Design for Online Communities with Selfish Users
Authors:
Yu Zhang,
Mihaela van der Schaar
Abstract:
This paper focuses on analyzing the free-riding behavior of self-interested users in online communities. Because these users are self-interested, traditional optimization methods for communities composed of compliant users, such as network utility maximization, cannot be applied here. In our prior work, we show how social reciprocation protocols can be designed in online communities which have populations consisting of a continuum of users and are stationary under stochastic permutations. Under these assumptions, we are able to prove that users voluntarily comply with the pre-determined social norms and cooperate with other users in the community by providing their services. In this paper, we generalize the study by analyzing the interactions of self-interested users in online communities that have finite populations and are not stationary. To optimize their long-term performance based on their knowledge, users adapt their strategies to play their best response by solving individual stochastic control problems. The best-response dynamic introduces a stochastic dynamic process in the community, in which the strategies of users evolve over time. We then investigate the long-term evolution of a community, and prove that the community will converge to stochastically stable equilibria which are stable against stochastic permutations. Understanding the evolution of a community provides protocol designers with guidelines for designing social norms in which no user has incentives to adapt its strategy and deviate from the prescribed protocol, thereby ensuring that the adopted protocol will enable the community to achieve the optimal social welfare.
Submitted 29 August, 2011;
originally announced August 2011.
-
Robust Stackelberg game in communication systems
Authors:
Saeedeh Parsaeefard,
Mihaela van der Schaar,
Ahmad R. Sharafat
Abstract:
This paper uses the formulation of Stackelberg games to study multi-user communication systems with two groups of users: leaders, which possess system information, and followers, which have no system information. In such games, the leaders play and choose their actions based on their information about the system, and the followers choose their actions myopically according to their observations of the aggregate impact of other users. However, obtaining the exact value of these parameters is not practical in communication systems. To study the effect of uncertainty and preserve the players' utilities in these conditions, we introduce a robust equilibrium for Stackelberg games. In this framework, the leaders' information and the followers' observations are uncertain parameters, and the leaders and the followers choose their actions by solving worst-case robust optimization problems. We show that the followers' uncertain parameters always increase the leaders' utilities and decrease the followers' utilities. Conversely, the leaders' uncertain information reduces the leaders' utilities and increases the followers' utilities. We illustrate our theoretical results with numerical results obtained for power control games in interference channels.
Submitted 25 August, 2011;
originally announced August 2011.
-
Reputation-based Incentive Protocols in Crowdsourcing Applications
Authors:
Yu Zhang,
Mihaela van der Schaar
Abstract:
Crowdsourcing websites (e.g., Yahoo! Answers, Amazon Mechanical Turk, etc.) have emerged in recent years that allow requesters from all around the world to post tasks and seek help from an equally global pool of workers. However, intrinsic incentive problems reside in crowdsourcing applications, as workers and requesters are selfish and aim to strategically maximize their own benefit. In this paper, we propose to provide incentives for workers to exert effort using a novel game-theoretic model based on repeated games. As there is always a gap in the social welfare between the non-cooperative equilibria emerging when workers pursue their self-interests and the desirable Pareto efficient outcome, we propose a novel class of incentive protocols based on social norms which integrates reputation mechanisms into the existing pricing schemes currently implemented on crowdsourcing websites, in order to improve the performance of the non-cooperative equilibria emerging in such applications. We first formulate the exchanges on a crowdsourcing website as a two-sided market where requesters and workers are matched and play gift-giving games repeatedly. Subsequently, we study the protocol designer's problem of finding an optimal and sustainable (equilibrium) protocol which achieves the highest social welfare for that website. We prove that the proposed incentive protocol can make the website operate close to Pareto efficiency. Moreover, we also examine an alternative scenario, where the protocol designer aims at maximizing the revenue of the website, and evaluate the performance of the optimal protocol.
Submitted 10 August, 2011;
originally announced August 2011.
-
Business Mode Selection in Digital Content Markets
Authors:
Shaolei Ren,
Jaeok Park,
Mihaela van der Schaar
Abstract:
In this paper, we consider a two-sided digital content market, and study which of the two business modes, i.e., Business-to-Customer (B2C) and Customer-to-Customer (C2C), should be selected and when it should be selected. The considered market is managed by an intermediary, through which content producers can sell their content to consumers. The intermediary can select B2C or C2C as its business mode, while the content producers and consumers are rational agents that maximize their own utilities. The content producers are differentiated by their content qualities. First, given the intermediary's business mode, we show that there always exists a unique equilibrium at which neither the content producers nor the consumers change their decisions. Moreover, if there is a sufficiently large number of consumers, then the decision process based on the content producers' naive expectation can reach the unique equilibrium. Next, we show that in a market with only one intermediary, C2C should be selected if the intermediary aims at maximizing its profit. Then, by considering a particular scenario where the contents are not highly substitutable, we prove that when the intermediary chooses to maximize the social welfare, C2C should be selected if the content producers can receive sufficient compensation for content sales, and B2C should be selected otherwise.
Submitted 19 July, 2011; v1 submitted 24 April, 2011;
originally announced April 2011.
-
Intervention in Power Control Games With Selfish Users
Authors:
Yuanzhang Xiao,
Jaeok Park,
Mihaela van der Schaar
Abstract:
We study the power control problem in wireless ad hoc networks with selfish users. Without incentive schemes, selfish users tend to transmit at their maximum power levels, causing significant interference to each other. In this paper, we study a class of incentive schemes based on intervention to induce selfish users to transmit at desired power levels. An intervention scheme can be implemented by introducing an intervention device that can monitor the power levels of users and then transmit power to cause interference to users. We mainly consider first-order intervention rules based on individual transmit powers. We derive conditions on design parameters and the intervention capability to achieve a desired outcome as a (unique) Nash equilibrium and propose a dynamic adjustment process that the designer can use to guide users and the intervention device to the desired outcome. The effect of using intervention rules based on aggregate receive power is also analyzed. Our results show that, with perfect monitoring, intervention schemes can be designed to achieve any positive power profile while using interference from the intervention device only as a threat. We also analyze the case of imperfect monitoring and show that a performance loss can occur. Lastly, simulation results are presented to illustrate the performance improvement from using intervention rules and to compare the performance of different intervention rules.
Submitted 25 November, 2011; v1 submitted 6 April, 2011;
originally announced April 2011.