Showing 1–50 of 54 results for author: Dimitrakakis, C

  1. arXiv:2406.00551  [pdf, other]

    cs.LG cs.GT

    Strategic Linear Contextual Bandits

    Authors: Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

    Abstract: Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can strategically misreport their privately observed contexts to the learner. We treat the algorithm design problem as one of mechanism design under uncertainty and propose the…

    Submitted 1 June, 2024; originally announced June 2024.

  2. arXiv:2312.11663  [pdf, other]

    cs.LG stat.ML

    Eliciting Kemeny Rankings

    Authors: Anne-Marie George, Christos Dimitrakakis

    Abstract: We formulate the problem of eliciting agents' preferences with the goal of finding a Kemeny ranking as a Dueling Bandits problem. Here the bandits' arms correspond to alternatives that need to be ranked and the feedback corresponds to a pairwise comparison between alternatives by a randomly sampled agent. We consider both sampling with and without replacement, i.e., the possibility to ask the same…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: This is a long version of the AAAI'24 publication under the same title

  3. arXiv:2311.15647  [pdf, other]

    cs.LG cs.GT

    Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

    Authors: Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

    Abstract: We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. Like in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-rate of each arm…

    Submitted 27 November, 2023; originally announced November 2023.

  4. arXiv:2302.10831  [pdf, other]

    cs.LG stat.ML

    Minimax-Bayes Reinforcement Learning

    Authors: Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge

    Abstract: While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) min…

    Submitted 21 February, 2023; originally announced February 2023.

  5. arXiv:2302.09273  [pdf, other]

    cs.LG

    Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer

    Authors: Hannes Eriksson, Debabrota Basu, Tommy Tram, Mina Alibeigi, Christos Dimitrakakis

    Abstract: In this paper, we study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP. We refer to it as the Model Transfer Reinforcement Learning (MTRL) problem. First, we formulate MTRL for discrete MDPs and Linear Quadratic Regulators (LQRs) with continuous state actions. Then, we propose a generic two-stage algor…

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: 27 pages, 7 figures

  6. arXiv:2210.14972  [pdf, other]

    cs.LG cs.AI

    Environment Design for Inverse Reinforcement Learning

    Authors: Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis

    Abstract: Learning a reward function from demonstrations suffers from low sample-efficiency. Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics. We tackle these challenges through adaptive environment design. In our framework, the learner repeatedly interacts with the expert, w…

    Submitted 14 May, 2024; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: to appear at ICML 2024

  7. arXiv:2203.10045  [pdf, other]

    cs.LG cs.MA

    Risk-Sensitive Bayesian Games for Multi-Agent Reinforcement Learning under Policy Uncertainty

    Authors: Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis

    Abstract: In stochastic games with incomplete information, the uncertainty is evoked by the lack of knowledge about a player's own and the other players' types, i.e. the utility function and the policy space, and also the inherent stochasticity of different players' interactions. In existing literature, the risk in stochastic games has been studied in terms of the inherent uncertainty evoked by the variabil…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: 5 pages, 1 figure, 2 tables

  8. arXiv:2111.04698  [pdf, other]

    cs.LG cs.GT

    Interactive Inverse Reinforcement Learning for Cooperative Games

    Authors: Thomas Kleine Buening, Anne-Marie George, Christos Dimitrakakis

    Abstract: We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting s…

    Submitted 13 June, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: ICML 2022

  9. arXiv:2104.11834  [pdf, other]

    cs.LG q-bio.QM stat.ML

    High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling

    Authors: Hannes Eriksson, Christos Dimitrakakis, Lars Carlsson

    Abstract: We study the problem of performing automated experiment design for drug screening through Bayesian inference and optimisation. In particular, we compare and contrast the behaviour of linear-Gaussian models and Gaussian processes, when used in conjunction with upper confidence bound algorithms, Thompson sampling, or bounded horizon tree search. We show that non-myopic sophisticated exploration tech…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 14 pages, 6 figures

  10. arXiv:2104.07276  [pdf, ps, other]

    cs.AI

    Adaptive Belief Discretization for POMDP Planning

    Authors: Divya Grover, Christos Dimitrakakis

    Abstract: The Partially Observable Markov Decision Process (POMDP) is a widely used model to represent the interaction of an environment and an agent, under state uncertainty. Since the agent does not observe the environment state, its uncertainty is typically represented through a probabilistic belief. While the set of possible beliefs is infinite, making exact planning intractable, the belief space's comple…

    Submitted 15 April, 2021; originally announced April 2021.

  11. arXiv:2102.11932  [pdf, other]

    cs.AI

    On Meritocracy in Optimal Set Selection

    Authors: Thomas Kleine Buening, Meirav Segal, Debabrota Basu, Christos Dimitrakakis, Anne-Marie George

    Abstract: Typically, merit is defined with respect to some intrinsic measure of worth. We instead consider a setting where an individual's worth is relative: when a Decision Maker (DM) selects a set of individuals from a population to maximise expected utility, it is natural to consider the Expected Marginal Contribution (EMC) of each person to the utility. We show that this notion satisfies a…

    Submitted 9 September, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: EAAMO 2022

  12. arXiv:2102.11075  [pdf, other]

    cs.LG

    SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning

    Authors: Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis

    Abstract: In this paper, we consider risk-sensitive sequential decision-making in Reinforcement Learning (RL). Our contributions are two-fold. First, we introduce a novel and coherent quantification of risk, namely composite risk, which quantifies the joint effect of aleatory and epistemic risk during the learning process. Existing works considered either aleatory or epistemic risk individually, or as an ad…

    Submitted 29 June, 2022; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: 22 pages, 10 figures, 8 tables. Accepted for UAI 2022

  13. arXiv:2002.03098  [pdf, other]

    cs.LG stat.ML

    Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning

    Authors: Hannes Eriksson, Emilio Jorge, Christos Dimitrakakis, Debabrota Basu, Divya Grover

    Abstract: Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning. While "model-based" BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search, previous Bayesian "model-free" value function distribution approaches implicitly make strong assumpti…

    Submitted 1 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: 28 pages, 12 figures

  14. arXiv:1912.08523  [pdf, other]

    eess.SY

    Privacy of Real-Time Pricing in Smart Grid

    Authors: Mahrokh GhoddousiBoroujeni, Dominik Fay, Christos Dimitrakakis, Maryam Kamgarpour

    Abstract: Installing smart meters to publish real-time electricity rates has been controversial because it might lead to privacy concerns. Dispatched rates include fine-grained data on aggregate electricity consumption in a zone and could potentially be used to infer a household's pattern of energy use or its occupancy. In this paper, we propose Blowfish privacy to protect the occupancy state of the houses co…

    Submitted 18 December, 2019; originally announced December 2019.

  15. arXiv:1906.09114  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

    Authors: Aristide Tossou, Christos Dimitrakakis, Debabrota Basu

    Abstract: We tackle the problem of acting in an unknown finite and discrete Markov Decision Process (MDP) for which the expected shortest path from any state to any other state is bounded by a finite number $D$. An MDP consists of $S$ states and $A$ possible actions per state. Upon choosing an action $a_t$ at state $s_t$, one receives a real value reward $r_t$, then one transits to a next state $s_{t+1}$. T…

    Submitted 9 July, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: Improved the text and added detailed proofs of claims. Changed title to better express the solution proposed

  16. arXiv:1906.06273  [pdf, other]

    cs.LG cs.AI stat.ML

    Epistemic Risk-Sensitive Reinforcement Learning

    Authors: Hannes Eriksson, Christos Dimitrakakis

    Abstract: We develop a framework for interacting with uncertain environments in reinforcement learning (RL) by leveraging preferences in the form of utility functions. We claim that there is value in considering different risk measures during learning. In this framework, the preference for risk can be tuned by variation of the parameter $β$ and the resulting behavior can be risk-averse, risk-neutral or risk…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 8 pages, 2 figures

    Journal ref: Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2020) 339-344

  17. arXiv:1906.01609  [pdf, ps, other]

    cs.LG cs.GT

    Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games

    Authors: Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann

    Abstract: We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our most important contribution is the derivation of an algorithm that achieves simultaneously, for both pl…

    Submitted 4 June, 2019; originally announced June 2019.

  18. arXiv:1905.12425  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

    Authors: Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

    Abstract: We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptio…

    Submitted 11 December, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: The algorithm has been simplified (no need to look at lower bound of the reward and transitions). The proof has been significantly cleaned up. The previous "assumption" is clarified as a condition of the algorithm, well known as sub-modularity. The proof that the bounds satisfy sub-modularity has been cleaned up

  19. arXiv:1905.12298  [pdf, ps, other]

    cs.LG stat.ML

    Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?

    Authors: Debabrota Basu, Christos Dimitrakakis, Aristide Tossou

    Abstract: Based on the differential privacy (DP) framework, we introduce and unify privacy definitions for the multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We leverage a unified proving technique to achieve all the lower bound…

    Submitted 23 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 27 pages, 1 figure, 2 tables, 14 theorems

    MSC Class: 62L10; 94A15

  20. arXiv:1904.03535  [pdf, other]

    cs.LG cs.AI stat.ML

    Randomised Bayesian Least-Squares Policy Iteration

    Authors: Nikolaos Tziortziotis, Christos Dimitrakakis, Michalis Vazirgiannis

    Abstract: We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI has also been proposed, called randomised BLSPI (RBLSPI), that improves its policy based on an incomplete policy evaluation step. In the online setting, th…

    Submitted 6 April, 2019; originally announced April 2019.

    Comments: European Workshop on Reinforcement Learning 14, October 2018, Lille, France

  21. arXiv:1902.02661  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian Reinforcement Learning via Deep, Sparse Sampling

    Authors: Divya Grover, Debabrota Basu, Christos Dimitrakakis

    Abstract: We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal policy, with a lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term…

    Submitted 27 June, 2020; v1 submitted 7 February, 2019; originally announced February 2019.

    Comments: Published in AISTATS 2020

  22. arXiv:1902.00941  [pdf, other]

    nucl-th cs.LG stat.ML

    Bayesian optimization in ab initio nuclear physics

    Authors: A. Ekström, C. Forssén, C. Dimitrakakis, D. Dubhashi, H. T. Johansson, A. S. Muhammad, H. Salomonsson, A. Schliep

    Abstract: Theoretical models of the strong nuclear interaction contain unknown coupling constants (parameters) that must be determined using a pool of calibration data. In cases where the models are complex, leading to time consuming calculations, it is particularly challenging to systematically search the corresponding parameter domain for the best fit to the data. In this paper, we explore the prospect of…

    Submitted 3 February, 2019; originally announced February 2019.

    Comments: 33 pages, 14 figures

  23. arXiv:1806.09192  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    On The Differential Privacy of Thompson Sampling With Gaussian Prior

    Authors: Aristide C. Y. Tossou, Christos Dimitrakakis

    Abstract: We show that Thompson Sampling with Gaussian Prior as detailed by Algorithm 2 in (Agrawal & Goyal, 2013) is already differentially private. Theorem 1 shows that it enjoys a very competitive privacy loss of only $\mathcal{O}(\ln^2 T)$ after $T$ rounds. Finally, Theorem 2 shows that one can control the privacy loss to any desirable $ε$ level by appropriately increasing the variance of the samples from t…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: Accepted in Privacy in Machine Learning and Artificial Intelligence Workshop 2018

  24. arXiv:1707.09678  [pdf, ps, other]

    cs.LG

    Learning to Match

    Authors: Philip Ekman, Sebastian Bellevik, Christos Dimitrakakis, Aristide Tossou

    Abstract: Outsourcing tasks to previously unknown parties is becoming more common. One specific such problem involves matching a set of workers to a set of tasks. Even if the latter have precise requirements, the quality of individual workers is usually unknown. The problem is thus a version of matching under uncertainty. We believe that this type of problem is going to be increasingly important. When the…

    Submitted 30 July, 2017; originally announced July 2017.

    Comments: 5 pages. This version will be presented at the VAMS Recsys workshop 2017

  25. arXiv:1707.01875  [pdf, ps, other]

    cs.LG

    Calibrated Fairness in Bandits

    Authors: Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

    Abstract: We study fairness within the stochastic, multi-armed bandit (MAB) decision making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an "individual" corresponds to an arm and two arms are "similar" if they have a similar quality distribution. First, we adopt a smoothness constraint that if two arms have a similar quality distr…

    Submitted 6 July, 2017; originally announced July 2017.

    Comments: To be presented at the FAT-ML'17 workshop

  26. arXiv:1706.00119  [pdf, ps, other]

    cs.LG stat.ML

    Bayesian fairness

    Authors: Christos Dimitrakakis, Yang Liu, David Parkes, Goran Radanovic

    Abstract: We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairnes…

    Submitted 4 November, 2018; v1 submitted 31 May, 2017; originally announced June 2017.

    Comments: 13 pages, 8 figures, to appear at AAAI 2019

  27. arXiv:1701.04238  [pdf, other]

    cs.LG cs.AI

    Thompson Sampling For Stochastic Bandits with Graph Feedback

    Authors: Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi

    Abstract: We present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, linking its performance to the underlying properties of the graph. Thompson Sampling has the advantage of being applicable without the need to co…

    Submitted 16 January, 2017; originally announced January 2017.

  28. arXiv:1701.04222  [pdf, other]

    cs.LG cs.AI cs.CR

    Achieving Privacy in the Adversarial Multi-Armed Bandit

    Authors: Aristide C. Y. Tossou, Christos Dimitrakakis

    Abstract: In this paper, we improve the previously best known regret bound to achieve $ε$-differential privacy in oblivious adversarial bandits from $\mathcal{O}{(T^{2/3}/ε)}$ to $\mathcal{O}{(\sqrt{T} \ln T /ε)}$. This is achieved by combining a Laplace Mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in $T$. However, we can improve th…

    Submitted 16 January, 2017; originally announced January 2017.

  29. arXiv:1512.06992  [pdf, other]

    cs.AI cs.CR cs.LG math.ST stat.ML

    On the Differential Privacy of Bayesian Inference

    Authors: Zuhe Zhang, Benjamin Rubinstein, Christos Dimitrakakis

    Abstract: We study how to communicate findings of Bayesian inference to third parties, while preserving the strong guarantee of differential privacy. Our main contributions are four different algorithms for private Bayesian inference on probabilistic graphical models. These include two mechanisms for adding noise to the Bayesian updates, either directly to the posterior parameters, or to their Fourier tran…

    Submitted 22 December, 2015; originally announced December 2015.

    Comments: AAAI 2016, Feb 2016, Phoenix, Arizona, United States

  30. arXiv:1511.08681  [pdf, ps, other]

    stat.ML cs.CR cs.LG

    Algorithms for Differentially Private Multi-Armed Bandits

    Authors: Aristide Tossou, Christos Dimitrakakis

    Abstract: We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(ε, δ)$ differentially private variants of Upper Confidence Bound algo…

    Submitted 27 November, 2015; originally announced November 2015.

    Journal ref: AAAI 2016, Feb 2016, Phoenix, Arizona, United States

  31. arXiv:1412.3276  [pdf, ps, other]

    cs.LG stat.ML

    Generalised Entropy MDPs and Minimax Regret

    Authors: Emmanouil G. Androulakis, Christos Dimitrakakis

    Abstract: Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.

    Submitted 10 December, 2014; originally announced December 2014.

    Comments: 7 pages, NIPS workshop "From bad models to good policies"

  32. arXiv:1408.2067  [pdf]

    cs.LG stat.ML

    Probabilistic inverse reinforcement learning in unknown environments

    Authors: Aristide Tossou, Christos Dimitrakakis

    Abstract: We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or op…

    Submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-635-643

  33. arXiv:1307.3785  [pdf, ps, other]

    stat.ML cs.LG

    Probabilistic inverse reinforcement learning in unknown environments

    Authors: Aristide C. Y. Tossou, Christos Dimitrakakis

    Abstract: We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or op…

    Submitted 14 July, 2013; originally announced July 2013.

    Comments: UAI 2013

  34. arXiv:1306.1066  [pdf, other]

    stat.ML cs.LG

    Bayesian Differential Privacy through Posterior Sampling

    Authors: Christos Dimitrakakis, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, Benjamin Rubinstein

    Abstract: Differential privacy formalises privacy-preserving mechanisms that provide access to a database. We pose the question of whether Bayesian inference itself can be used directly to provide private access to data, with no modification. The answer is affirmative: under certain conditions on the prior, sampling from the posterior distribution can be used to achieve a desired level of privacy and utilit…

    Submitted 23 December, 2016; v1 submitted 5 June, 2013; originally announced June 2013.

    Comments: 38 pages; An earlier version of this article was published in ALT 2014. This version has corrections and additional results

  35. arXiv:1305.1809  [pdf, other]

    stat.ML cs.LG

    Cover Tree Bayesian Reinforcement Learning

    Authors: Nikolaos Tziortziotis, Christos Dimitrakakis, Konstantinos Blekas

    Abstract: This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the mo…

    Submitted 2 May, 2014; v1 submitted 8 May, 2013; originally announced May 2013.

  36. arXiv:1303.6977  [pdf, ps, other]

    stat.ML cs.LG

    ABC Reinforcement Learning

    Authors: Christos Dimitrakakis, Nikolaos Tziortziotis

    Abstract: This paper introduces a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The main advantage is that we only require a prior distribution on a class of simulators (generative models). This is useful in domains where an analytical probabilistic model of the underlying process is too complex to formulate, but where detailed…

    Submitted 28 June, 2013; v1 submitted 27 March, 2013; originally announced March 2013.

    Comments: Corrected version of paper appearing in ICML 2013

  37. Monte-Carlo utility estimates for Bayesian reinforcement learning

    Authors: Christos Dimitrakakis

    Abstract: This paper introduces a set of algorithms for Monte-Carlo Bayesian reinforcement learning. Firstly, Monte-Carlo estimation of upper bounds on the Bayes-optimal value function is employed to construct an optimistic policy. Secondly, gradient-based algorithms for approximate upper and lower bounds are introduced. Finally, we introduce a new class of gradient algorithms for Bayesian Bellman error min…

    Submitted 11 March, 2013; originally announced March 2013.

    Comments: 6 pages, 4 figures, 1 table, submitted to IEEE conference on decision and control

  38. arXiv:1303.0665  [pdf, other]

    cs.IR cs.LG stat.ML

    Personalized News Recommendation with Context Trees

    Authors: Florent Garcin, Christos Dimitrakakis, Boi Faltings

    Abstract: The profusion of online news articles makes it difficult to find interesting articles, a problem that can be assuaged by using a recommender system to bring the most relevant news stories to readers. However, news recommendation is challenging because the most relevant articles are often new content seen by few users. In addition, they are subject to trends and preference changes over time, and in…

    Submitted 3 November, 2014; v1 submitted 4 March, 2013; originally announced March 2013.

    Journal ref: Proceedings of the 7th ACM conference on Recommender systems (2013), pp. 105--112

  39. arXiv:1208.5641  [pdf, ps, other]

    cs.CR stat.AP

    Near-Optimal Blacklisting

    Authors: Christos Dimitrakakis, Aikaterini Mitrokotsa

    Abstract: Many applications involve agents sharing a resource, such as networks or services. When agents are honest, the system functions well and there is a net profit. Unfortunately, some agents may be malicious, but it may be hard to detect them. We consider the intrusion response problem of how to permanently blacklist agents, in order to maximise expected profit. This is not trivial, as blacklisting ma…

    Submitted 29 July, 2013; v1 submitted 28 August, 2012; originally announced August 2012.

    Comments: Submitted to INFOCOM 2014, 10 pages, 3 figures

  40. arXiv:1201.2555  [pdf, ps, other]

    cs.LG stat.ML

    Sparse Reward Processes

    Authors: Christos Dimitrakakis

    Abstract: We introduce a class of learning problems where the agent is presented with a series of tasks. Intuitively, if there is a relation among those tasks, then the information gained during execution of one task has value for the execution of another task. Consequently, the agent is intrinsically motivated to explore its environment beyond the degree necessary to solve the current task it has at hand. We…

    Submitted 5 September, 2012; v1 submitted 12 January, 2012; originally announced January 2012.

    Comments: 14 pages, 4 figures

  41. Bayesian multitask inverse reinforcement learning

    Authors: Christos Dimitrakakis, Constantin Rothkopf

    Abstract: We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each one may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the rel…

    Submitted 17 November, 2011; v1 submitted 18 June, 2011; originally announced June 2011.

    Comments: Corrected version. 13 pages, 8 figures

    MSC Class: 62C10; 91B08; 91B10

    ACM Class: G.3

    Journal ref: Recent Advances in Reinforcement Learning LNCS 7188, pp. 273-284, 2012

  42. arXiv:1106.3651  [pdf, ps, other]

    cs.LG stat.ML

    Robust Bayesian reinforcement learning through tight lower bounds

    Authors: Christos Dimitrakakis

    Abstract: In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them were particularly tight. In this paper, we show how to efficiently calculate a lower bound, which corresponds to th…

    Submitted 11 November, 2011; v1 submitted 18 June, 2011; originally announced June 2011.

    Comments: Corrected version. 12 pages, 3 figures, 1 table

  43. arXiv:1104.5687  [pdf, other]

    stat.ML cs.LG

    Preference elicitation and inverse reinforcement learning

    Authors: Constantin Rothkopf, Christos Dimitrakakis

    Abstract: We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting in a principled (Bayesian) statistical formulation. This generalises previous work on Bayesian inverse reinforcement learning and allows us to obtain a posterior distribution on the agent's preferences, policy and optionally, the obtained reward sequence, from observations. We examine the relation…

    Submitted 29 June, 2011; v1 submitted 29 April, 2011; originally announced April 2011.

    Comments: 13 pages, 4 figures; ECML 2011

    Report number: EPFL-REPORT-165975

  44. arXiv:1009.0278  [pdf, ps, other

    cs.CR cs.NI

    Expected loss analysis of thresholded authentication protocols in noisy conditions

    Authors: Christos Dimitrakakis, Aikaterini Mitrokotsa, Serge Vaudenay

    Abstract: A number of authentication protocols have been proposed recently, where at least some part of the authentication is performed during a phase, lasting $n$ rounds, with no error correction. This requires assigning an acceptable threshold for the number of detected errors. This paper describes a framework enabling an expected loss analysis for all the protocols in this family. Furthermore, computatio…

    Submitted 1 September, 2010; originally announced September 2010.

    Comments: 17 pages, 2 figures; draft

  45. arXiv:1005.2263  [pdf, other

    stat.ML cs.LG

    Context models on sequences of covers

    Authors: Christos Dimitrakakis

    Abstract: We present a class of models that, via a simple construction, enables exact, incremental, non-parametric, polynomial-time, Bayesian inference of conditional measures. The approach relies upon creating a sequence of covers on the conditioning variable and maintaining a different model for each set within a cover. Inference remains tractable by specifying the probabilistic model in terms of a random…

    Submitted 30 May, 2011; v1 submitted 13 May, 2010; originally announced May 2010.

    Comments: 14 pages, 2 figures

  46. arXiv:0912.5029  [pdf, ps, other

    cs.LG cs.AI

    Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning

    Authors: Christos Dimitrakakis

    Abstract: There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stoc…

    Submitted 26 December, 2009; originally announced December 2009.

    Comments: 13 pages, 1 figure, ICAART 2010

    Report number: TR-UVA-09-01

  47. arXiv:0910.0483  [pdf, ps, other

    stat.ML cs.LG stat.AP

    Statistical Decision Making for Authentication and Intrusion Detection

    Authors: Christos Dimitrakakis, Aikaterini Mitrokotsa

    Abstract: User authentication and intrusion detection differ from standard classification problems in that while we have data generated from legitimate users, impostor or intrusion data is scarce or non-existent. We review existing techniques for dealing with this problem and propose a novel alternative based on a principled statistical decision-making view point. We examine the technique on a toy problem…

    Submitted 5 October, 2009; originally announced October 2009.

    Comments: 13 pages, 2 figures, to be presented at ICMLA 2009

    Report number: IAS-UVA-09-02

  48. arXiv:0906.4618  [pdf, other

    cs.CR

    Shedding Light on RFID Distance Bounding Protocols and Terrorist Fraud Attacks

    Authors: Pedro Peris-Lopez, Julio C. Hernandez-Castro, Christos Dimitrakakis, Aikaterini Mitrokotsa, Juan M. E. Tapiador

    Abstract: The vast majority of RFID authentication protocols assume the proximity between readers and tags due to the limited range of the radio channel. However, in real scenarios an intruder can be located between the prover (tag) and the verifier (reader) and trick this last one into thinking that the prover is in close proximity. This attack is generally known as a relay attack in which scope distance f…

    Submitted 20 June, 2010; v1 submitted 25 June, 2009; originally announced June 2009.

    Comments: 31 pages, 10 figures, 1 table

  49. arXiv:0902.0392  [pdf, ps, other

    stat.ML cs.LG

    Tree Exploration for Bayesian RL Exploration

    Authors: Christos Dimitrakakis

    Abstract: Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, where optimality improves with increased computational time. This is because the resulting planning task takes the form of a dynamic programming problem on a belief tree with an infinite number of states. The second type e…

    Submitted 21 September, 2011; v1 submitted 2 February, 2009; originally announced February 2009.

    Comments: 13 pages, 1 figure. Slightly extended and corrected version (notation errors and lower bound calculation) of homonymous paper presented at the conference of Computational Intelligence for Modelling, Control and Automation 2008 (CIMCA'08)

    Report number: IAS-08-04

  50. arXiv:0807.2043  [pdf

    cs.CR cs.CV cs.NI

    Intrusion Detection Using Cost-Sensitive Classification

    Authors: Aikaterini Mitrokotsa, Christos Dimitrakakis, Christos Douligeris

    Abstract: Intrusion Detection is an invaluable part of computer networks defense. An important consideration is the fact that raising false alarms carries a significantly lower cost than not detecting attacks. For this reason, we examine how cost-sensitive classification methods can be used in Intrusion Detection systems. The performance of the approach is evaluated under different experimental conditio…

    Submitted 13 July, 2008; originally announced July 2008.

    Comments: 13 pages, 6 figures, presented at EC2ND 2007