Showing 1–41 of 41 results for author: Metelli, A M

  1. arXiv:2406.15124  [pdf, ps, other]

    cs.LG

    A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

    Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli

    Abstract: Hierarchical Reinforcement Learning (HRL) approaches have shown successful results in solving a large variety of complex, structured, long-horizon problems. Nevertheless, a full theoretical understanding of this empirical evidence is currently missing. In the context of the \emph{option} framework, prior research has devised efficient algorithms for scenarios where options are fixed, and the high-…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.07991  [pdf, other]

    cs.LG stat.ML

    Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis

    Authors: Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

    Abstract: Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance. Previous works have proposed approaches to MTL that can be divided into feature learning, focused on the identification of a common feature representation, and task clustering, where similar tasks are grouped together. In this paper, we pro…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.03812  [pdf, other]

    cs.LG

    How to Scale Inverse RL to Large State Spaces? A Provably Efficient Approach

    Authors: Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

    Abstract: In online Inverse Reinforcement Learning (IRL), the learner can collect samples about the dynamics of the environment to improve its estimate of the reward function. Since IRL suffers from identifiability issues, many theoretical works on online IRL focus on estimating the entire set of rewards that explain the demonstrations, named the feasible reward set. However, none of the algorithms availabl…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2406.03033  [pdf, other]

    cs.LG stat.ML

    Optimal Multi-Fidelity Best-Arm Identification

    Authors: Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli

    Abstract: In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimalit…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2405.06363  [pdf, ps, other]

    cs.LG cs.AI

    Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs

    Authors: Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

    Abstract: We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our s…

    Submitted 10 May, 2024; originally announced May 2024.

  6. arXiv:2405.05630  [pdf, other]

    cs.LG

    Policy Gradient with Active Importance Sampling

    Authors: Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

    Abstract: Importance sampling (IS) represents a fundamental technique for a large surge of off-policy reinforcement learning approaches. Policy gradient (PG) methods, in particular, significantly benefit from IS, enabling the effective reuse of previously collected samples, thus increasing sample efficiency. However, classically, IS is employed in RL as a passive tool for re-weighting historical samples. Ho…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2405.02235  [pdf, other]

    cs.LG

    Learning Optimal Deterministic Policies with Stochastic Policy Gradients

    Authors: Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli, Matteo Papini

    Abstract: Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. They learn stochastic parametric (hyper)policies by either exploring in the space of actions or in the space of parameters. Stochastic controllers, however, are often undesirable from a practical perspective because of their lack of robustness, safety, and traceability. In common pr…

    Submitted 30 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  8. arXiv:2402.15392  [pdf, ps, other]

    cs.LG

    Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

    Authors: Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

    Abstract: Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior. It is well-known that the IRL problem is fundamentally ill-posed, i.e., many reward functions can explain the demonstrations. For this reason, IRL has been recently reframed in terms of estimating the feasible reward set (Metelli et al., 2021), thus, postponing the selection…

    Submitted 6 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: International Conference on Machine Learning 41 (ICML 2024)

  9. arXiv:2402.13821  [pdf, ps, other]

    cs.LG

    Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

    Authors: Alberto Maria Metelli

    Abstract: Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of the traditional Markov Decision Processes (MDPs) to model the real-world scenarios in which there is the possibility to intervene in the environment in order to configure some of its parameters. In this paper, we focus on a particular subclass of Conf-MDP that satisfies regularity conditions, namely…

    Submitted 21 February, 2024; originally announced February 2024.

  10. arXiv:2402.10282  [pdf, other]

    cs.LG stat.ML

    Information Capacity Regret Bounds for Bandits with Mediator Feedback

    Authors: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

    Abstract: This work addresses the mediator feedback problem, a bandit game where the decision set consists of a number of policies, each associated with a probability distribution over a common space of outcomes. Upon choosing a policy, the learner observes an outcome sampled from its distribution and incurs the loss assigned to this outcome in the present round. We introduce the policy set capacity as an i…

    Submitted 15 February, 2024; originally announced February 2024.

  11. arXiv:2402.03792  [pdf, other]

    cs.LG cs.AI

    No-Regret Reinforcement Learning in Smooth MDPs

    Authors: Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

    Abstract: Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision proces…

    Submitted 6 February, 2024; originally announced February 2024.

  12. arXiv:2401.03857  [pdf, other]

    cs.LG cs.AI

    Inverse Reinforcement Learning with Sub-optimal Experts

    Authors: Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

    Abstract: Inverse Reinforcement Learning (IRL) techniques deal with the problem of deducing a reward function that explains the behavior of an expert agent who is assumed to act optimally in an underlying unknown task. In several problems of interest, however, it is possible to observe the behavior of multiple experts with different degrees of optimality (e.g., racing drivers whose skills range from amateur…

    Submitted 8 January, 2024; originally announced January 2024.

  13. arXiv:2312.12869  [pdf, other]

    cs.LG cs.AI

    Parameterized Projected Bellman Operator

    Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

    Abstract: Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transi…

    Submitted 6 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Proceedings of the National Conference on Artificial Intelligence (AAAI-24)

  14. arXiv:2310.11059  [pdf, other]

    cs.LG

    Causal Feature Selection via Transfer Entropy

    Authors: Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

    Abstract: Machine learning algorithms are designed to capture complex relationships between features. In this context, the high dimensionality of data often results in poor model performance, with the risk of overfitting. Feature selection, the process of selecting a subset of relevant and non-redundant features, is, therefore, an essential step to mitigate these issues. However, classical feature selection…

    Submitted 17 October, 2023; originally announced October 2023.

  15. arXiv:2310.02975  [pdf, ps, other]

    cs.LG cs.AI

    $(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

    Authors: Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli

    Abstract: Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions only gained popularity over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order…

    Submitted 12 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  16. arXiv:2308.15552  [pdf, ps, other]

    cs.LG stat.ML

    Pure Exploration under Mediators' Feedback

    Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

    Abstract: Stochastic multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a stochastic reward. Within the context of best-arm identification (BAI) problems, the goal of the agent lies in finding the optimal arm, i.e., the one with highest expected reward, as accurately and efficiently as possible. Nevertheless, the sequentia…

    Submitted 12 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  17. arXiv:2306.11143  [pdf, other]

    cs.LG stat.ML

    Nonlinear Feature Aggregation: Two Algorithms driven by Theory

    Authors: Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

    Abstract: Many real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues, as well as the risk of overfitting. Ideally, only relevant and non-redundant features should be considered to preserve the complete information of the original data and limit the dimensionality. Dimensionality reduction and feature selection are common preproces…

    Submitted 19 June, 2023; originally announced June 2023.

  18. arXiv:2305.06936  [pdf, ps, other]

    cs.LG cs.IT

    An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes

    Authors: Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli

    Abstract: A large variety of real-world Reinforcement Learning (RL) tasks is characterized by a complex and heterogeneous structure that makes end-to-end (or flat) approaches hardly applicable or even infeasible. Hierarchical Reinforcement Learning (HRL) provides general solutions to address these problems thanks to a convenient multi-level decomposition of the tasks, making their solution accessible. Altho…

    Submitted 10 May, 2023; originally announced May 2023.

  19. arXiv:2305.04361  [pdf, ps, other]

    cs.LG cs.AI

    Truncating Trajectories in Monte Carlo Reinforcement Learning

    Authors: Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

    Abstract: In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return. In practice, in many tasks of interest, such as policy optimization, the agent usually spends its interaction budget by collecting episodes of fixed length within a simulator (i.e., Monte Carlo simulation). However, give…

    Submitted 7 May, 2023; originally announced May 2023.

  20. arXiv:2304.12966  [pdf, ps, other]

    cs.LG

    Towards Theoretical Understanding of Inverse Reinforcement Learning

    Authors: Alberto Maria Metelli, Filippo Lazzati, Marcello Restelli

    Abstract: Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function justifying the behavior demonstrated by an expert agent. A well-known limitation of IRL is the ambiguity in the choice of the reward function, due to the existence of multiple rewards that explain the observed behavior. This limitation has been recently circumvented by formulating IRL as t…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Submitted to: ICML23

  21. arXiv:2304.05073  [pdf, other]

    cs.LG

    A Tale of Sampling and Estimation in Discounted Reinforcement Learning

    Authors: Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

    Abstract: The most relevant problems in discounted reinforcement learning involve estimating the mean of a function under the stationary distribution of a Markov reward process, such as the expected return in policy evaluation, or the policy gradient in policy optimization. In practice, these estimates are produced through a finite-horizon episodic sampling, which neglects the mixing properties of the Marko…

    Submitted 14 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: AISTATS 2023

  22. arXiv:2303.14734  [pdf, other]

    cs.LG

    Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis

    Authors: Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

    Abstract: One of the central issues of several machine learning applications on real data is the choice of the input features. Ideally, the designer should select only the relevant, non-redundant features to preserve the complete information contained in the original dataset, with little collinearity among features and a smaller dimension. This procedure helps mitigate problems like overfitting and the curs…

    Submitted 26 March, 2023; originally announced March 2023.

  23. arXiv:2303.08102  [pdf, ps, other]

    cs.LG stat.ML

    Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

    Authors: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

    Abstract: We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitr…

    Submitted 15 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  24. arXiv:2303.02378  [pdf, other]

    cs.LG cs.AI

    Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

    Authors: Amarildo Likmeta, Matteo Sacco, Alberto Maria Metelli, Marcello Restelli

    Abstract: Uncertainty quantification has been extensively used as a means to achieve efficient directed exploration in Reinforcement Learning (RL). However, state-of-the-art methods for continuous actions still suffer from high sample complexity requirements. Indeed, they either completely lack strategies for propagating the epistemic uncertainty throughout the updates, or they mix it with aleatoric uncerta…

    Submitted 4 March, 2023; originally announced March 2023.

  25. arXiv:2302.07510  [pdf, other]

    cs.LG

    Best Arm Identification for Stochastic Rising Bandits

    Authors: Marco Mussi, Alessandro Montenegro, Francesco Trovò, Marcello Restelli, Alberto Maria Metelli

    Abstract: Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected. This setting captures a wide range of scenarios in which the available options are learning entities whose performance improves (in expectation) over time (e.g., online best model selection). While previous works addressed the regr…

    Submitted 27 May, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Accepted to ICML 2024

  26. arXiv:2212.06251  [pdf, other]

    cs.LG stat.ML

    Autoregressive Bandits

    Authors: Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli

    Abstract: Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When facing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for guaranteeing convergence to the optimal policy. In this work, we pr…

    Submitted 19 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted to AISTATS 2024

  27. arXiv:2212.03922  [pdf, other]

    cs.LG

    Tight Performance Guarantees of Imitator Policies with Continuous Actions

    Authors: Davide Maran, Alberto Maria Metelli, Marcello Restelli

    Abstract: Behavioral Cloning (BC) aims at learning a policy that mimics the behavior demonstrated by an expert. The current theoretical understanding of BC is limited to the case of finite actions. In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions. We start by deriving a novel bound on the performance gap…

    Submitted 7 December, 2022; originally announced December 2022.

  28. arXiv:2212.03798  [pdf, other]

    cs.LG stat.ML

    Stochastic Rising Bandits

    Authors: Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

    Abstract: This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms th…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Corrected definition of "cumulative increment" (Equation 2) and efficient update (Appendix D)

    Journal ref: International Conference on Machine Learning (ICML). 2022

  29. arXiv:2211.11620  [pdf, other]

    cs.LG

    Simultaneously Updating All Persistence Values in Reinforcement Learning

    Authors: Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli, Marcello Restelli

    Abstract: In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization. Agents acting at high frequencies have the best control opportunities, along with some drawbacks, such as possible inefficient exploration and vanishing of the action advantages. The repetition of the actions, i.e., action persistence, comes into help, as it allows the agent to v…

    Submitted 21 November, 2022; originally announced November 2022.

  30. arXiv:2211.08997  [pdf, other]

    cs.LG

    Dynamical Linear Bandits

    Authors: Marco Mussi, Alberto Maria Metelli, Marcello Restelli

    Abstract: In many real-world sequential decision-making problems, an action does not immediately reflect on the feedback and spreads its effects over a long time frame. For instance, in online advertising, investing in a platform produces an instantaneous increase of awareness, but the actual reward, i.e., a conversion, might occur far in the future. Furthermore, whether a conversion takes place depends on:…

    Submitted 30 May, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

  31. arXiv:2207.12509  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs

    Authors: Riccardo Poiani, Ciprian Stirbu, Alberto Maria Metelli, Marcello Restelli

    Abstract: With the continuous growth of the global economy and markets, resource imbalance has risen to be one of the central issues in real logistic scenarios. In marine transportation, this trade imbalance leads to Empty Container Repositioning (ECR) problems. Once the freight has been delivered from an exporting country to an importing one, the laden will turn into empty containers that need to be reposi…

    Submitted 25 July, 2022; originally announced July 2022.

  32. arXiv:2207.03851  [pdf, other]

    cs.LG cs.AI

    Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management

    Authors: Julen Cestero, Marco Quartulli, Alberto Maria Metelli, Marcello Restelli

    Abstract: Warehouse Management Systems have been evolving and improving thanks to new Data Intelligence techniques. However, many current optimizations have been applied to specific cases or are in great need of manual interaction. Here is where Reinforcement Learning techniques come into play, providing automatization and adaptability to current optimization policies. In this paper, we present Storehouse,…

    Submitted 21 July, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 9 pages, 6 figures, accepted in WCCI 2022

  33. arXiv:2205.10416  [pdf, ps, other]

    cs.LG

    ARLO: A Framework for Automated Reinforcement Learning

    Authors: Marco Mussi, Davide Lombarda, Alberto Maria Metelli, Francesco Trovò, Marcello Restelli

    Abstract: Automated Reinforcement Learning (AutoRL) is a relatively new area of research that is gaining increasing attention. The objective of AutoRL consists in easing the employment of Reinforcement Learning (RL) techniques for the broader public by alleviating some of its main challenges, including data collection, algorithm selection, and hyper-parameter tuning. In this work, we propose a general and f…

    Submitted 20 May, 2022; originally announced May 2022.

  34. arXiv:2112.06625  [pdf, other]

    cs.LG

    Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

    Authors: Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli

    Abstract: Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for current reinforcement learning algorithms. Yet this would be a much needed feature for practical applications. In this paper, we propose an approach which learns a hyper-policy, whose input is time, that outputs the parameters of the policy to be queried at that time. This hyper-policy is trained to maxi…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI2022

  35. arXiv:2012.08225  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Optimization as Online Learning with Mediator Feedback

    Authors: Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

    Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available information, compared to the standard bandit feedback, allows reusing samples generated by one policy to estimate the performance of other policies. Based on t…

    Submitted 15 December, 2020; originally announced December 2020.

  36. arXiv:2002.06836  [pdf, other]

    cs.LG stat.ML

    Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

    Authors: Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

    Abstract: The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy. In this paper, we introduce the notion of action persistence that consists in the repetition of an action for a fixed number of decision steps, having the effect of modifying the control frequency. We start analyzing how action persistence a…

    Submitted 12 July, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  37. arXiv:1909.04115  [pdf, other]

    cs.LG cs.AI stat.ML

    Gradient-Aware Model-based Policy Search

    Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

    Abstract: Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of th…

    Submitted 20 November, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

  38. arXiv:1909.03984  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Space Identification in Configurable Environments

    Authors: Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

    Abstract: We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy. We introduce an approach based on statistical testing to identify the set of policy parameters the agent can control, within a larger parametric policy space. After presenting two identification rules (combinatorial and simplified), applicable under di…

    Submitted 9 September, 2019; originally announced September 2019.

  39. arXiv:1907.07384  [pdf, other]

    cs.LG stat.ML

    Feature Selection via Mutual Information: New Theoretical Insights

    Authors: Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

    Abstract: Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that cond…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2019

  40. arXiv:1809.06098  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Optimization via Importance Sampling

    Authors: Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

    Abstract: Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires to account for the variance of the objective function estima…

    Submitted 31 October, 2018; v1 submitted 17 September, 2018; originally announced September 2018.

    Journal ref: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada

  41. arXiv:1806.05415  [pdf, other]

    cs.AI

    Configurable Markov Decision Processes

    Authors: Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

    Abstract: In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteratio…

    Submitted 14 June, 2018; originally announced June 2018.

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018