-
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
Authors:
Bianca Marin Moreno,
Margaux Brégère,
Pierre Gaillard,
Nadia Oudjane
Abstract:
We explore online learning in episodic loop-free Markov decision processes on non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can b…
▽ More
We explore online learning in episodic loop-free Markov decision processes on non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple black-box algorithms instances over different intervals, aggregating outputs via a slee** expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the probability transitions (uncertainty and non-stationarity coming only from external noise, independent of agent state-action pairs), we achieve optimal dynamic regret without prior knowledge of MDP changes. Unlike approaches for RL, MetaCURL handles full adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Automated Deep Learning for Load Forecasting
Authors:
Julie Keisler,
Sandra Claudel,
Gilles Cabriel,
Margaux Brégère
Abstract:
Accurate forecasting of electricity consumption is essential to ensure the performance and stability of the grid, especially as the use of renewable energy increases. Forecasting electricity is challenging because it depends on many external factors, such as weather and calendar variables. While regression-based models are currently effective, the emergence of new explanatory variables and the nee…
▽ More
Accurate forecasting of electricity consumption is essential to ensure the performance and stability of the grid, especially as the use of renewable energy increases. Forecasting electricity is challenging because it depends on many external factors, such as weather and calendar variables. While regression-based models are currently effective, the emergence of new explanatory variables and the need to refine the temporality of the signals to be forecasted is encouraging the exploration of novel methodologies, in particular deep learning models. However, Deep Neural Networks (DNNs) struggle with this task due to the lack of data points and the different types of explanatory variables (e.g. integer, float, or categorical). In this paper, we explain why and how we used Automated Deep Learning (AutoDL) to find performing DNNs for load forecasting. We ended up creating an AutoDL framework called EnergyDragon by extending the DRAGON package and applying it to load forecasting. EnergyDragon automatically selects the features embedded in the DNN training in an innovative way and optimizes the architecture and the hyperparameters of the networks. We demonstrate on the French load signal that EnergyDragon can find original DNNs that outperform state-of-the-art load forecasting methods as well as other AutoDL approaches.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
A Bandit Approach with Evolutionary Operators for Model Selection
Authors:
Margaux Brégère,
Julie Keisler
Abstract:
This work formulates model selection as an infinite-armed bandit problem, namely, a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms) when the properties of each choice are only partially known at the time of allocation and may become better understood over time, via the attainment of rewards.Here, the arms are machine learning models to…
▽ More
This work formulates model selection as an infinite-armed bandit problem, namely, a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms) when the properties of each choice are only partially known at the time of allocation and may become better understood over time, via the attainment of rewards.Here, the arms are machine learning models to train and selecting an arm corresponds to a partial training of the model (resource allocation).The reward is the accuracy of the selected model after its partial training.We aim to identify the best model at the end of a finite number of resource allocations and thus consider the best arm identification setup. We propose the algorithm Mutant-UCB that incorporates operators from evolutionary algorithms into the UCB-E (Upper Confidence Bound Exploration) bandit algorithm introduced by Audiber et al.Tests carried out on three open source image classification data sets attest to the relevance of this novel combining approach, which outperforms the state-of-the-art for a fixed budget.
△ Less
Submitted 19 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent
Authors:
Bianca Marin Moreno,
Margaux Brégère,
Pierre Gaillard,
Nadia Oudjane
Abstract:
Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate them. These include reinforcement learning, imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Since CURL invalidates classical Bellman equations, it requires new algorithms. We introduce MD-…
▽ More
Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate them. These include reinforcement learning, imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Since CURL invalidates classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the need for computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method adapting MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, the online version Greedy MD-CURL benefits from low computational complexity, while guaranteeing sub-linear or even logarithmic regret, depending on the level of information available on the underlying dynamics.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Reimagining Demand-Side Management with Mean Field Learning
Authors:
Bianca Marin Moreno,
Margaux Brégère,
Pierre Gaillard,
Nadia Oudjane
Abstract:
Integrating renewable energy into the power grid while balancing supply and demand is a complex issue, given its intermittent nature. Demand side management (DSM) offers solutions to this challenge. We propose a new method for DSM, in particular the problem of controlling a large population of electrical devices to follow a desired consumption signal. We model it as a finite horizon Markovian mean…
▽ More
Integrating renewable energy into the power grid while balancing supply and demand is a complex issue, given its intermittent nature. Demand side management (DSM) offers solutions to this challenge. We propose a new method for DSM, in particular the problem of controlling a large population of electrical devices to follow a desired consumption signal. We model it as a finite horizon Markovian mean field control problem. We develop a new algorithm, MD-MFC, which provides theoretical guarantees for convex and Lipschitz objective functions. What distinguishes MD-MFC from the existing load control literature is its effectiveness in directly solving the target tracking problem without resorting to regularization techniques on the main problem. A non-standard Bregman divergence on a mirror descent scheme allows dynamic programming to be used to obtain simple closed-form solutions. In addition, we show that general mean-field game algorithms can be applied to this problem, which expands the possibilities for addressing load control problems. We illustrate our claims with experiments on a realistic data set.
△ Less
Submitted 25 May, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders
Authors:
Margaux Brégère,
Ricardo J. Bessa
Abstract:
The implementation of efficient demand response (DR) programs for household electricity consumption would benefit from data-driven methods capable of simulating the impact of different tariffs schemes. This paper proposes a novel method based on conditional variational autoencoders (CVAE) to generate, from an electricity tariff profile combined with exogenous weather and calendar variables, daily…
▽ More
The implementation of efficient demand response (DR) programs for household electricity consumption would benefit from data-driven methods capable of simulating the impact of different tariffs schemes. This paper proposes a novel method based on conditional variational autoencoders (CVAE) to generate, from an electricity tariff profile combined with exogenous weather and calendar variables, daily consumption profiles of consumers segmented in different clusters. First, a large set of consumers is gathered into clusters according to their consumption behavior and price-responsiveness. The clustering method is based on a causality model that measures the effect of a specific tariff on the consumption level. Then, daily electrical energy consumption profiles are generated for each cluster with CVAE. This non-parametric approach is compared to a semi-parametric data generator based on generalized additive models and that uses prior knowledge of energy consumption. Experiments in a publicly available data set show that, the proposed method presents comparable performance to the semi-parametric one when it comes to generating the average value of the original data. The main contribution from this new method is the capacity to reproduce rebound and side effects in the generated consumption profiles. Indeed, the application of a special electricity tariff over a time window may also affect consumption outside this time window. Another contribution is that the clustering approach segments consumers according to their daily consumption profile and elasticity to tariff changes. These two results combined are very relevant for an ex-ante testing of future DR policies by system operators, retailers and energy regulators.
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
Online Hierarchical Forecasting for Power Consumption Data
Authors:
Margaux Brégère,
Malo Huard
Abstract:
We study the forecasting of the power consumptions of a population of households and of subpopulations thereof. These subpopulations are built according to location, to exogenous information and/or to profiles we determined from historical households consumption time series. Thus, we aim to forecast the electricity consumption time series at several levels of households aggregation. These time ser…
▽ More
We study the forecasting of the power consumptions of a population of households and of subpopulations thereof. These subpopulations are built according to location, to exogenous information and/or to profiles we determined from historical households consumption time series. Thus, we aim to forecast the electricity consumption time series at several levels of households aggregation. These time series are linked through some summation constraints which induce a hierarchy. Our approach consists in three steps: feature generation, aggregation and projection. Firstly (feature generation step), we build, for each considering group for households, a benchmark forecast (called features), using random forests or generalized additive models. Secondly (aggregation step), aggregation algorithms, run in parallel, aggregate these forecasts and provide new predictions. Finally (projection step), we use the summation constraints induced by the time series underlying hierarchy to re-conciliate the forecasts by projecting them in a well-chosen linear subspace. We provide some theoretical guaranties on the average prediction error of this methodology, through the minimization of a quantity called regret. We also test our approach on households power consumption data collected in Great Britain by multiple energy providers in the Energy Demand Research Project context. We build and compare various population segmentations for the evaluation of our approach performance.
△ Less
Submitted 1 March, 2020;
originally announced March 2020.
-
Target Tracking for Contextual Bandits: Application to Demand Side Management
Authors:
Margaux Brégère,
Pierre Gaillard,
Yannig Goude,
Gilles Stoltz
Abstract:
We propose a contextual-bandit approach for demand side management by offering price incentives. More precisely, a target mean consumption is set at each round and the mean consumption is modeled as a complex function of the distribution of prices sent and of some contextual variables such as the temperature, weather, and so on. The performance of our strategies is measured in quadratic losses thr…
▽ More
We propose a contextual-bandit approach for demand side management by offering price incentives. More precisely, a target mean consumption is set at each round and the mean consumption is modeled as a complex function of the distribution of prices sent and of some contextual variables such as the temperature, weather, and so on. The performance of our strategies is measured in quadratic losses through a regret criterion. We offer $T^{2/3}$ upper bounds on this regret (up to poly-logarithmic terms)---and even faster rates under stronger assumptions---for strategies inspired by standard strategies for contextual bandits (like LinUCB, see Li et al., 2010). Simulations on a real data set gathered by UK Power Networks, in which price incentives were offered, show that our strategies are effective and may indeed manage demand response by suitably picking the price levels.
△ Less
Submitted 13 May, 2019; v1 submitted 28 January, 2019;
originally announced January 2019.
-
A multivariate variable selection approach for analyzing LC-MS metabolomics data
Authors:
M. Perrot-Dockès,
C. Lévy-Leduc,
J. Chiquet,
L. Sansonnet,
M. Brégère,
M. -P. Étienne,
S. Robin,
G. Genta-Jouve
Abstract:
Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. In metabolomics, for instance, data resulting from Liquid Chromatography-Mass Spectrometry (LC-MS) -- a technique which gives access to a large coverage of metabolites -- exhibit such patterns. These data sets are typically used to find…
▽ More
Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. In metabolomics, for instance, data resulting from Liquid Chromatography-Mass Spectrometry (LC-MS) -- a technique which gives access to a large coverage of metabolites -- exhibit such patterns. These data sets are typically used to find the metabolites characterizing a phenotype of interest associated with the samples. However, applying some statistical procedures that do not adjust the variable selection step to the dependence pattern may result in a loss of power and the selection of spurious variables. The goal of this paper is to propose a variable selection procedure in the multivariate linear model that accounts for the dependence structure of the multiple outputs which may lead in the LC-MS framework to the selection of more relevant metabolites. We propose a novel Lasso-based approach in the multivariate framework of the general linear model taking into account the dependence structure by using various modelings of the covariance matrix of the residuals. Our numerical experiments show that including the estimation of the covariance matrix of the residuals in the Lasso criterion dramatically improves the variable selection performance. Our approach is also successfully applied to a LC-MS data set made of African copals samples for which it is able to provide a small list of metabolites without altering the phenotype discrimination. Our methodology is implemented in the R package MultiVarSel which is available from the CRAN (Comprehensive R Archive Network).
△ Less
Submitted 31 March, 2017;
originally announced April 2017.