-
OPFData: Large-scale datasets for AC optimal power flow with topological perturbations
Authors:
Sean Lovett,
Miha Zgubic,
Sofia Liguori,
Sephora Madjiheurem,
Hamish Tomlinson,
Sophie Elster,
Chris Apps,
Sims Witherspoon,
Luis Piloto
Abstract:
Solving the AC optimal power flow problem (AC-OPF) is critical to the efficient and safe planning and operation of power grids. Small efficiency improvements in this domain have the potential to lead to billions of dollars of cost savings, and significant reductions in emissions from fossil fuel generators. Recent work on data-driven solution methods for AC-OPF shows the potential for large speed improvements compared to traditional solvers; however, no large-scale open datasets for this problem exist. We present the largest readily-available collection of solved AC-OPF problems to date. This collection is orders of magnitude larger than existing readily-available datasets, allowing training of high-capacity data-driven models. Uniquely, it includes topological perturbations - a critical requirement for usage in realistic power grid operations. We hope this resource will spur the community to scale research to larger grid sizes with variable topology.
Submitted 18 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
CANOS: A Fast and Scalable Neural AC-OPF Solver Robust To N-1 Perturbations
Authors:
Luis Piloto,
Sofia Liguori,
Sephora Madjiheurem,
Miha Zgubic,
Sean Lovett,
Hamish Tomlinson,
Sophie Elster,
Chris Apps,
Sims Witherspoon
Abstract:
Optimal Power Flow (OPF) refers to a wide range of related optimization problems with the goal of operating power systems efficiently and securely. In the simplest setting, OPF determines how much power to generate in order to minimize costs while meeting demand for power and satisfying physical and operational constraints. In even the simplest case, power grid operators use approximations of the AC-OPF problem because solving the exact problem is prohibitively slow with state-of-the-art solvers. These approximations sacrifice accuracy and operational feasibility in favor of speed. This trade-off leads to costly "uplift payments" and increased carbon emissions, especially for large power grids. In the present work, we train a deep learning system (CANOS) to predict near-optimal solutions (within 1% of the true AC-OPF cost) without compromising speed (running in as little as 33-65 ms). Importantly, CANOS scales to realistic grid sizes with promising empirical results on grids containing as many as 10,000 buses. Finally, because CANOS is a Graph Neural Network, it is robust to changes in topology. We show that CANOS is accurate across N-1 topological perturbations of a base grid typically used in security-constrained analysis. This paves the way for more efficient optimization of more complex OPF problems which alter grid connectivity such as unit commitment, topology optimization and security-constrained OPF.
Submitted 26 March, 2024;
originally announced March 2024.
-
Expected Eligibility Traces
Authors:
Hado van Hasselt,
Sephora Madjiheurem,
Matteo Hessel,
David Silver,
André Barreto,
Diana Borsa
Abstract:
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces make it possible, with a single update, to update states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between instantaneous and expected traces by a mechanism similar to bootstrapping, which ensures that the resulting algorithm is a strict generalisation of TD($λ$). Finally, we discuss possible extensions and connections to related ideas, such as successor features.
Submitted 8 February, 2021; v1 submitted 3 July, 2020;
originally announced July 2020.
-
State2vec: Off-Policy Successor Features Approximators
Authors:
Sephora Madjiheurem,
Laura Toni
Abstract:
A major challenge in reinforcement learning (RL) is the design of agents that are able to generalize across tasks that share common dynamics. A viable solution is meta-reinforcement learning, which identifies common structures among past tasks that are then generalized to new tasks (meta-test). In meta-training, the RL agent learns state representations that encode prior information from a set of tasks, used to generalize the value function approximation. This has been proposed in the literature as successor representation approximators. While promising, these methods do not generalize well across optimal policies, leading to sample inefficiency during meta-test phases. In this paper, we propose state2vec, an efficient and low-complexity framework for learning successor features which (i) generalize across policies and (ii) ensure sample efficiency during meta-test. We extend the well-known node2vec framework to learn state embeddings that account for the discounted future state transitions in RL. The proposed off-policy state2vec captures the geometry of the underlying state space, yielding good basis functions for linear value function approximation.
Submitted 22 October, 2019;
originally announced October 2019.
-
Representation Learning on Graphs: A Reinforcement Learning Application
Authors:
Sephora Madjiheurem,
Laura Toni
Abstract:
In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) in accurately approximating the value function in low dimensions and we highlight the importance of feature learning for an improved low-dimensional value function approximation. Then, we adopt different representation learning algorithms on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and the Variational Graph Auto-Encoder consistently outperform the commonly used smooth proto-value functions in low-dimensional feature space.
Submitted 17 January, 2019; v1 submitted 16 January, 2019;
originally announced January 2019.
-
Qualitative Framing of Financial Incentives - A Case of Emotion Annotation
Authors:
Sephora Madjiheurem,
Valentina Sintsova,
Pearl Pu
Abstract:
Online labor platforms, such as Amazon Mechanical Turk, provide an effective framework for eliciting responses to judgment tasks. Previous work has shown that workers respond best to financial incentives, especially to extra bonuses. However, most of the tested incentives involve describing the bonus conditions in formulas instead of plain English. We believe that different incentives given in English (or in qualitative framing) will result in differences in workers' performance, especially when task difficulties vary. In this paper, we report the preliminary results of a crowdsourcing experiment comparing workers' performance using only qualitative framings of financial incentives. Our results demonstrate a significant increase in workers' performance using a specific well-formulated qualitative framing inspired by the Peer Truth Serum. This positive effect is observed only when the difficulty of the task is high; when the task is easy, the choice of incentive makes no difference.
Submitted 1 September, 2016;
originally announced September 2016.