Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

Authors: Dominik Baumann, Erfaun Noorani, James Price, Ole Peters, Colm Connaughton, Thomas B. Schön

Abstract: Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the return as the sole "correct" optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic returns, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected returns into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term return for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive, non-ergodic environment and on standard RL benchmarks.

Submitted 10 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09131

Machine learning-based prediction of Q-voter model in complex networks

Authors: Aruane M. Pineda, Paul Kent, Colm Connaughton, Francisco A. Rodrigues

Abstract: In this article, we consider machine learning algorithms to accurately predict two variables associated with the $Q$-voter model in complex networks, i.e., (i) the consensus time and (ii) the frequency of opinion changes. Leveraging nine topological measures of the underlying networks, we verify that the clustering coefficient (C) and information centrality (IC) emerge as the most important predictors for these outcomes. Notably, the machine learning algorithms demonstrate accuracy across three distinct initialization methods of the $Q$-voter model, including random selection and the involvement of high- and low-degree agents with positive opinions. By unraveling the intricate interplay between network structure and dynamics, this research sheds light on the underlying mechanisms responsible for polarization effects and other dynamic patterns in social systems. Adopting a holistic approach that comprehends the complexity of network systems, this study offers insights into the intricate dynamics associated with polarization effects and paves the way for investigating the structure and dynamics of complex systems through modern machine learning methods.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 32 pages, 10 figures

Journal ref: Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2023

arXiv:2308.07054

Distinguishing Risk Preferences using Repeated Gambles

Authors: James Price, Colm Connaughton

Abstract: Sequences of repeated gambles provide an experimental tool to characterize the risk preferences of humans or artificial decision-making agents. The difficulty of this inference depends on factors including the details of the gambles offered and the number of iterations of the game played. In this paper we explore in detail the practical challenges of inferring risk preferences from the observed choices of artificial agents who are presented with finite sequences of repeated gambles. We are motivated by the fact that the strategy to maximize long-run wealth for sequences of repeated additive gambles (where gains and losses are independent of current wealth) is different to the strategy for repeated multiplicative gambles (where gains and losses are proportional to current wealth). Accurate measurement of risk preferences would be needed to tell whether an agent is employing the optimal strategy or not. To generalize the types of gambles our agents face we use the Yeo-Johnson transformation, a tool borrowed from feature engineering for time series analysis, to construct a family of gambles that interpolates smoothly between the additive and multiplicative cases. We then analyze the optimal strategy for this family, both analytically and numerically. We find that it becomes increasingly difficult to distinguish the risk preferences of agents as their wealth increases. This is because agents with different risk preferences eventually make the same decisions for sufficiently high wealth. We believe that these findings are informative for the effective design of experiments to measure risk preferences in humans.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2204.05059

doi 10.1016/j.chaos.2022.112306

Forecasting new diseases in low-data settings using transfer learning

Authors: Kirstin Roster, Colm Connaughton, Francisco A. Rodrigues

Abstract: Recent infectious disease outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated both the importance and difficulty of accurately forecasting novel infectious diseases. When new diseases first emerge, we have little knowledge of the transmission process, the level and duration of immunity to reinfection, or other parameters required to build realistic epidemiological models. Time series forecasts and machine learning, while less reliant on assumptions about the disease, require large amounts of data that are also not available in early stages of an outbreak. In this study, we examine how knowledge of related diseases can help make predictions of new diseases in data-scarce environments using transfer learning. We implement both an empirical and a theoretical approach. Using empirical data from Brazil, we compare how well different machine learning models transfer knowledge between two different disease pairs: (i) dengue and Zika, and (ii) influenza and COVID-19. In the theoretical analysis, we generate data using different transmission and recovery rates with an SIR compartmental model, and then compare the effectiveness of different transfer learning methods. We find that transfer learning offers the potential to improve predictions, even beyond a model based on data from the target disease, though the appropriate source disease must be chosen carefully. While imperfect, these models offer an additional input for decision makers during pandemic response.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2010.08819

Assessment of Reward Functions in Reinforcement Learning for Multi-Modal Urban Traffic Control under Real-World limitations

Authors: Alvaro Cabrejas-Egea, Colm Connaughton

Abstract: Reinforcement Learning is proving a successful tool that can manage urban intersections with a fraction of the effort required to curate traditional traffic controllers. However, literature on the introduction and control of pedestrians to such intersections is scarce. Furthermore, it is unclear what traffic state variables should be used as reward to obtain the best agent performance. This paper robustly evaluates 30 different Reinforcement Learning reward functions for controlling intersections serving pedestrians and vehicles covering the main traffic state variables available via modern vision-based sensors. Some rewards proposed in previous literature solely for vehicular traffic are extended to pedestrians while new ones are introduced. We use a calibrated model in terms of demand, sensors, green times and other operational constraints of a real intersection in Greater Manchester, UK. The assessed rewards can be classified into 5 groups depending on the magnitudes used: queues, waiting time, delay, average speed and throughput in the junction. The performance of different agents, in terms of waiting time, is compared across different demand levels, from normal operation to saturation of traditional adaptive controllers. We find that those rewards maximising the speed of the network obtain the lowest waiting time for vehicles and pedestrians simultaneously, closely followed by queue minimisation, demonstrating better performance than other previously proposed methods.

Submitted 17 October, 2020; originally announced October 2020.

Comments: 8 pages, conference paper, 3 figures, 1 table

MSC Class: 93E03

arXiv:2008.11634

doi 10.1109/SMC42975.2020.9283498

Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations

Authors: Alvaro Cabrejas-Egea, Shaun Howell, Maksis Knutins, Colm Connaughton

Abstract: Adaptive traffic signal control is one key avenue for mitigating the growing consequences of traffic congestion. Incumbent solutions such as SCOOT and SCATS require regular and time-consuming calibration, can't optimise well for multiple road use modalities, and require the manual curation of many implementation plans. A recent alternative to these approaches are deep reinforcement learning algorithms, in which an agent learns how to take the most appropriate action for a given state of the system. This is guided by neural networks approximating a reward function that provides feedback to the agent regarding the performance of the actions taken, making it sensitive to the specific reward function chosen. Several authors have surveyed the reward functions used in the literature, but attributing outcome differences to reward function choice across works is problematic as there are many uncontrolled differences, as well as different outcome metrics. This paper compares the performance of agents using different reward functions in a simulation of a junction in Greater Manchester, UK, across various demand profiles, subject to real world constraints: realistic sensor inputs, controllers, calibrated demand, intergreen times and stage sequencing. The reward metrics considered are based on the time spent stopped, lost time, change in lost time, average speed, queue length, junction throughput and variations of these magnitudes. The performance of these reward functions is compared in terms of total waiting time. We find that speed maximisation resulted in the lowest average waiting times across all demand levels, displaying significantly better performance than other rewards previously introduced in the literature.

Submitted 12 October, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: Conference paper, 13 pages, 7 figures, 1 table

MSC Class: 93E03

arXiv:2006.13072

doi 10.1109/ITSC45102.2020.9294318

Wavelet Augmented Regression Profiling (WARP): improved long-term estimation of travel time series with recurrent congestion

Authors: Alvaro Cabrejas Egea, Colm Connaughton

Abstract: Reliable estimates of typical travel times allow road users to forward plan journeys to minimise travel time, potentially increasing overall system efficiency. On busy highways, however, congestion events can cause large, short-term spikes in travel time. These spikes make direct forecasting of travel time using standard time series models difficult on the timescales of hours to days that are relevant to forward planning. The problem is that some such spikes are caused by unpredictable incidents and should be filtered out, whereas others are caused by recurrent peaks in demand and should be factored into estimates. Here we present the Wavelet Augmented Regression Profiling (WARP) method for long-term estimation of typical travel times. WARP linearly decomposes historical time series of travel times into two components: background and spikes. It then further separates the spikes into contributions from recurrent and residual congestion. This is achieved using a combination of wavelet transforms, spectral filtering and locally weighted regression. The background and recurrent congestion contributions are then used to estimate typical travel times with horizon of one week in an accurate and computationally inexpensive manner. We train and test WARP on the M6 and M11 motorways in the United Kingdom using 12 weeks of link level travel time data obtained from the UK's National Traffic Information Service (NTIS). In out-of-sample validation tests, WARP compares favourably to estimates produced by a simple segmentation method and to the estimates published by NTIS.

Submitted 23 June, 2020; originally announced June 2020.

Comments: 12 pages, 10 figures. Conference paper accepted to 23rd IEEE International Conference on Intelligent Transportation Systems (ITSC2020)

arXiv:1910.00544

A machine learning approach to predicting dynamical observables from network structure

Authors: Francisco A. Rodrigues, Thomas Peron, Colm Connaughton, Jurgen Kurths, Yamir Moreno

Abstract: Estimating the outcome of a given dynamical process from structural features is a key unsolved challenge in network science. The goal is hindered by difficulties associated to nonlinearities, correlations and feedbacks between the structure and dynamics of complex systems. In this work, we develop an approach based on machine learning algorithms that is shown to provide an answer to the previous challenge. Specifically, we show that it is possible to estimate the outbreak size of a disease starting from a single node as well as the degree of synchronicity of a system made up of Kuramoto oscillators. In doing so, we show which topological features of the network are key for this estimation, and provide a rank of the importance of network metrics with higher accuracy than previously done. Our approach is general and can be applied to any dynamical process running on top of complex networks. Likewise, our work constitutes an important step towards the application of machine learning methods to unravel dynamical patterns emerging in complex networked systems.

Submitted 1 October, 2019; originally announced October 2019.

Comments: 5 pages including 6 figures

arXiv:1903.05112

Empirical analysis of the variability in the flow-density relationship for smart motorways

Authors: Kieran Kalair, Colm Connaughton

Abstract: The fundamental diagram is an assumed functional relationship between traffic flow and traffic density. In practice, this relationship is noisy and exhibits significant statistical variability. On smart motorways, this variability is increased by variable speed limits that are not captured by the fundamental diagram. To study this variability, it is appropriate to consider the joint probability distribution function (pdf) of density and flow. We perform an empirical study of the variability in the relationship between flow and density using 74 days of data from 64 sections of London's M25. The objectives are to determine how much of the variability in the flow-density relationship results from variable speed limits and to assess whether particular functional forms of the fundamental diagram are systematically preferred. Empirically, the joint pdf of flow and density is strongly bimodal, illustrating that traffic flows are often found in high-density or low-density regimes but rarely in between. We find that the high-density regime is strongly affected by variable speed limits whereas the low-density regime is not. The Daganzo-Newell (triangular) model of the fundamental diagram systematically fits best to the data. However, the optimal parameters vary with location. Clustering analysis of these parameters suggests three qualitatively different types of flow-density relationships applying to different sections of the M25. These clusters have natural interpretations in terms of the frequency and severity of flow breakdown. Accident rates also depend on cluster type, suggesting possible links to other properties of traffic flows beyond the flow-density relationship.

Submitted 12 March, 2019; originally announced March 2019.

Comments: 11 pages, 10 figures

Showing 1–9 of 9 results for author: Connaughton, C