-
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Authors:
Théo Vincent,
Fabian Wahren,
Jan Peters,
Boris Belousov,
Carlo D'Eramo
Abstract:
Deep Reinforcement Learning (RL) is well known for being highly sensitive to hyperparameters, requiring substantial effort from practitioners to optimize them for the problem at hand. In recent years, the field of automated Reinforcement Learning (AutoRL) has grown in popularity by trying to address this issue. However, these approaches typically hinge on additional samples to select well-performing hyperparameters, hindering sample-efficiency and practicality in RL. Furthermore, most AutoRL methods are heavily based on already existing AutoML methods, which were originally developed neglecting the additional challenges inherent to RL due to its non-stationarities. In this work, we propose a new approach for AutoRL, called Adaptive $Q$-Network (AdaQN), that is tailored to RL to take into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN learns several $Q$-functions, each one trained with different hyperparameters, which are updated online using the $Q$-function with the smallest approximation error as a shared target. Our selection scheme simultaneously handles different hyperparameters while coping with the non-stationarity induced by the RL optimization procedure and being orthogonal to any critic-based RL algorithm. We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems, showing benefits in sample-efficiency, overall performance, training stability, and robustness to stochasticity.
Submitted 25 May, 2024;
originally announced May 2024.
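The selection idea in the abstract above (several $Q$-functions, with the one of smallest approximation error used as the shared target) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not specify the error measure, so a mean squared temporal-difference error on a batch of transitions is used, and the names `td_error` and `select_target_index` are hypothetical rather than taken from the paper.

```python
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
# (state, action, reward, next_state, done)
Transition = Tuple[State, Action, float, State, bool]
QFunction = Callable[[State, Action], float]


def td_error(
    q: QFunction,
    target_q: QFunction,
    batch: Sequence[Transition],
    actions: Sequence[Action],
    gamma: float = 0.99,
) -> float:
    """Mean squared TD error of `q` against targets built from `target_q`.

    Assumption: this stands in for the "approximation error" named in the
    abstract; the paper may define the error differently.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Bootstrap from the shared target unless the episode ended.
        bootstrap = 0.0 if done else max(target_q(s_next, b) for b in actions)
        total += (q(s, a) - (r + gamma * bootstrap)) ** 2
    return total / len(batch)


def select_target_index(
    candidates: Sequence[QFunction],
    target_q: QFunction,
    batch: Sequence[Transition],
    actions: Sequence[Action],
    gamma: float = 0.99,
) -> int:
    """Index of the candidate with the smallest error (hypothetical helper)."""
    errors = [td_error(q, target_q, batch, actions, gamma) for q in candidates]
    return min(range(len(errors)), key=errors.__getitem__)
```

In a training loop, the selected candidate would presumably replace the shared target at each target update, so that all candidates keep regressing toward the same target without needing extra environment samples.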
-
Hybrid quantum programming with PennyLane Lightning on HPC platforms
Authors:
Ali Asadi,
Amintor Dusko,
Chae-Yeun Park,
Vincent Michaud-Rioux,
Isidor Schoch,
Shuli Shu,
Trevor Vincent,
Lee James O'Riordan
Abstract:
We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of Lightning with backends supporting CPUs, as well as NVidia and AMD GPUs, and compare the results to other commonly used high-performance simulator packages, demonstrating where Lightning's implementations give performance leads. We show improved CPU performance by employing explicit SIMD intrinsics and multi-threading, batched task-based execution across multiple GPUs, and distributed forward and gradient-based quantum circuit executions across multiple nodes. Our data shows we can comfortably simulate a variety of circuits, giving examples with up to 30 qubits on a single device or node, and up to 41 qubits using multiple nodes.
Submitted 4 March, 2024;
originally announced March 2024.
-
Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Authors:
Théo Vincent,
Daniel Palenicek,
Boris Belousov,
Jan Peters,
Carlo D'Eramo
Abstract:
The vast majority of Reinforcement Learning methods is largely impacted by the computation effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample-efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator and a subsequent projection step onto a considered function space. It has been observed that this scheme can be potentially generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, till now, it has been challenging to effectively implement this idea, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (iQN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that iQN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of iQN in Atari $2600$ games and MuJoCo continuous control problems.
Submitted 25 May, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Parameterized Projected Bellman Operator
Authors:
Théo Vincent,
Alberto Maria Metelli,
Boris Belousov,
Jan Peters,
Marcello Restelli,
Carlo D'Eramo
Abstract:
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.
Submitted 6 March, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Solar Active Region Magnetogram Image Dataset for Studies of Space Weather
Authors:
Laura E. Boucheron,
Ty Vincent,
Jeremy A. Grajeda,
Ellery Wuest
Abstract:
In this dataset we provide a comprehensive collection of magnetograms (images quantifying the strength of the magnetic field) from the National Aeronautics and Space Administration's (NASA's) Solar Dynamics Observatory (SDO). The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions (regions of large magnetic flux, generally the source of eruptive events) as well as labels of corresponding flaring activity. This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares. The dataset will be of interest to those researchers investigating automated solar flare prediction methods, including supervised and unsupervised machine learning (classical and deep), binary and multi-class classification, and regression. This dataset is a minimally processed, user configurable dataset of consistently sized images of solar active regions that can serve as a benchmark dataset for solar flare prediction research.
Submitted 12 February, 2024; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Recursive Deformable Image Registration Network with Mutual Attention
Authors:
Jian-Qing Zheng,
Ziyang Wang,
Baoru Huang,
Ngee Han Lim,
Tonia Vincent,
Bartlomiej W. Papiez
Abstract:
Deformable image registration, estimating the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based methods for multi-stage registration to perform 3D image registration to improve performance. The performance of the multi-stage approach, however, is limited by the size of the receptive field where complex motion does not occur at a single spatial scale. We propose a new registration network combining recursive network architecture and mutual attention mechanism to overcome these limitations. Compared with the state-of-the-art deep learning methods, our network based on the recursive structure achieves the highest accuracy in lung Computed Tomography (CT) data set (Dice score of 92\% and average surface distance of 3.8mm for lungs) and one of the most accurate results in abdominal CT data set with 9 organs of various sizes (Dice score of 55\% and average surface distance of 7.8mm). We also showed that adding 3 recursive networks is sufficient to achieve the state-of-the-art results without a significant increase in the inference time.
Submitted 30 June, 2022; v1 submitted 3 June, 2022;
originally announced June 2022.
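The Dice score reported in the abstract above measures the overlap between a predicted segmentation and a reference one, $\mathrm{Dice} = 2|A \cap B| / (|A| + |B|)$. The following is a minimal sketch on flattened binary masks; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice overlap between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground size of the two masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / size_sum
```

For example, a prediction covering two voxels, one of which lies in a one-voxel reference mask, gives $2 \cdot 1 / (2 + 1) = 2/3$.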
-
Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
Authors:
Sebastian T. Vincent,
Loïc Barrault,
Carolina Scarton
Abstract:
This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality controlling with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 within the constrained setting and .995 within unconstrained setting. In a zero-shot setting for English-to-Russian and English-to-Italian, we scored average accuracy of .590 for constrained setting and .659 for unconstrained.
Submitted 12 May, 2022;
originally announced May 2022.
-
Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation
Authors:
Sebastian T. Vincent,
Loïc Barrault,
Carolina Scarton
Abstract:
Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.
Submitted 30 May, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Towards Personalised and Document-level Machine Translation of Dialogue
Authors:
Sebastian T. Vincent
Abstract:
State-of-the-art (SOTA) neural machine translation (NMT) systems translate texts at sentence level, ignoring context: intra-textual information, like the previous sentence, and extra-textual information, like the gender of the speaker. Because of that, some sentences are translated incorrectly. Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this information into the translation process. Both fields are relatively new and previous work within them is limited. Moreover, there are no readily available robust evaluation metrics for them, which makes it difficult to develop better systems, as well as track global progress and compare different methods. This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages: English, Brazilian Portuguese, German, French and Polish. Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; (3) reliable evaluation for PersNMT and DocNMT.
Submitted 11 February, 2021;
originally announced February 2021.
-
PennyLane: Automatic differentiation of hybrid quantum-classical computations
Authors:
Ville Bergholm,
Josh Izaac,
Maria Schuld,
Christian Gogolin,
Shahnawaz Ahmed,
Vishnu Ajith,
M. Sohaib Alam,
Guillermo Alonso-Linaje,
B. AkashNarayanan,
Ali Asadi,
Juan Miguel Arrazola,
Utkarsh Azad,
Sam Banning,
Carsten Blank,
Thomas R Bromley,
Benjamin A. Cordier,
Jack Ceroni,
Alain Delgado,
Olivia Di Matteo,
Amintor Dusko,
Tanya Garg,
Diego Guala,
Anthony Hayes,
Ryan Hill,
Aroosa Ijaz
, et al. (43 additional authors not shown)
Abstract:
PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including the Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.
Submitted 29 July, 2022; v1 submitted 12 November, 2018;
originally announced November 2018.
-
On the Importance of Attention in Meta-Learning for Few-Shot Text Classification
Authors:
Xiang Jiang,
Mohammad Havaei,
Gabriel Chartrand,
Hassan Chouaib,
Thomas Vincent,
Andrew Jesson,
Nicolas Chapados,
Stan Matwin
Abstract:
Current deep learning based text classification methods are limited by their ability to achieve fast learning and generalization when the data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Based on the Model-Agnostic Meta-Learning framework (MAML), we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The essential difference between MAML and ATAML is in the separation of task-agnostic representation learning and task-specific attentive adaptation. The proposed ATAML is designed to encourage task-agnostic representation learning by way of task-agnostic parameterization and facilitate task-specific adaptation via attention mechanisms. We provide evidence to show that the attention mechanism in ATAML has a synergistic effect on learning performance. In comparisons with models trained from random initialization, pretrained models and meta trained MAML, our proposed ATAML method generalizes better on single-label and multi-label classification tasks in miniRCV1 and miniReuters-21578 datasets.
Submitted 3 June, 2018;
originally announced June 2018.
-
Effect of Bonus Payments in Cost Sharing Mechanism Design for Renewable Energy Aggregation
Authors:
Farshad Harirchi,
Tyrone Vincent,
Dejun Yang
Abstract:
The participation of renewable energy sources in energy markets is challenging, mainly because of the uncertainty associated with the renewables. Aggregation of renewable energy suppliers is shown to be very effective in decreasing this uncertainty. In the present paper, we propose a cost sharing mechanism that entices the suppliers of wind, solar and other renewable resources to form or join an aggregate. In particular, we consider the effect of a bonus for surplus in supply, which is neglected in previous work. We introduce a specific proportional cost sharing mechanism, which satisfies the desired properties of such mechanisms that are introduced in the literature, e.g., budget balancedness, ex-post individual rationality and fairness. In addition, we show that the proposed mechanism results in a stable market outcome. Finally, the results of the paper are illustrated by numerical examples.
Submitted 23 September, 2016;
originally announced September 2016.
-
SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics
Authors:
Lawrence E. Kidder,
Scott E. Field,
Francois Foucart,
Erik Schnetter,
Saul A. Teukolsky,
Andy Bohn,
Nils Deppe,
Peter Diener,
François Hébert,
Jonas Lippuner,
Jonah Miller,
Christian D. Ott,
Mark A. Scheel,
Trevor Vincent
Abstract:
We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads.
Submitted 21 July, 2017; v1 submitted 31 August, 2016;
originally announced September 2016.
-
Computing the Approximate Convex Hull in High Dimensions
Authors:
Hossein Sartipizadeh,
Tyrone L. Vincent
Abstract:
In this paper, an effective method with time complexity of $\mathcal{O}(K^{3/2}N^2\log \frac{K}{\epsilon_0})$ is introduced to find an approximation of the convex hull for $N$ points in dimension $n$, where $K$ is close to the number of vertices of the approximation. Since the time complexity is independent of dimension, this method is highly suitable for the data in high dimensions. Utilizing a greedy approach, the proposed method attempts to find the best approximate convex hull for a given number of vertices. The approximate convex hull can be a helpful substitute for the exact convex hull for on-line processes and applications that have a favorable trade off between accuracy and parsimony.
Submitted 8 March, 2016;
originally announced March 2016.
-
Concentration of Measure Inequalities for Toeplitz Matrices with Applications
Authors:
Borhan M. Sanandaji,
Tyrone L. Vincent,
Michael B. Wakin
Abstract:
We derive Concentration of Measure (CoM) inequalities for randomized Toeplitz matrices. These inequalities show that the norm of a high-dimensional signal mapped by a Toeplitz matrix to a low-dimensional space concentrates around its mean with a tail probability bound that decays exponentially in the dimension of the range space divided by a quantity which is a function of the signal. For the class of sparse signals, the introduced quantity is bounded by the sparsity level of the signal. However, we observe that this bound is highly pessimistic for most sparse signals and we show that if a random distribution is imposed on the non-zero entries of the signal, the typical value of the quantity is bounded by a term that scales logarithmically in the ambient dimension. As an application of the CoM inequalities, we consider Compressive Binary Detection (CBD).
Submitted 12 July, 2012; v1 submitted 8 December, 2011;
originally announced December 2011.