Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Authors: Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo

Abstract: Deep Reinforcement Learning (RL) is well known for being highly sensitive to hyperparameters, requiring practitioners substantial efforts to optimize them for the problem at hand. In recent years, the field of automated Reinforcement Learning (AutoRL) has grown in popularity by trying to address this issue. However, these approaches typically hinge on additional samples to select well-performing hyperparameters, hindering sample-efficiency and practicality in RL. Furthermore, most AutoRL methods are heavily based on already existing AutoML methods, which were originally developed neglecting the additional challenges inherent to RL due to its non-stationarities. In this work, we propose a new approach for AutoRL, called Adaptive $Q$-Network (AdaQN), that is tailored to RL to take into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN learns several $Q$-functions, each one trained with different hyperparameters, which are updated online using the $Q$-function with the smallest approximation error as a shared target. Our selection scheme simultaneously handles different hyperparameters while coping with the non-stationarity induced by the RL optimization procedure and being orthogonal to any critic-based RL algorithm. We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems, showing benefits in sample-efficiency, overall performance, training stability, and robustness to stochasticity.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2403.02512 [pdf, other]

Hybrid quantum programming with PennyLane Lightning on HPC platforms

Authors: Ali Asadi, Amintor Dusko, Chae-Yeun Park, Vincent Michaud-Rioux, Isidor Schoch, Shuli Shu, Trevor Vincent, Lee James O'Riordan

Abstract: We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of Lightning with backends supporting CPUs, as well as NVidia and AMD GPUs, and compare the results to other commonly used high-performance simulator packages, demonstrating where Lightning's implementations give performance leads. We show improved CPU performance by employing explicit SIMD intrinsics and multi-threading, batched task-based execution across multiple GPUs, and distributed forward and gradient-based quantum circuit executions across multiple nodes. Our data shows we can comfortably simulate a variety of circuits, giving examples with up to 30 qubits on a single device or node, and up to 41 qubits using multiple nodes.

Submitted 4 March, 2024; originally announced March 2024.

Comments: For all data and workloads, see https://github.com/PennyLaneAI/lightning-on-hpc

arXiv:2403.02107 [pdf, other]

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Authors: Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo

Abstract: The vast majority of Reinforcement Learning methods is largely impacted by the computation effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample-efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator and a subsequent projection step onto a considered function space. It has been observed that this scheme can be potentially generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, till now, it has been challenging to effectively implement this idea, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (iQN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that iQN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of iQN in Atari $2600$ games and MuJoCo continuous control problems.

Submitted 25 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2312.12869 [pdf, other]

Parameterized Projected Bellman Operator

Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

Abstract: Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.

Submitted 6 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: Proceedings of the National Conference on Artificial Intelligence (AAAI-24)

arXiv:2305.09492 [pdf, other]

doi 10.1038/s41597-023-02628-8

Solar Active Region Magnetogram Image Dataset for Studies of Space Weather

Authors: Laura E. Boucheron, Ty Vincent, Jeremy A. Grajeda, Ellery Wuest

Abstract: In this dataset we provide a comprehensive collection of magnetograms (images quantifying the strength of the magnetic field) from the National Aeronautics and Space Administration's (NASA's) Solar Dynamics Observatory (SDO). The dataset incorporates data from three sources and provides SDO Helioseismic and Magnetic Imager (HMI) magnetograms of solar active regions (regions of large magnetic flux, generally the source of eruptive events) as well as labels of corresponding flaring activity. This dataset will be useful for image analysis or solar physics research related to magnetic structure, its evolution over time, and its relation to solar flares. The dataset will be of interest to those researchers investigating automated solar flare prediction methods, including supervised and unsupervised machine learning (classical and deep), binary and multi-class classification, and regression. This dataset is a minimally processed, user configurable dataset of consistently sized images of solar active regions that can serve as a benchmark dataset for solar flare prediction research.

Submitted 12 February, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2206.01863 [pdf, other]

Recursive Deformable Image Registration Network with Mutual Attention

Authors: Jian-Qing Zheng, Ziyang Wang, Baoru Huang, Ngee Han Lim, Tonia Vincent, Bartlomiej W. Papiez

Abstract: Deformable image registration, estimating the spatial transformation between different images, is an important task in medical imaging. Many previous studies have used learning-based methods for multi-stage registration to perform 3D image registration to improve performance. The performance of the multi-stage approach, however, is limited by the size of the receptive field where complex motion does not occur at a single spatial scale. We propose a new registration network combining recursive network architecture and mutual attention mechanism to overcome these limitations. Compared with the state-of-the-art deep learning methods, our network based on the recursive structure achieves the highest accuracy in lung Computed Tomography (CT) data set (Dice score of 92\% and average surface distance of 3.8mm for lungs) and one of the most accurate results in abdominal CT data set with 9 organs of various sizes (Dice score of 55\% and average surface distance of 7.8mm). We also showed that adding 3 recursive networks is sufficient to achieve the state-of-the-art results without a significant increase in the inference time.

Submitted 30 June, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.04290

arXiv:2205.05990 [pdf, other]

Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022

Authors: Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Abstract: This paper describes the SLT-CDT-UoS group's submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality controlling with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 within the constrained setting and .995 within unconstrained setting. In a zero-shot setting for English-to-Russian and English-to-Italian, we scored average accuracy of .590 for constrained setting and .659 for unconstrained.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 8 pages, 10 figures, IWSLT22 camera-ready (system paper @ ACL-IWSLT Shared Task on Formality Control for Spoken Language Translation)

arXiv:2205.04747 [pdf, other]

Controlling Extra-Textual Attributes about Dialogue Participants -- A Case Study of English-to-Polish Neural Machine Translation

Authors: Sebastian T. Vincent, Loïc Barrault, Carolina Scarton

Abstract: Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.

Submitted 30 May, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: 9 pages, 9 figures, EAMT2022 camera-ready

Journal ref: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, p. 121-130, Ghent, Belgium, June 2022

arXiv:2102.10979 [pdf, other]

Towards Personalised and Document-level Machine Translation of Dialogue

Authors: Sebastian T. Vincent

Abstract: State-of-the-art (SOTA) neural machine translation (NMT) systems translate texts at sentence level, ignoring context: intra-textual information, like the previous sentence, and extra-textual information, like the gender of the speaker. Because of that, some sentences are translated incorrectly. Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this information into the translation process. Both fields are relatively new and previous work within them is limited. Moreover, there are no readily available robust evaluation metrics for them, which makes it difficult to develop better systems, as well as track global progress and compare different methods. This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages: English, Brazilian Portuguese, German, French and Polish. Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; (3) reliable evaluation for PersNMT and DocNMT.

Submitted 11 February, 2021; originally announced February 2021.

Comments: Thesis Proposal, 6 pages, 7 figures, accepted to the EACL2021 Student Workshop

arXiv:1811.04968 [pdf, other]

PennyLane: Automatic differentiation of hybrid quantum-classical computations

Authors: Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Shahnawaz Ahmed, Vishnu Ajith, M. Sohaib Alam, Guillermo Alonso-Linaje, B. AkashNarayanan, Ali Asadi, Juan Miguel Arrazola, Utkarsh Azad, Sam Banning, Carsten Blank, Thomas R Bromley, Benjamin A. Cordier, Jack Ceroni, Alain Delgado, Olivia Di Matteo, Amintor Dusko, Tanya Garg, Diego Guala, Anthony Hayes, Ryan Hill, Aroosa Ijaz, et al. (43 additional authors not shown)

Abstract: PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including the Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.

Submitted 29 July, 2022; v1 submitted 12 November, 2018; originally announced November 2018.

Comments: Code available at https://github.com/XanaduAI/pennylane/ . Significant contributions to the code (new features, new plugins, etc.) will be recognized by the opportunity to be a co-author on this paper

arXiv:1806.00852 [pdf, other]

On the Importance of Attention in Meta-Learning for Few-Shot Text Classification

Authors: Xiang Jiang, Mohammad Havaei, Gabriel Chartrand, Hassan Chouaib, Thomas Vincent, Andrew Jesson, Nicolas Chapados, Stan Matwin

Abstract: Current deep learning based text classification methods are limited by their ability to achieve fast learning and generalization when the data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Based on the Model-Agnostic Meta-Learning framework (MAML), we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The essential difference between MAML and ATAML is in the separation of task-agnostic representation learning and task-specific attentive adaptation. The proposed ATAML is designed to encourage task-agnostic representation learning by way of task-agnostic parameterization and facilitate task-specific adaptation via attention mechanisms. We provide evidence to show that the attention mechanism in ATAML has a synergistic effect on learning performance. In comparisons with models trained from random initialization, pretrained models and meta trained MAML, our proposed ATAML method generalizes better on single-label and multi-label classification tasks in miniRCV1 and miniReuters-21578 datasets.

Submitted 3 June, 2018; originally announced June 2018.

Comments: 13 pages, 4 figures, submitted to NIPS

arXiv:1609.07448 [pdf, other]

Effect of Bonus Payments in Cost Sharing Mechanism Design for Renewable Energy Aggregation

Authors: Farshad Harirchi, Tyrone Vincent, Dejun Yang

Abstract: The participation of renewable energy sources in energy markets is challenging, mainly because of the uncertainty associated with the renewables. Aggregation of renewable energy suppliers is shown to be very effective in decreasing this uncertainty. In the present paper, we propose a cost sharing mechanism that entices the suppliers of wind, solar and other renewable resources to form or join an aggregate. In particular, we consider the effect of a bonus for surplus in supply, which is neglected in previous work. We introduce a specific proportional cost sharing mechanism, which satisfies the desired properties of such mechanisms that are introduced in the literature, e.g., budget balancedness, ex-post individual rationality and fairness. In addition, we show that the proposed mechanism results in a stable market outcome. Finally, the results of the paper are illustrated by numerical examples.

Submitted 23 September, 2016; originally announced September 2016.

arXiv:1609.00098 [pdf, other]

doi 10.1016/j.jcp.2016.12.059

SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics

Authors: Lawrence E. Kidder, Scott E. Field, Francois Foucart, Erik Schnetter, Saul A. Teukolsky, Andy Bohn, Nils Deppe, Peter Diener, François Hébert, Jonas Lippuner, Jonah Miller, Christian D. Ott, Mark A. Scheel, Trevor Vincent

Abstract: We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads.

Submitted 21 July, 2017; v1 submitted 31 August, 2016; originally announced September 2016.

Comments: 41 pages, 13 figures, and 7 tables. Ancillary data contains simulation input files

Journal ref: Journal of Computational Physics, Volume 335, 2017, Pages 84-114

arXiv:1603.04422 [pdf, ps, other]

Computing the Approximate Convex Hull in High Dimensions

Authors: Hossein Sartipizadeh, Tyrone L. Vincent

Abstract: In this paper, an effective method with time complexity of $\mathcal{O}(K^{3/2}N^2\log \frac{K}{\epsilon_0})$ is introduced to find an approximation of the convex hull for $N$ points in dimension $n$, where $K$ is close to the number of vertices of the approximation. Since the time complexity is independent of dimension, this method is highly suitable for the data in high dimensions. Utilizing a greedy approach, the proposed method attempts to find the best approximate convex hull for a given number of vertices. The approximate convex hull can be a helpful substitute for the exact convex hull for on-line processes and applications that have a favorable trade-off between accuracy and parsimony.

Submitted 8 March, 2016; originally announced March 2016.

Comments: 5 pages, 1 figure, The more detailed version will be submitted

MSC Class: 52-xx

arXiv:1112.1968 [pdf, other]

doi 10.1109/TSP.2012.2222384

Concentration of Measure Inequalities for Toeplitz Matrices with Applications

Authors: Borhan M. Sanandaji, Tyrone L. Vincent, Michael B. Wakin

Abstract: We derive Concentration of Measure (CoM) inequalities for randomized Toeplitz matrices. These inequalities show that the norm of a high-dimensional signal mapped by a Toeplitz matrix to a low-dimensional space concentrates around its mean with a tail probability bound that decays exponentially in the dimension of the range space divided by a quantity which is a function of the signal. For the class of sparse signals, the introduced quantity is bounded by the sparsity level of the signal. However, we observe that this bound is highly pessimistic for most sparse signals and we show that if a random distribution is imposed on the non-zero entries of the signal, the typical value of the quantity is bounded by a term that scales logarithmically in the ambient dimension. As an application of the CoM inequalities, we consider Compressive Binary Detection (CBD).

Submitted 12 July, 2012; v1 submitted 8 December, 2011; originally announced December 2011.

Comments: Initial Submission to the IEEE Transactions on Signal Processing on December 1, 2011. Revised and Resubmitted on July 12, 2012

Showing 1–15 of 15 results for author: Vincent, T