Search | arXiv e-print repository

Dataset Condensation with Latent Quantile Matching

Authors: Wei Wei, Tom De Schepper, Kevin Mets

Abstract: Dataset condensation (DC) methods aim to learn a smaller synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However two distributions with the same mean can still be vastly dif… ▽ More Dataset condensation (DC) methods aim to learn a smaller synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized dataset by matching the mean of the latent embeddings between the synthetic and the real dataset. However two distributions with the same mean can still be vastly different. In this work we demonstrate the shortcomings of using Maximum Mean Discrepancy to match latent distributions i.e. the weak matching power and lack of outlier regularization. To alleviate these shortcomings we propose our new method: Latent Quantile Matching (LQM) which matches the quantiles of the latent embeddings to minimize the goodness of fit test statistic between two distributions. Empirical experiments on both image and graph-structured datasets show that LQM matches or outperforms previous state of the art in distribution matching based DC. Moreover we show that LQM improves the performance in continual graph learning (CGL) setting where memory efficiency and privacy can be important. Our work sheds light on the application of DM based DC for CGL. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by CVPR Workshop 2024: 1st Workshop on Dataset Distillation for Computer Vision

arXiv:2401.18054 [pdf, other]

doi 10.5220/0012394400003660

Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition

Authors: Wei Wei, Tom De Schepper, Kevin Mets

Abstract: Continual learning (CL) is the research field that aims to build machine learning models that can accumulate knowledge continuously over different tasks without retraining from scratch. Previous studies have shown that pre-training graph neural networks (GNN) may lead to negative transfer (Hu et al., 2020) after fine-tuning, a setting which is closely related to CL. Thus, we focus on studying GNN… ▽ More Continual learning (CL) is the research field that aims to build machine learning models that can accumulate knowledge continuously over different tasks without retraining from scratch. Previous studies have shown that pre-training graph neural networks (GNN) may lead to negative transfer (Hu et al., 2020) after fine-tuning, a setting which is closely related to CL. Thus, we focus on studying GNN in the continual graph learning (CGL) setting. We propose the first continual graph learning benchmark for spatio-temporal graphs and use it to benchmark well-known CGL methods in this novel setting. The benchmark is based on the N-UCLA and NTU-RGB+D datasets for skeleton-based action recognition. Beyond benchmarking for standard performance metrics, we study the class and task-order sensitivity of CGL methods, i.e., the impact of learning order on each class/task's performance, and the architectural sensitivity of CGL methods with backbone GNN at various widths and depths. We reveal that task-order robust methods can still be class-order sensitive and observe results that contradict previous empirical observations on architectural sensitivity in CL. △ Less

Submitted 31 January, 2024; originally announced January 2024.

Comments: This work is accepted at VISAPP 2024 as a short paper

arXiv:2311.09878 [pdf, other]

doi 10.1109/IECON49645.2022.9968678

Safety Aware Autonomous Path Planning Using Model Predictive Reinforcement Learning for Inland Waterways

Authors: Astrid Vanneste, Simon Vanneste, Olivier Vasseur, Robin Janssens, Mattias Billast, Ali Anwar, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx

Abstract: In recent years, interest in autonomous ship** in urban waterways has increased significantly due to the trend of kee** cars and trucks out of city centers. Classical approaches such as Frenet frame based planning and potential field navigation often require tuning of many configuration parameters and sometimes even require a different configuration depending on the situation. In this paper, w… ▽ More In recent years, interest in autonomous ship** in urban waterways has increased significantly due to the trend of kee** cars and trucks out of city centers. Classical approaches such as Frenet frame based planning and potential field navigation often require tuning of many configuration parameters and sometimes even require a different configuration depending on the situation. In this paper, we propose a novel path planning approach based on reinforcement learning called Model Predictive Reinforcement Learning (MPRL). MPRL calculates a series of waypoints for the vessel to follow. The environment is represented as an occupancy grid map, allowing us to deal with any shape of waterway and any number and shape of obstacles. We demonstrate our approach on two scenarios and compare the resulting path with path planning using a Frenet frame and path planning based on a proximal policy optimization (PPO) agent. Our results show that MPRL outperforms both baselines in both test scenarios. The PPO based approach was not able to reach the goal in either scenario while the Frenet frame approach failed in the scenario consisting of a corner with obstacles. MPRL was able to safely (collision free) navigate to the goal in both of the test scenarios. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: \c{opyright} 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2308.04938 [pdf, other]

An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning

Authors: Astrid Vanneste, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx

Abstract: Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to redu… ▽ More Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2204.05669

arXiv:2308.04844 [pdf, other]

Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning

Authors: Astrid Vanneste, Thomas Somers, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx

Abstract: Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in… ▽ More Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: Paper accepted to the BNAIC/BeNeLearn 2022 conference

arXiv:2212.11030 [pdf, other]

doi 10.5220/0010838400003124

Deep set conditioned latent representations for action recognition

Authors: Akash Singh, Tom De Schepper, Kevin Mets, Peter Hellinckx, Jose Oramas, Steven Latre

Abstract: In recent years multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANN) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognisin… ▽ More In recent years multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANN) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite action of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concept of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two stream relational network that employs latent representation of state and visual representation for reasoning over events and actions. They learn to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition, over a I3D-NL baseline, on the CATER dataset. △ Less

Submitted 21 December, 2022; originally announced December 2022.

Comments: Conference VISAPP 2022, 11 pages,5 figures, 2 Tables, 6 plots

Journal ref: In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, ISBN 978-989-758-555-5; ISSN 2184-4321, year 2022, pages 456-466

arXiv:2211.08467 [pdf, other]

Structured Exploration Through Instruction Enhancement for Object Navigation

Authors: Matthias Hutsebaut-Buysse, Kevin Mets, Tom De Schepper, Steven Latré

Abstract: Finding an object of a specific class in an unseen environment remains an unsolved navigation problem. Hence, we propose a hierarchical learning-based method for object navigation. The top-level is capable of high-level planning, and building a memory on a floorplan-level (e.g., which room makes the most sense for the agent to visit next, where has the agent already been?). While the lower-level i… ▽ More Finding an object of a specific class in an unseen environment remains an unsolved navigation problem. Hence, we propose a hierarchical learning-based method for object navigation. The top-level is capable of high-level planning, and building a memory on a floorplan-level (e.g., which room makes the most sense for the agent to visit next, where has the agent already been?). While the lower-level is tasked with efficiently navigating between rooms and looking for objects in them. Instructions can be provided to the agent using a simple synthetic language. The top-level intelligently enhances the instructions in order to make the overall task more tractable. Language grounding, map** instructions to visual observations, is performed by utilizing an additional separate supervised trained goal assessment module. We demonstrate the effectiveness of our method on a dynamic configurable domestic environment. △ Less

Submitted 15 November, 2022; originally announced November 2022.

Comments: Paper accepted to the BNAIC/BeNeLearn 2022 conference

arXiv:2204.05669 [pdf, other]

An Analysis of Discretization Methods for Communication Learning with Multi-Agent Reinforcement Learning

Authors: Astrid Vanneste, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Steven Latré, Peter Hellinckx

Abstract: Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to redu… ▽ More Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as two methods that have not been used for communication learning before. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. Our results show that none of the methods is best in all environments. The best choice in discretization method greatly depends on the environment. However, the discretize regularize unit (DRU), straight through DRU and the straight through gumbel softmax show the most consistent results across all the tested environments. Therefore, these methods prove to be the best choice for general use while the straight through estimator and the gumbel softmax may provide better results in specific environments but fail completely in others. △ Less

Submitted 12 April, 2022; originally announced April 2022.

Comments: Accepted at Adaptive and Learning Agents Workshop (ALA 2022) https://ala2022.github.io/

arXiv:2110.15779 [pdf, other]

doi 10.1007/978-3-030-89899-1_21

Learning to Communicate with Reinforcement Learning for an Adaptive Traffic Control System

Authors: Simon Vanneste, Gauthier de Borrekens, Stig Bosmans, Astrid Vanneste, Kevin Mets, Siegfried Mercelis, Steven Latré, Peter Hellinckx

Abstract: Recent work in multi-agent reinforcement learning has investigated inter agent communication which is learned simultaneously with the action policy in order to improve the team reward. In this paper, we investigate independent Q-learning (IQL) without communication and differentiable inter-agent learning (DIAL) with learned communication on an adaptive traffic control system (ATCS). In real world… ▽ More Recent work in multi-agent reinforcement learning has investigated inter agent communication which is learned simultaneously with the action policy in order to improve the team reward. In this paper, we investigate independent Q-learning (IQL) without communication and differentiable inter-agent learning (DIAL) with learned communication on an adaptive traffic control system (ATCS). In real world ATCS, it is impossible to present the full state of the environment to every agent so in our simulation, the individual agents will only have a limited observation of the full state of the environment. The ATCS will be simulated using the Simulation of Urban MObility (SUMO) traffic simulator in which two connected intersections are simulated. Every intersection is controlled by an agent which has the ability to change the direction of the traffic flow. Our results show that a DIAL agent outperforms an independent Q-learner on both training time and on maximum achieved reward as it is able to share relevant information with the other agents. △ Less

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.15762 [pdf, other]

doi 10.1007/978-3-030-89899-1_20

Mixed Cooperative-Competitive Communication Using Multi-Agent Reinforcement Learning

Authors: Astrid Vanneste, Wesley Van Wijnsberghe, Simon Vanneste, Kevin Mets, Siegfried Mercelis, Steven Latré, Peter Hellinckx

Abstract: By using communication between multiple agents in multi-agent environments, one can reduce the effects of partial observability by combining one agent's observation with that of others in the same dynamic environment. While a lot of successful research has been done towards communication learning in cooperative settings, communication learning in mixed cooperative-competitive settings is also impo… ▽ More By using communication between multiple agents in multi-agent environments, one can reduce the effects of partial observability by combining one agent's observation with that of others in the same dynamic environment. While a lot of successful research has been done towards communication learning in cooperative settings, communication learning in mixed cooperative-competitive settings is also important and brings its own complexities such as the opposing team overhearing the communication. In this paper, we apply differentiable inter-agent learning (DIAL), designed for cooperative settings, to a mixed cooperative-competitive setting. We look at the difference in performance between communication that is private for a team and communication that can be overheard by the other team. Our research shows that communicating agents are able to achieve similar performance to fully observable agents after a given training period in our chosen environment. Overall, we find that sharing communication across teams results in decreased performance for the communicating team in comparison to results achieved with private communication. △ Less

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2009.08880 [pdf, other]

HTMRL: Biologically Plausible Reinforcement Learning with Hierarchical Temporal Memory

Authors: Jakob Struye, Kevin Mets, Steven Latré

Abstract: Building Reinforcement Learning (RL) algorithms which are able to adapt to continuously evolving tasks is an open research challenge. One technology that is known to inherently handle such non-stationary input patterns well is Hierarchical Temporal Memory (HTM), a general and biologically plausible computational model for the human neocortex. As the RL paradigm is inspired by human learning, HTM i… ▽ More Building Reinforcement Learning (RL) algorithms which are able to adapt to continuously evolving tasks is an open research challenge. One technology that is known to inherently handle such non-stationary input patterns well is Hierarchical Temporal Memory (HTM), a general and biologically plausible computational model for the human neocortex. As the RL paradigm is inspired by human learning, HTM is a natural framework for an RL algorithm supporting non-stationary environments. In this paper, we present HTMRL, the first strictly HTM-based RL algorithm. We empirically and statistically show that HTMRL scales to many states and actions, and demonstrate that HTM's ability for adapting to changing patterns extends to RL. Specifically, HTMRL performs well on a 10-armed bandit after 750 steps, but only needs a third of that to adapt to the bandit suddenly shuffling its arms. HTMRL is the first iteration of a novel RL approach, with the potential of extending to a capable algorithm for Meta-RL. △ Less

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2007.05196 [pdf, other]

Pre-trained Word Embeddings for Goal-conditional Transfer Learning in Reinforcement Learning

Authors: Matthias Hutsebaut-Buysse, Kevin Mets, Steven Latré

Abstract: Reinforcement learning (RL) algorithms typically start tabula rasa, without any prior knowledge of the environment, and without any prior skills. This however often leads to low sample efficiency, requiring a large amount of interaction with the environment. This is especially true in a lifelong learning setting, in which the agent needs to continually extend its capabilities. In this paper, we ex… ▽ More Reinforcement learning (RL) algorithms typically start tabula rasa, without any prior knowledge of the environment, and without any prior skills. This however often leads to low sample efficiency, requiring a large amount of interaction with the environment. This is especially true in a lifelong learning setting, in which the agent needs to continually extend its capabilities. In this paper, we examine how a pre-trained task-independent language model can make a goal-conditional RL agent more sample efficient. We do this by facilitating transfer learning between different related tasks. We experimentally demonstrate our approach on a set of object navigation tasks. △ Less

Submitted 10 July, 2020; originally announced July 2020.

Comments: Paper accepted to the ICML 2020 Language in Reinforcement Learning (LaReL) Workshop

arXiv:2006.07200 [pdf, other]

Learning to Communicate Using Counterfactual Reasoning

Authors: Simon Vanneste, Astrid Vanneste, Kevin Mets, Tom De Schepper, Ali Anwar, Siegfried Mercelis, Steven Latré, Peter Hellinckx

Abstract: Learning to communicate in order to share state information is an active problem in the area of multi-agent reinforcement learning (MARL). The credit assignment problem, the non-stationarity of the communication environment and the creation of influenceable agents are major challenges within this research field which need to be overcome in order to learn a valid communication protocol. This paper… ▽ More Learning to communicate in order to share state information is an active problem in the area of multi-agent reinforcement learning (MARL). The credit assignment problem, the non-stationarity of the communication environment and the creation of influenceable agents are major challenges within this research field which need to be overcome in order to learn a valid communication protocol. This paper introduces the novel multi-agent counterfactual communication learning (MACC) method which adapts counterfactual reasoning in order to overcome the credit assignment problem for communicating agents. Secondly, the non-stationarity of the communication environment while learning the communication Q-function is overcome by creating the communication Q-function using the action policy of the other agents and the Q-function of the action environment. Additionally, a social loss function is introduced in order to create influenceable agents which is required to learn a valid communication protocol. Our experiments show that MACC is able to outperform the state-of-the-art baselines in four different scenarios in the Particle environment. △ Less

Submitted 26 April, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Accepted at Adaptive and Learning Agents Workshop (ALA 2022) https://ala2022.github.io/

arXiv:1910.04040 [pdf, other]

Fast Task-Adaptation for Tasks Labeled Using Natural Language in Reinforcement Learning

Authors: Matthias Hutsebaut-Buysse, Kevin Mets, Steven Latré

Abstract: Over its lifetime, a reinforcement learning agent is often tasked with different tasks. How to efficiently adapt a previously learned control policy from one task to another, remains an open research question. In this paper, we investigate how instructions formulated in natural language can enable faster and more effective task adaptation. This can serve as the basis for develo** language instru… ▽ More Over its lifetime, a reinforcement learning agent is often tasked with different tasks. How to efficiently adapt a previously learned control policy from one task to another, remains an open research question. In this paper, we investigate how instructions formulated in natural language can enable faster and more effective task adaptation. This can serve as the basis for develo** language instructed skills, which can be used in a lifelong learning setting. Our method is capable of assessing, given a set of developed base control policies, which policy will adapt best to a new unseen task. △ Less

Submitted 9 October, 2019; originally announced October 2019.

Showing 1–14 of 14 results for author: Mets, K