-
EventSleep: Sleep Activity Recognition with Event Cameras
Authors:
Carlos Plou,
Nerea Gallego,
Alberto Sabater,
Eduardo Montijano,
Pablo Urcola,
Luis Montesano,
Ruben Martinez-Cantin,
Ana C. Murillo
Abstract:
Event cameras are a promising technology for activity recognition in dark environments due to their unique properties. However, real event camera datasets under low-lighting conditions are still scarce, which also limits the number of approaches to solve this kind of problem, hindering the potential of this technology in many applications. We present EventSleep, a new dataset and methodology to address this gap and study the suitability of event cameras for a very relevant medical application: sleep monitoring for sleep disorders analysis. The dataset contains synchronized event and infrared recordings emulating common movements that happen during sleep, resulting in a new challenging and unique dataset for activity recognition in dark environments. Our novel pipeline is able to achieve high accuracy under these challenging conditions and incorporates a Bayesian approach (Laplace ensembles) to increase the robustness in the predictions, which is fundamental for medical applications. Our work is the first application of Bayesian neural networks for event cameras, the first use of Laplace ensembles in a realistic problem, and also demonstrates for the first time the potential of event cameras in a new application domain: to enhance current sleep evaluation procedures. Our activity recognition results highlight the potential of event cameras under dark conditions and their capacity and robustness for sleep activity recognition, and point to open problems such as the adaptation of event data pre-processing techniques to dark environments.
Submitted 2 April, 2024;
originally announced April 2024.
-
Body Schema Acquisition through Active Learning
Authors:
Ruben Martinez-Cantin,
Manuel Lopes,
Luis Montesano
Abstract:
We present an active learning algorithm for the problem of body schema learning, i.e. estimating a kinematic model of a serial robot. The learning process is done online using Recursive Least Squares (RLS) estimation, which outperforms gradient methods usually applied in the literature. In addition, the method provides the required information to apply an active learning algorithm to find the optimal set of robot configurations and observations to improve the learning process. By selecting the most informative observations, the proposed method minimizes the required amount of data. We have developed an efficient version of the active learning algorithm to select the points in real time. The algorithms have been tested and compared using both simulated environments and a real humanoid robot.
Submitted 8 February, 2024;
originally announced February 2024.
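A minimal sketch of the generic Recursive Least Squares update named in the abstract above, assuming a plain linear model y = phi . theta rather than the kinematic model used in the paper. The function name `rls_update`, the forgetting factor, and the large initial covariance are illustrative assumptions, not details taken from the paper.

```python
def rls_update(theta, P, phi, y, forgetting=1.0):
    """One Recursive Least Squares step for the linear model y = phi . theta.

    theta: current parameter estimate (list of floats)
    P: current covariance-like matrix (list of lists, symmetric)
    phi: regressor vector for this observation
    y: observed scalar output
    forgetting: forgetting factor in (0, 1]; 1.0 means no forgetting
    """
    n = len(theta)
    # P @ phi
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Scalar denominator: forgetting + phi^T P phi
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error with the current estimate
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T
    new_P = [
        [(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
        for i in range(n)
    ]
    return new_theta, new_P


# Hypothetical usage: recover intercept 2 and slope -1 from noise-free samples.
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial covariance: weak prior (assumption)
for x in range(10):
    theta, P = rls_update(theta, P, [1.0, float(x)], 2.0 - 1.0 * x)
```

Each observation is processed once and then discarded, which is what makes this kind of estimator suitable for online learning.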
-
Event Transformer+. A multi-purpose solution for efficient event data processing
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer+, which improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads, for event-stream classification (i.e. action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state of the art while requiring minimal computation resources, both on GPU and CPU.
Submitted 3 September, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Event Transformer. A sparse-aware solution for efficient event data processing
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range, while they present minimal power consumption. However, top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top-accuracy results for complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively takes advantage of event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show better or comparable accuracy to the state-of-the-art while requiring significantly less computation resources, which makes EvT able to work with minimal latency both on GPU and CPU.
Submitted 18 April, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank
Authors:
Inigo Alonso,
Alberto Sabater,
David Ferstl,
Luis Montesano,
Ana C. Murillo
Abstract:
This work presents a novel approach for semi-supervised semantic segmentation. The key element of this approach is our contrastive learning module that enforces the segmentation network to yield similar pixel-level feature representations for same-class samples across the whole dataset. To achieve this, we maintain a memory bank continuously updated with relevant and high-quality feature vectors from labeled data. In an end-to-end training, the features from both labeled and unlabeled data are optimized to be similar to same-class samples from the memory bank. Our approach outperforms the current state-of-the-art for semi-supervised semantic segmentation and semi-supervised domain adaptation on well-known public benchmarks, with larger improvements on the most challenging scenarios, i.e., less available labeled data. https://github.com/Shathe/SemiSeg-Contrastive
Submitted 6 August, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Domain and View-point Agnostic Hand Action Recognition
Authors:
Alberto Sabater,
Iñigo Alonso,
Luis Montesano,
Ana C. Murillo
Abstract:
Hand action recognition is a special case of action recognition with applications in human-robot interaction, virtual reality or life-logging systems. Building action classifiers able to work for such heterogeneous action domains is very challenging. There are very subtle changes across different actions from a given application but also large variations across domains (e.g. virtual reality vs life-logging). This work introduces a novel skeleton-based hand motion representation model that tackles this problem. The framework we propose is agnostic to the application domain or camera recording view-point. When working on a single domain (intra-domain action classification) our approach performs better or similar to current state-of-the-art methods on well-known hand action recognition benchmarks. And, more importantly, when performing hand action recognition for action domains and camera perspectives which our approach has not been trained for (cross-domain action classification), our proposed framework achieves comparable performance to intra-domain state-of-the-art methods. These experiments show the robustness and generalization capabilities of our framework.
Submitted 7 October, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
One-shot action recognition in challenging therapy scenarios
Authors:
Alberto Sabater,
Laura Santos,
Jose Santos-Victor,
Alexandre Bernardino,
Luis Montesano,
Ana C. Murillo
Abstract:
One-shot action recognition aims to recognize new action categories from a single reference example, typically referred to as the anchor example. This work presents a novel approach for one-shot action recognition in the wild that computes motion representations robust to variable kinematic conditions. One-shot action recognition is then performed by evaluating anchor and target motion representations. We also develop a set of complementary steps that boost the action recognition performance in the most challenging scenarios. Our approach is evaluated on the public NTU-120 one-shot action recognition benchmark, outperforming previous action recognition models. Besides, we evaluate our framework on a real use-case of therapy with autistic people. These recordings are particularly challenging due to high-level artifacts from the patient motion. Our results provide not only quantitative but also online qualitative measures, essential for the patient evaluation and monitoring during the actual therapy.
Submitted 29 July, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Domain Adaptation in LiDAR Semantic Segmentation by Aligning Class Distributions
Authors:
Inigo Alonso,
Luis Riazuelo,
Luis Montesano,
Ana C. Murillo
Abstract:
LiDAR semantic segmentation provides 3D semantic information about the environment, an essential cue for intelligent systems during their decision making processes. Deep neural networks are achieving state-of-the-art results on large public benchmarks on this task. Unfortunately, finding models that generalize well or adapt to additional domains, where data distribution is different, remains a major challenge. This work addresses the problem of unsupervised domain adaptation for LiDAR semantic segmentation models. Our approach combines novel ideas on top of the current state-of-the-art approaches and yields new state-of-the-art results. We propose simple but effective strategies to reduce the domain shift by aligning the data distribution on the input space. Besides, we propose a learning-based approach that aligns the distribution of the semantic classes of the target domain to the source domain. The presented ablation study shows how each part contributes to the final performance. Our strategy is shown to outperform previous approaches for domain adaptation with comparisons run on three different domains.
Submitted 3 December, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Robust and efficient post-processing for video object detection
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Object recognition in video is an important task for plenty of applications, including autonomous driving perception, surveillance tasks, wearable devices or IoT networks. Object recognition using video data is more challenging than using still images due to blur, occlusions or rare object poses. Specific video detectors with high computational cost or standard image detectors together with a fast post-processing algorithm achieve the current state-of-the-art. This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods by introducing a learning-based similarity evaluation between detections across frames. Our method improves the results of state-of-the-art specific video detectors, especially regarding fast moving objects, and presents low resource requirements. Applied to efficient still image detectors, such as YOLO, it provides comparable results to much more computationally intensive detectors.
Submitted 23 September, 2020;
originally announced September 2020.
-
Performance of object recognition in wearable videos
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Wearable technologies are enabling plenty of new applications of computer vision, from life logging to health assistance. Many of them are required to recognize the elements of interest in the scene captured by the camera. This work studies the problem of object detection and localization on videos captured by this type of camera. Wearable videos are a much more challenging scenario for object detection than standard images or even other types of video, due to lower quality images (e.g. poor focus) or high clutter and occlusion common in wearable recordings. Existing work typically focuses on detecting the objects of focus or those being manipulated by the user wearing the camera. We perform a more general evaluation of the task of object detection in this type of video, because numerous applications, such as marketing studies, also need to detect objects that the user is not focusing on. This work presents a thorough study of the well known YOLO architecture, which offers an excellent trade-off between accuracy and speed, for the particular case of object detection in wearable video. We focus our study on the public ADL Dataset, but we also use additional public data for complementary evaluations. We run an exhaustive set of experiments with different variations of the original architecture and its training strategy. Our experiments lead to several conclusions about the most promising directions for our goal and point us to further research steps to improve detection in wearable videos.
Submitted 10 September, 2020;
originally announced September 2020.
-
3D-MiniNet: Learning a 2D Representation from Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation
Authors:
Iñigo Alonso,
Luis Riazuelo,
Luis Montesano,
Ana C. Murillo
Abstract:
LIDAR semantic segmentation, which assigns a semantic label to each 3D point measured by the LIDAR, is becoming an essential task for many robotic applications such as autonomous driving. Fast and efficient semantic segmentation methods are needed to match the strong computational and temporal restrictions of many of these real-world applications.
This work presents 3D-MiniNet, a novel approach for LIDAR semantic segmentation that combines 3D and 2D learning layers. It first learns a 2D representation from the raw points through a novel projection which extracts local and global information from the 3D data. This representation is fed to an efficient 2D Fully Convolutional Neural Network (FCNN) that produces a 2D semantic segmentation. These 2D semantic labels are re-projected back to the 3D space and enhanced through a post-processing module. The main novelty in our strategy relies on the projection learning module. Our detailed ablation study shows how each component contributes to the final performance of 3D-MiniNet. We validate our approach on well known public benchmarks (SemanticKITTI and KITTI), where 3D-MiniNet gets state-of-the-art results while being faster and more parameter-efficient than previous methods.
Submitted 27 April, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth
Authors:
Jose M. Facil,
Benjamin Ummenhofer,
Huizhong Zhou,
Luis Montesano,
Thomas Brox,
Javier Civera
Abstract:
Single-view depth estimation suffers from the problem that a network trained on images from one camera does not generalize to images taken with a different camera model. Thus, changing the camera model requires collecting an entirely new training dataset. In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. Experiments confirm that this improves the generalization capabilities of depth prediction networks considerably, and clearly outperforms the state of the art when the train and test images are acquired with different cameras.
Submitted 3 April, 2019;
originally announced April 2019.
-
Condition-Invariant Multi-View Place Recognition
Authors:
Jose M. Facil,
Daniel Olid,
Luis Montesano,
Javier Civera
Abstract:
Visual place recognition is particularly challenging when places suffer changes in their appearance. Such changes are indeed common, e.g., due to weather, night/day or seasons. In this paper we leverage recent research using deep networks, and explore how they can be improved by exploiting the temporal sequence information. Specifically, we propose 3 different alternatives (Descriptor Grouping, Fusion and Recurrent Descriptors) for deep networks to use several frames of a sequence. We show that our approaches produce more compact and better-performing descriptors than single- and multi-view baselines in the literature in two public databases.
Submitted 25 February, 2019;
originally announced February 2019.
-
Differentiating resting brain states using ordinal symbolic analysis
Authors:
C. Quintero-Quiroz,
Luis Montesano,
A. J. Pons,
M. C. Torrent,
J. García-Ojalvo,
C. Masoller
Abstract:
Symbolic methods of analysis are valuable tools for investigating complex time-dependent signals. In particular, the ordinal method defines sequences of symbols according to the ordering in which values appear in a time series. This method has been shown to yield useful information, even when applied to signals with large noise contamination. Here we use ordinal analysis to investigate the transition between eyes closed (EC) and eyes open (EO) resting states. We analyze two EEG datasets (with 71 and 109 healthy subjects) with different recording conditions (sampling rates and the number of electrodes in the scalp). Using as diagnostic tools the permutation entropy, the entropy computed from symbolic transition probabilities, and an asymmetry coefficient (that measures the asymmetry of the likelihood of the transitions between symbols) we show that ordinal analysis applied to the raw data distinguishes the two brain states. In both datasets, we find that the EO state is characterized by higher entropies and lower asymmetry coefficient, as compared to the EC state. Our results thus show that these diagnostic tools have the potential for detecting and characterizing changes in time-evolving brain states.
Submitted 10 May, 2018;
originally announced May 2018.
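The ordinal method and permutation entropy described in the abstract above can be sketched as follows. This is a generic textbook formulation, not the authors' code: the function names, the default pattern length `order=3`, the unit embedding delay, and the tie-breaking rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def ordinal_patterns(series, order=3):
    """Map each window of `order` consecutive values to an ordinal symbol.

    The symbol is the tuple of window positions sorted by value.
    Ties are broken by position (an assumption; other conventions exist).
    """
    patterns = []
    for start in range(len(series) - order + 1):
        window = series[start:start + order]
        symbol = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns.append(symbol)
    return patterns


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution of `series`."""
    patterns = ordinal_patterns(series, order)
    total = len(patterns)
    counts = Counter(patterns)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalize:
        # Divide by the maximum entropy, log(order!), so the result lies in [0, 1].
        entropy /= math.log(math.factorial(order))
    return entropy


# Hypothetical usage: a monotonic signal has a single pattern (entropy 0),
# while an alternating signal with order=2 uses both patterns equally.
flat = permutation_entropy([1, 2, 3, 4, 5, 6])
alternating = permutation_entropy([1, 2, 1, 2, 1], order=2)
```

Because only the ordering of values within each window matters, the measure is insensitive to amplitude scaling, which is one reason ordinal analysis can be applied to raw signals.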
-
Language Bootstrapping: Learning Word Meanings From Perception-Action Association
Authors:
Giampiero Salvi,
Luis Montesano,
Alexandre Bernardino,
José Santos-Victor
Abstract:
We address the problem of bootstrapping language acquisition for an artificial system similarly to what is observed in experiments with human infants. Our method works by associating meanings to words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols to the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as the input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot's own understanding of its actions. Thus, they can be directly used to instruct the robot to perform tasks and also allow context to be incorporated in the speech recognition task. We believe that the encouraging results with our approach may afford robots with a capacity to acquire language descriptors in their operation's environment as well as to shed some light as to how this challenging process develops with human infants.
Submitted 27 November, 2017;
originally announced November 2017.
-
Single-View and Multi-View Depth Fusion
Authors:
José M. Fácil,
Alejo Concha,
Luis Montesano,
Javier Civera
Abstract:
Dense and accurate 3D mapping from a monocular sequence is a key technology for several applications and still an open research area. This paper leverages recent results on single-view CNN-based depth estimation and fuses them with multi-view depth estimation. Both approaches present complementary strengths. Multi-view depth is highly accurate but only in high-texture areas and high-parallax cases. Single-view depth captures the local structure of mid-level regions, including texture-less areas, but the estimated depth lacks global coherence. The single and multi-view fusion we propose is challenging in several aspects. First, both depths are related by a deformation that depends on the image content. Second, the selection of multi-view points of high accuracy might be difficult for low-parallax configurations. We present contributions for both problems. Our results in the public datasets of NYUv2 and TUM show that our algorithm outperforms the individual single and multi-view approaches. A video showing the key aspects of mapping in our Single and Multi-view depth proposal is available at https://youtu.be/ipc5HukTb4k
Submitted 27 June, 2017; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Advantages of EEG phase patterns for the detection of gait intention in healthy and stroke subjects
Authors:
Andreea Ioana Sburlea,
Luis Montesano,
Javier Minguez
Abstract:
One use of EEG-based brain-computer interfaces (BCIs) in rehabilitation is the detection of movement intention. In this paper we investigate for the first time the instantaneous phase of movement related cortical potential (MRCP) and its application to the detection of gait intention. We demonstrate the utility of MRCP phase in two independent datasets, in which 10 healthy subjects and 9 chronic stroke patients executed a self-initiated gait task in three sessions. Phase features were compared to more conventional amplitude and power features. The neurophysiology analysis showed that phase features have higher signal-to-noise ratio than the other features. Also, BCI detectors of gait intention based on phase, amplitude, and their combination were evaluated under three conditions: session specific calibration, intersession transfer, and intersubject transfer. Results show that the phase based detector is the most accurate for session specific calibration (movement intention was correctly detected in 66.5% of trials in healthy subjects, and in 63.3% in stroke patients). However, in intersession and intersubject transfer, the detector that combines amplitude and phase features is the most accurate one and the only one that retains its accuracy (62.5% in healthy subjects and 59% in stroke patients) w.r.t. session specific calibration. Thus, MRCP phase features improve the detection of gait intention and could be used in practice to remove time-consuming BCI recalibration.
Submitted 15 May, 2016;
originally announced May 2016.
-
Active Learning for Autonomous Intelligent Agents: Exploration, Curiosity, and Interaction
Authors:
Manuel Lopes,
Luis Montesano
Abstract:
In this survey we present different approaches that allow an intelligent agent to autonomously explore its environment to gather information and learn multiple tasks. Different communities have proposed different solutions that are, in many cases, similar and/or complementary. These solutions include active learning, exploration/exploitation, online-learning and social learning. The common aspect of all these approaches is that it is the agent that selects and decides what information to gather next. Applications for these approaches already include tutoring systems, autonomous grasping learning, navigation and mapping and human-robot interaction. We discuss how these approaches are related, explaining their similarities and their differences in terms of problem assumptions and metrics of success. We consider that such an integrated discussion will improve inter-disciplinary research and applications.
Submitted 6 March, 2014;
originally announced March 2014.
-
On the Performance of Maximum Likelihood Inverse Reinforcement Learning
Authors:
Héctor Ratia,
Luis Montesano,
Ruben Martinez-Cantin
Abstract:
Inverse reinforcement learning (IRL) addresses the problem of recovering a task description given a demonstration of the optimal policy used to solve such a task. The optimal policy is usually provided by an expert or teacher, making IRL especially suitable for the problem of apprenticeship learning. The task description is encoded in the form of a reward function of a Markov decision process (MDP). Several algorithms have been proposed to find the reward function corresponding to a set of demonstrations. One of the algorithms that has provided the best results in different applications is a gradient method to optimize a policy squared error criterion. On a parallel line of research, other authors have recently presented a gradient approximation of the maximum likelihood estimate of the reward signal. In general, both approaches approximate the gradient estimate and the criteria at different stages to make the algorithm tractable and efficient. In this work, we provide a detailed description of the different methods to highlight differences in terms of reward estimation, policy similarity and computational costs. We also provide experimental results to evaluate the differences in performance of the methods.
Submitted 7 February, 2012;
originally announced February 2012.