-
EventSleep: Sleep Activity Recognition with Event Cameras
Authors:
Carlos Plou,
Nerea Gallego,
Alberto Sabater,
Eduardo Montijano,
Pablo Urcola,
Luis Montesano,
Ruben Martinez-Cantin,
Ana C. Murillo
Abstract:
Event cameras are a promising technology for activity recognition in dark environments due to their unique properties. However, real event camera datasets under low-lighting conditions are still scarce, which also limits the number of approaches to solve these kinds of problems, hindering the potential of this technology in many applications. We present EventSleep, a new dataset and methodology to address this gap and study the suitability of event cameras for a very relevant medical application: sleep monitoring for sleep disorders analysis. The dataset contains synchronized event and infrared recordings emulating common movements that happen during sleep, resulting in a new challenging and unique dataset for activity recognition in dark environments. Our novel pipeline is able to achieve high accuracy under these challenging conditions and incorporates a Bayesian approach (Laplace ensembles) to increase the robustness in the predictions, which is fundamental for medical applications. Our work is the first application of Bayesian neural networks for event cameras, the first use of Laplace ensembles in a realistic problem, and also demonstrates for the first time the potential of event cameras in a new application domain: to enhance current sleep evaluation procedures. Our activity recognition results highlight the potential of event cameras under dark conditions, and their capacity and robustness for sleep activity recognition, and open problems such as the adaptation of event data pre-processing techniques to dark environments.
Submitted 2 April, 2024;
originally announced April 2024.
-
SpectralWaste Dataset: Multimodal Data for Waste Sorting Automation
Authors:
Sara Casao,
Fernando Peña,
Alberto Sabater,
Rosa Castillón,
Darío Suárez,
Eduardo Montijano,
Ana C. Murillo
Abstract:
The increase in non-biodegradable waste is a worldwide concern. Recycling facilities play a crucial role, but their automation is hindered by the complex characteristics of waste recycling lines like clutter or object deformation. In addition, the lack of publicly available labeled data for these environments makes developing robust perception systems challenging. Our work explores the benefits of multimodal perception for object segmentation in real waste management scenarios. First, we present SpectralWaste, the first dataset collected from an operational plastic waste sorting facility that provides synchronized hyperspectral and conventional RGB images. This dataset contains labels for several categories of objects that commonly appear in sorting plants and need to be detected and separated from the main trash flow for several reasons, such as security in the management line or reuse. Additionally, we propose a pipeline employing different object segmentation architectures and evaluate the alternatives on our dataset, conducting an extensive analysis for both multimodal and unimodal alternatives. Our evaluation pays special attention to efficiency and suitability for real-time processing and demonstrates how HSI can bring a boost to RGB-only perception in these realistic industrial settings without much computational overhead.
Submitted 26 March, 2024;
originally announced March 2024.
-
Event Transformer+. A multi-purpose solution for efficient event data processing
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer+, which improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads, for event-stream classification (i.e. action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state-of-the-art while requiring minimal computation resources, both on GPU and CPU.
Submitted 3 September, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Event Transformer. A sparse-aware solution for efficient event data processing
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range, while they present minimal power consumption. However, top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top-accuracy results for complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively takes advantage of event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show better or comparable accuracy to the state-of-the-art while requiring significantly less computation resources, which makes EvT able to work with minimal latency both on GPU and CPU.
Submitted 18 April, 2022; v1 submitted 7 April, 2022;
originally announced April 2022.
-
Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive Learning from a Class-wise Memory Bank
Authors:
Inigo Alonso,
Alberto Sabater,
David Ferstl,
Luis Montesano,
Ana C. Murillo
Abstract:
This work presents a novel approach for semi-supervised semantic segmentation. The key element of this approach is our contrastive learning module that enforces the segmentation network to yield similar pixel-level feature representations for same-class samples across the whole dataset. To achieve this, we maintain a memory bank continuously updated with relevant and high-quality feature vectors from labeled data. In an end-to-end training, the features from both labeled and unlabeled data are optimized to be similar to same-class samples from the memory bank. Our approach outperforms the current state-of-the-art for semi-supervised semantic segmentation and semi-supervised domain adaptation on well-known public benchmarks, with larger improvements on the most challenging scenarios, i.e., less available labeled data. https://github.com/Shathe/SemiSeg-Contrastive
Submitted 6 August, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Domain and View-point Agnostic Hand Action Recognition
Authors:
Alberto Sabater,
Iñigo Alonso,
Luis Montesano,
Ana C. Murillo
Abstract:
Hand action recognition is a special case of action recognition with applications in human-robot interaction, virtual reality or life-logging systems. Building action classifiers able to work for such heterogeneous action domains is very challenging. There are very subtle changes across different actions from a given application but also large variations across domains (e.g. virtual reality vs life-logging). This work introduces a novel skeleton-based hand motion representation model that tackles this problem. The framework we propose is agnostic to the application domain or camera recording view-point. When working on a single domain (intra-domain action classification) our approach performs better or similar to current state-of-the-art methods on well-known hand action recognition benchmarks. And, more importantly, when performing hand action recognition for action domains and camera perspectives which our approach has not been trained for (cross-domain action classification), our proposed framework achieves comparable performance to intra-domain state-of-the-art methods. These experiments show the robustness and generalization capabilities of our framework.
Submitted 7 October, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
One-shot action recognition in challenging therapy scenarios
Authors:
Alberto Sabater,
Laura Santos,
Jose Santos-Victor,
Alexandre Bernardino,
Luis Montesano,
Ana C. Murillo
Abstract:
One-shot action recognition aims to recognize new action categories from a single reference example, typically referred to as the anchor example. This work presents a novel approach for one-shot action recognition in the wild that computes motion representations robust to variable kinematic conditions. One-shot action recognition is then performed by evaluating anchor and target motion representations. We also develop a set of complementary steps that boost the action recognition performance in the most challenging scenarios. Our approach is evaluated on the public NTU-120 one-shot action recognition benchmark, outperforming previous action recognition models. Besides, we evaluate our framework on a real use-case of therapy with autistic people. These recordings are particularly challenging due to high-level artifacts from the patient motion. Our results provide not only quantitative but also online qualitative measures, essential for the patient evaluation and monitoring during the actual therapy.
Submitted 29 July, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Robust and efficient post-processing for video object detection
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Object recognition in video is an important task for plenty of applications, including autonomous driving perception, surveillance tasks, wearable devices or IoT networks. Object recognition using video data is more challenging than using still images due to blur, occlusions or rare object poses. Specific video detectors with high computational cost or standard image detectors together with a fast post-processing algorithm achieve the current state-of-the-art. This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods by introducing a learning-based similarity evaluation between detections across frames. Our method improves the results of state-of-the-art specific video detectors, especially regarding fast moving objects, and presents low resource requirements. Applied to efficient still image detectors, such as YOLO, it provides comparable results to much more computationally intensive detectors.
Submitted 23 September, 2020;
originally announced September 2020.
-
Performance of object recognition in wearable videos
Authors:
Alberto Sabater,
Luis Montesano,
Ana C. Murillo
Abstract:
Wearable technologies are enabling plenty of new applications of computer vision, from life logging to health assistance. Many of them are required to recognize the elements of interest in the scene captured by the camera. This work studies the problem of object detection and localization on videos captured by this type of camera. Wearable videos are a much more challenging scenario for object detection than standard images or even other types of videos, due to lower quality images (e.g. poor focus) or high clutter and occlusion common in wearable recordings. Existing work typically focuses on detecting the objects of focus or those being manipulated by the user wearing the camera. We perform a more general evaluation of the task of object detection in this type of video, because numerous applications, such as marketing studies, also need to detect objects which are not in focus by the user. This work presents a thorough study of the well known YOLO architecture, which offers an excellent trade-off between accuracy and speed, for the particular case of object detection in wearable video. We focus our study on the public ADL Dataset, but we also use additional public data for complementary evaluations. We run an exhaustive set of experiments with different variations of the original architecture and its training strategy. Our experiments lead to several conclusions about the most promising directions for our goal and point us to further research steps to improve detection in wearable videos.
Submitted 10 September, 2020;
originally announced September 2020.
-
The Polarimetric and Helioseismic Imager on Solar Orbiter
Authors:
S. K. Solanki,
J. C. del Toro Iniesta,
J. Woch,
A. Gandorfer,
J. Hirzberger,
A. Alvarez-Herrero,
T. Appourchaux,
V. Martínez Pillet,
I. Pérez-Grande,
E. Sanchis Kilders,
W. Schmidt,
J. M. Gómez Cama,
H. Michalik,
W. Deutsch,
G. Fernandez-Rico,
B. Grauf,
L. Gizon,
K. Heerlein,
M. Kolleck,
A. Lagg,
R. Meller,
R. Müller,
U. Schühle,
J. Staub,
K. Albert
, et al. (99 additional authors not shown)
Abstract:
This paper describes the Polarimetric and Helioseismic Imager on the Solar Orbiter mission (SO/PHI), the first magnetograph and helioseismology instrument to observe the Sun from outside the Sun-Earth line. It is the key instrument meant to address the top-level science question: How does the solar dynamo work and drive connections between the Sun and the heliosphere? SO/PHI will also play an important role in answering the other top-level science questions of Solar Orbiter, as well as hosting the potential of a rich return in further science.
SO/PHI measures the Zeeman effect and the Doppler shift in the FeI 617.3nm spectral line. To this end, the instrument carries out narrow-band imaging spectro-polarimetry using a tunable LiNbO_3 Fabry-Perot etalon, while the polarisation modulation is done with liquid crystal variable retarders (LCVRs). The line and the nearby continuum are sampled at six wavelength points and the data are recorded by a 2kx2k CMOS detector. To save valuable telemetry, the raw data are reduced on board, including being inverted under the assumption of a Milne-Eddington atmosphere, although simpler reduction methods are also available on board. SO/PHI is composed of two telescopes; one, the Full Disc Telescope (FDT), covers the full solar disc at all phases of the orbit, while the other, the High Resolution Telescope (HRT), can resolve structures as small as 200km on the Sun at closest perihelion. The high heat load generated through proximity to the Sun is greatly reduced by the multilayer-coated entrance windows to the two telescopes that allow less than 4% of the total sunlight to enter the instrument, most of it in a narrow wavelength band around the chosen spectral line.
Submitted 26 March, 2019;
originally announced March 2019.