Search | arXiv e-print repository

Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

Authors: Vivienne Huiling Wang, Tinghuai Wang, Wenyan Yang, Joni-Kristian Kämäräinen, Joni Pajarinen

Abstract: In goal-conditioned hierarchical reinforcement learning (HRL), a high-level policy specifies a subgoal for the low-level policy to reach. Effective HRL hinges on a suitable subgoal representation function, abstracting state space into latent subgoal space and inducing varied low-level behaviors. Existing methods adopt a subgoal representation that provides a deterministic mapping from state space to latent subgoal space. Instead, this paper utilizes Gaussian Processes (GPs) for the first probabilistic subgoal representation. Our method employs a GP prior on the latent subgoal space to learn a posterior distribution over the subgoal representation functions while exploiting the long-range correlation in the state space through learnable kernels. This enables an adaptive memory that integrates long-range subgoal information from prior planning steps, allowing it to cope with stochastic uncertainties. Furthermore, we propose a novel learning objective to facilitate the simultaneous learning of probabilistic subgoal representations and policies within a unified framework. In experiments, our approach outperforms state-of-the-art baselines not only in standard benchmarks but also in environments with stochastic elements and under diverse reward conditions. Additionally, our model shows promising capabilities in transferring low-level policies across different tasks.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2309.17260 [pdf, other]

PlaceNav: Topological Navigation through Place Recognition

Authors: Lauri Suomela, Jussi Kalliola, Harry Edelman, Joni-Kristian Kämäräinen

Abstract: Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by robots of different types. However, the navigation methods' performance is still limited by the scarcity of suitable training data and they suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new method obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.

Submitted 29 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: ICRA2024 camera ready

arXiv:2303.09334 [pdf, other]

Depth-Aware Image Compositing Model for Parallax Camera Motion Blur

Authors: German F. Torres, Joni-Kristian Kämäräinen

Abstract: Camera motion introduces spatially varying blur due to the depth changes in the 3D world. This work investigates scene configurations where such blur is produced under parallax camera motion. We present a simple, yet accurate, Image Compositing Blur (ICB) model for depth-dependent spatially varying blur. The (forward) model produces realistic motion blur from a single image, depth map, and camera trajectory. Furthermore, we utilize the ICB model, combined with a coordinate-based MLP, to learn a sharp neural representation from the blurred input. Experimental results are reported for synthetic and real examples. The results verify that the ICB forward model is computationally efficient and produces realistic blur, despite the lack of occlusion information. Additionally, our method for restoring a sharp representation proves to be a competitive approach for the deblurring task.

Submitted 30 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.02646 [pdf, other]

Seq2Seq Imitation Learning for Tactile Feedback-based Manipulation

Authors: Wenyan Yang, Alexandre Angleraud, Roel S. Pieters, Joni Pajarinen, Joni-Kristian Kämäräinen

Abstract: Robot control for tactile feedback-based manipulation can be difficult due to the modeling of physical contacts, partial observability of the environment, and noise in perception and control. This work focuses on solving partial observability of contact-rich manipulation tasks as a Sequence-to-Sequence (Seq2Seq) Imitation Learning (IL) problem. The proposed Seq2Seq model produces a robot-environment interaction sequence to estimate the partially observable environment state variables. Then, the observed interaction sequence is transformed to a control sequence for the task itself. The proposed Seq2Seq IL for tactile feedback-based manipulation is experimentally validated on a door-open task in a simulated environment and a snap-on insertion task with a real robot. The model is able to learn both tasks from only 50 expert demonstrations, while state-of-the-art reinforcement learning and imitation learning methods fail.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2302.08865 [pdf, other]

Swapped goal-conditioned offline reinforcement learning

Authors: Wenyan Yang, Huiling Wang, Dingding Cai, Joni Pajarinen, Joni-Kristen Kämäräinen

Abstract: Offline goal-conditioned reinforcement learning (GCRL) can be challenging due to overfitting to the given dataset. To generalize agents' skills outside the given dataset, we propose a goal-swapping procedure that generates additional trajectories. To alleviate the problem of noise and extrapolation errors, we present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG). In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods in a wide range of benchmark tasks, and goal-swapping further improves the test results. It is noteworthy that the proposed method obtains good performance on the challenging dexterous in-hand manipulation tasks for which the prior methods failed.

Submitted 17 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.07741

arXiv:2302.07741 [pdf, other]

Prioritized offline Goal-swapping Experience Replay

Authors: Wenyan Yang, Joni Pajarinen, Dinging Cai, Joni Kämäräinen

Abstract: In goal-conditioned offline reinforcement learning, an agent learns from previously collected data to go to an arbitrary goal. Since the offline data only contains a finite number of trajectories, a main challenge is how to generate more data. Goal-swapping generates additional data by switching trajectory goals, but in doing so it produces a large number of invalid trajectories. To address this issue, we propose prioritized goal-swapping experience replay (PGSER). PGSER uses a pre-trained Q function to assign higher priority weights to goal-swapped transitions that allow reaching the goal. In experiments, PGSER significantly improves over baselines in a wide range of benchmark tasks, including challenging, previously unsuccessful dexterous in-hand manipulation tasks.

Submitted 5 March, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2203.14134 [pdf, other]

RGBD Object Tracking: An In-depth Review

Authors: Jinyu Yang, Zhe Li, Song Yan, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen, Ling Shao

Abstract: RGBD object tracking is gaining momentum in computer vision research thanks to the development of depth sensors. Although numerous RGBD trackers have been proposed with promising performance, an in-depth review for comprehensive understanding of this area is lacking. In this paper, we firstly review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework. Then, we summarize the existing datasets and the evaluation metrics. We benchmark a representative set of RGBD trackers, and give detailed analyses based on their performances. Particularly, we are the first to provide depth quality evaluation and analysis of tracking results in depth-friendly scenarios in RGBD tracking. For long-term settings in most RGBD tracking videos, we give an analysis of trackers' performance on handling target disappearance. To enable better understanding of RGBD trackers, we propose robustness evaluation against input perturbations. Finally, we summarize the challenges and provide open directions for this community. All resources are publicly available at https://github.com/memoryunreal/RGBD-tracking-review.

Submitted 26 March, 2022; originally announced March 2022.

Comments: 13 pages

arXiv:2203.13048 [pdf, other]

Benchmarking Visual Localization for Autonomous Navigation

Authors: Lauri Suomela, Jussi Kalliola, Atakan Dag, Harry Edelman, Joni-Kristian Kämäräinen

Abstract: This work introduces a simulator-based benchmark for visual localization in the autonomous navigation context. The dynamic benchmark enables investigation of how variables such as the time of day, weather, and camera perspective affect the navigation performance of autonomous agents that utilize visual localization for closed-loop control. The experimental part of the paper studies the effects of four such variables by evaluating state-of-the-art visual localization methods as part of the motion planning module of an autonomous navigation stack. The results show major variation in the suitability of the different methods for vision-based navigation. To the authors' best knowledge, the proposed benchmark is the first to study modern visual localization methods as part of a complete navigation stack. We make the benchmark available at https://github.com/lasuomela/carla_vloc_benchmark.

Submitted 18 October, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: WACV2023 camera ready

arXiv:2201.09635 [pdf, other]

State-Conditioned Adversarial Subgoal Generation

Authors: Vivienne Huiling Wang, Joni Pajarinen, Tinghuai Wang, Joni-Kristian Kämäräinen

Abstract: Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from the problem of a non-stationary high-level policy since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training a simple state-conditioned discriminator network concurrently with the high-level policy which determines the compatibility level of subgoals. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance in challenging continuous control tasks.

Submitted 13 March, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2110.11679 [pdf, other]

Depth-only Object Tracking

Authors: Song Yan, Jinyu Yang, Ales Leonardis, Joni-Kristian Kamarainen

Abstract: Depth (D) indicates occlusion and is less sensitive to illumination changes, which makes depth an attractive modality for Visual Object Tracking (VOT). Depth is used in RGBD object tracking, where the best trackers are deep RGB trackers with additional heuristics using depth maps. There are two potential reasons for the heuristics: 1) the lack of large RGBD tracking datasets to train deep RGBD trackers and 2) the long-term evaluation protocol of VOT RGBD that benefits from heuristics such as depth-based occlusion detection. In this work, we study how far D-only tracking can go if trained with large amounts of depth data. To compensate for the lack of depth data, we generate depth maps for tracking. We train a "Depth-DiMP" from scratch with the generated data and fine-tune it with the available small RGBD tracking datasets. The depth-only DiMP achieves good accuracy in depth-only tracking, and combined with the original RGB DiMP, the end-to-end trained RGBD-DiMP outperforms the recent VOT 2020 RGBD winners.

Submitted 22 October, 2021; originally announced October 2021.

Comments: Accepted to BMVC2021

arXiv:2108.13962 [pdf, other]

DepthTrack : Unveiling the Power of RGBD Tracking

Authors: Song Yan, Jinyu Yang, Jani Käpylä, Feng Zheng, Aleš Leonardis, Joni-Kristian Kämäräinen

Abstract: RGBD (RGB plus depth) object tracking is gaining momentum as RGBD sensors have become popular in many application fields such as robotics. However, the best RGBD trackers are extensions of the state-of-the-art deep RGB trackers. They are trained with RGB data and the depth channel is used as a sidekick for subtleties such as occlusion detection. This can be explained by the fact that there are no sufficiently large RGBD datasets to 1) train deep depth trackers and to 2) challenge RGB trackers with sequences for which the depth cue is essential. This work introduces a new RGBD tracking dataset - Depth-Track - that has twice as many sequences (200) and scene types (40) as the largest existing dataset, and three times more objects (90). In addition, the average length of the sequences (1473), the number of deformable objects (16) and the number of annotated tracking attributes (15) have been increased. Furthermore, by running the SotA RGB and RGBD trackers on DepthTrack, we propose a new RGBD tracking baseline, namely DeT, which reveals that deep RGBD tracking indeed benefits from genuine training data. The code and dataset are available at https://github.com/xiaozai/DeT

Submitted 31 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV2021

arXiv:2108.07514 [pdf, other]

Monolithic vs. hybrid controller for multi-objective Sim-to-Real learning

Authors: Atakan Dag, Alexandre Angleraud, Wenyan Yang, Nataliya Strokina, Roel S. Pieters, Minna Lanz, Joni-Kristian Kamarainen

Abstract: Simulation to real (Sim-to-Real) is an attractive approach to construct controllers for robotic tasks that are easier to simulate than to analytically solve. Working Sim-to-Real solutions have been demonstrated for tasks with a clear single objective such as "reach the target". Real world applications, however, often consist of multiple simultaneous objectives such as "reach the target" but "avoid obstacles". A straightforward solution in the context of reinforcement learning (RL) is to combine multiple objectives into a multi-term reward function and train a single monolithic controller. Recently, a hybrid solution based on pre-trained single objective controllers and a switching rule between them was proposed. In this work, we compare these two approaches in the multi-objective setting of a robot manipulator to reach a target while avoiding an obstacle. Our findings show that the training of a hybrid controller is easier and obtains a better success-failure trade-off than a monolithic controller. The controllers trained in simulator were verified by a real set-up.

Submitted 17 August, 2021; originally announced August 2021.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

arXiv:2103.12379 [pdf, other]

Neural Network Controller for Autonomous Pile Loading Revised

Authors: Wenyan Yang, Nataliya Strokina, Nikolay Serbenyuk, Joni Pajarinen, Reza Ghabcheloo, Juho Vihonen, Mohammad M. Aref, Joni-Kristian Kämäräinen

Abstract: We have recently proposed two pile loading controllers that learn from human demonstrations: a neural network (NNet) [1] and a random forest (RF) controller [2]. In the field experiments the RF controller obtained clearly better success rates. In this work, the previous findings are drastically revised by testing summer-trained controllers in winter conditions. The winter experiments revealed a need for additional sensors, more training data, and a controller that can take advantage of these. Therefore, we propose a revised neural controller (NNetV2) which has a more expressive structure and uses a neural attention mechanism to focus on important parts of the sensor and control signals. Using the same data and sensors to train and test the three controllers, NNetV2 achieves better robustness against drastically changing conditions and a superior success rate. To the best of our knowledge, this is the first work testing a learning-based controller for a heavy-duty machine in drastically varying outdoor conditions and delivering a high success rate in winter while being trained in summer.

Submitted 23 March, 2021; originally announced March 2021.

Comments: 7 pages

arXiv:2101.02515 [pdf, other]

Learning Anthropometry from Rendered Humans

Authors: Song Yan, Joni-Kristian Kämäräinen

Abstract: Accurate estimation of anthropometric body measurements from RGB images has many potential applications in industrial design, online clothing, medical diagnosis and ergonomics. Research on this topic is limited by the fact that there exist only generated datasets which are based on fitting a 3D body mesh to 3D body scans in the commercial CAESAR dataset. For 2D only silhouettes are generated. To circumvent the data bottleneck, we introduce a new 3D scan dataset of 2,675 female and 1,474 male scans. We also introduce a small dataset of 200 RGB images and tape measured ground truth. With the help of the two new datasets we propose a part-based shape model and a deep neural network for estimating anthropometric measurements from 2D images. All data will be made publicly available.

Submitted 7 January, 2021; originally announced January 2021.

arXiv:2011.04612 [pdf, other]

Fast Fourier Intrinsic Network

Authors: Yanlin Qian, Miaojing Shi, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: We address the problem of decomposing an image into albedo and shading. We propose the Fast Fourier Intrinsic Network, FFI-Net in short, that operates in the spectral domain, splitting the input into several spectral bands. Weights in FFI-Net are optimized in the spectral domain, allowing faster convergence to a lower error. FFI-Net is lightweight and does not need auxiliary networks for training. The network is trained end-to-end with a novel spectral loss which measures the global distance between the network prediction and corresponding ground truth. FFI-Net achieves state-of-the-art performance on MPI-Sintel, MIT Intrinsic, and IIW datasets.

Submitted 9 November, 2020; originally announced November 2020.

Comments: WACV 2021 - camera ready

arXiv:2003.03763 [pdf, other]

A Benchmark for Temporal Color Constancy

Authors: Yanlin Qian, Jani Käpylä, Joni-Kristian Kämäräinen, Samu Koskinen, Jiri Matas

Abstract: Temporal Color Constancy (CC) is a recently proposed approach that challenges the conventional single-frame color constancy. The conventional approach is to use a single frame - shot frame - to estimate the scene illumination color. In temporal CC, multiple frames from the view finder sequence are used to estimate the color. However, there are no realistic large scale temporal color constancy datasets for method evaluation. In this work, a new temporal CC benchmark is introduced. The benchmark comprises (1) 600 real-world sequences recorded with a high-resolution mobile phone camera, (2) a fixed train-test split which ensures consistent evaluation, and (3) a baseline method which achieves high accuracy in the new benchmark and the dataset used in previous works. Results for more than 20 well-known color constancy methods, including recent state-of-the-art methods, are reported in our experiments.

Submitted 8 March, 2020; originally announced March 2020.

Comments: 16 pages, 6 figures

arXiv:1912.00660 [pdf, other]

DAL -- A Deep Depth-aware Long-term Tracker

Authors: Yanlin Qian, Alan Lukežič, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run. We reformulate the deep discriminative correlation filter (DCF) to embed the depth information into deep features. Moreover, the same depth-aware correlation filter is used for target re-detection. Comprehensive evaluations show that the proposed tracker achieves state-of-the-art performance on the Princeton RGBD, STC, and the newly-released CDTB benchmarks and runs at 20 fps.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 10 pages

arXiv:1911.00694 [pdf, other]

Anthropometric clothing measurements from 3D body scans

Authors: Song Yan, Johan Wirta, Joni-Kristian Kämäräinen

Abstract: We propose a full processing pipeline to acquire anthropometric measurements from 3D measurements. The first stage of our pipeline is a commercial point cloud scanner. In the second stage, a pre-defined body model is fitted to the captured point cloud. We have generated one male and one female model from the SMPL library. The fitting process is based on non-rigid Iterative Closest Point (ICP) algorithm that minimizes overall energy of point distance and local stiffness energy terms. In the third stage, we measure multiple circumference paths on the fitted model surface and use a non-linear regressor to provide the final estimates of anthropometric measurements. We scanned 194 male and 181 female subjects and the proposed pipeline provides mean absolute errors from 2.5 mm to 16.0 mm depending on the anthropometric measurement.

Submitted 2 November, 2019; originally announced November 2019.

arXiv:1909.02933 [pdf, other]

AR-based interaction for safe human-robot collaborative manufacturing

Authors: Antti Hietanen, Jyrki Latokartano, Roel Pieters, Minna Lanz, Joni-Kristian Kämäräinen

Abstract: Industrial standards define safety requirements for Human-Robot Collaboration (HRC) in industrial manufacturing. The standards particularly require real-time monitoring and securing of the minimum protective distance between a robot and an operator. In this work, we propose a depth-sensor based model for workspace monitoring and an interactive Augmented Reality (AR) User Interface (UI) for safe HRC. The AR UI is implemented on two different hardware setups: a projector-mirror setup and a wearable AR gear (HoloLens). We evaluate the workspace model and UIs on a realistic diesel motor assembly task. The AR-based interactive UIs provide 21-24% and 57-64% reduction in the task completion and robot idle time, respectively, as compared to a baseline without interaction and workspace sharing. However, subjective evaluations reveal that HoloLens based AR is not yet suitable for industrial manufacturing while the projector-mirror setup shows clear improvements in safety and work ergonomics.

Submitted 6 September, 2019; originally announced September 2019.

Comments: 7 pages, 7 figures

arXiv:1907.00618 [pdf, other]

CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark

Authors: Alan Lukežič, Ugur Kart, Jani Käpylä, Ahmed Durmush, Joni-Kristian Kämäräinen, Jiří Matas, Matej Kristan

Abstract: A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1906.02783 [pdf, other]

Object Pose Estimation in Robotics Revisited

Authors: Antti Hietanen, Jyrki Latokartano, Alessandro Foi, Roel Pieters, Ville Kyrki, Minna Lanz, Joni-Kristian Kämäräinen

Abstract: Vision-based object grasping and manipulation in robotics require accurate estimation of the object's 6D pose. 6D pose estimation has received significant attention in the computer vision community, and multiple datasets and evaluation metrics have been proposed. However, the existing metrics measure how well two geometrical surfaces are aligned (ground truth vs. estimated pose), which does not directly measure how well a robot can perform the task with the given estimate. In this work we propose a probabilistic metric that directly measures success in robotic tasks. The evaluation metric is based on a non-parametric probability density that is estimated from samples of a real physical setup. During the pose evaluation stage the physical setup is not needed. The evaluation metric is validated in controlled experiments and a new pose estimation dataset of industrial parts is introduced. The experimental results with the parts confirm that the proposed evaluation metric better reflects the true performance in robotics than the existing metrics.

Submitted 21 May, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: 29 pages, 8 figures

arXiv:1902.10466 [pdf, other]

Flash Lightens Gray Pixels

Authors: Yanlin Qian, Song Yan, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: In the real world, a scene is usually cast by multiple illuminants and herein we address the problem of spatial illumination estimation. Our solution is based on detecting gray pixels with the help of flash photography. We show that flash photography significantly improves the performance of gray pixel detection without illuminant prior, training data or calibration of the flash. We also introduce a novel flash photography dataset generated from the MIT intrinsic dataset.

Submitted 27 February, 2019; originally announced February 2019.

Comments: 5 pages including refs, 4 figures, submitted to International Conference on Image Processing

arXiv:1901.03198 [pdf, other]

On Finding Gray Pixels

Authors: Yanlin Qian, Joni-Kristian Kämäräinen, Jarno Nikkanen, Jiri Matas

Abstract: We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI for short, is derived using the Dichromatic Reflection Model and is learning-free. GI allows estimation of one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep methods. GI is simple and fast, written in a few dozen lines of code, processing a 1080p image in ~0.4 seconds with non-optimized Matlab code.

Submitted 2 May, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

Comments: To appear in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019. 9 pages, 7 figures. This article is an extension of arXiv:1803.08326

arXiv:1811.10863 [pdf, other]

Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters

Authors: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas

Abstract: Standard RGB-D trackers treat the target as an inherently 2D structure, which makes modelling appearance changes related even to simple out-of-plane rotation highly challenging. We address this limitation by proposing a novel long-term RGB-D tracker - Object Tracking by Reconstruction (OTR). The tracker performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs). The 3D reconstruction supports two performance-enhancing features: (i) generation of accurate spatial support for constrained DCF learning from its 2D projection and (ii) point cloud based estimation of 3D pose change for selection and storage of view-specific DCFs which are used to robustly localize the target after out-of-view rotation or heavy occlusion. Extensive evaluation of OTR on the challenging Princeton RGB-D tracking and STC Benchmarks shows it outperforms the state-of-the-art by a large margin.

Submitted 27 November, 2018; originally announced November 2018.

arXiv:1808.05848 [pdf, other]

Performance Analysis and Robustification of Single-query 6-DoF Camera Pose Estimation

Authors: Junsheng Fu, Said Pertuz, Jiri Matas, Joni-Kristian Kämäräinen

Abstract: We consider single-query 6-DoF camera pose estimation with reference images and a point cloud, i.e. the problem of estimating the position and orientation of a camera by using reference images and a point cloud. In this work, we perform a systematic comparison of three state-of-the-art strategies for 6-DoF camera pose estimation, i.e. feature-based, photometric-based and mutual-information-based approaches. The performance of the studied methods is evaluated on two standard datasets in terms of success rate, translation error and max orientation error. Building on the results analysis, we propose a hybrid approach that combines feature-based and mutual-information-based pose estimation methods since it provides complementary properties for pose estimation. Experiments show that (1) in cases with large environmental variance, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average of 25.1% and 5.8% in terms of success rate, respectively; (2) in cases where query and reference images are captured under similar imaging conditions, the hybrid approach performs similarly to the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches by a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.

Submitted 17 August, 2018; originally announced August 2018.

arXiv:1805.08009 [pdf, other]

Object Detection in Equirectangular Panorama

Authors: Wenyan Yang, Yanlin Qian, Francesco Cricri, Lixin Fan, Joni-Kristian Kamarainen

Abstract: We introduce a high-resolution equirectangular panorama (360-degree, virtual reality) dataset for object detection and propose a multi-projection variant of the YOLO detector. The main challenges with equirectangular panorama images are i) the lack of annotated training data, ii) high-resolution imagery and iii) severe geometric distortions of objects near the panorama projection poles. In this work, we solve the challenges by i) using training examples available in the "conventional datasets" (ImageNet and COCO), ii) employing only low-resolution images that require only moderate GPU computing power and memory, and iii) handling projection distortions in our multi-projection YOLO by making multiple stereographic sub-projections. In our experiments, YOLO outperforms the other state-of-the-art detector, Faster RCNN, and our multi-projection YOLO achieves the best accuracy with low-resolution input.

Submitted 21 May, 2018; originally announced May 2018.

Comments: 6 pages

arXiv:1803.08326 [pdf, other]

Revisiting Gray Pixel for Statistical Illumination Estimation

Authors: Yanlin Qian, Said Pertuz, Jarno Nikkanen, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: We present a statistical color constancy method that relies on novel gray pixel detection and mean shift clustering. The method, called Mean Shifted Grey Pixel (MSGP), is based on the observation that true-gray pixels are aligned towards one single direction. Our solution is compact, easy to compute and requires no training. Experiments on two real-world benchmarks show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario. In the setting where the camera is known, MSGP outperforms all statistical methods.

Submitted 9 January, 2019; v1 submitted 22 March, 2018; originally announced March 2018.

Comments: Updated; will appear in VISAPP 2019 (long paper)

arXiv:1802.09227 [pdf, other]

Depth Masked Discriminative Correlation Filter

Authors: Uğur Kart, Joni-Kristian Kämäräinen, Jiří Matas, Lixin Fan, Francesco Cricri

Abstract: Depth information provides a strong cue for occlusion detection and handling, but has been largely omitted in generic object tracking until recently due to the lack of suitable benchmark datasets and applications. In this work, we propose a Depth Masked Discriminative Correlation Filter (DM-DCF) which adopts novel depth segmentation based occlusion detection that stops correlation filter updating, and depth masking which adaptively adjusts the spatial support for the correlation filter. On the Princeton RGBD Tracking Benchmark, our DM-DCF is among the state-of-the-art in overall ranking and the winner in multiple categories. Moreover, since it is based on DCF, DM-DCF runs an order of magnitude faster than its competitors, making it suitable for time-constrained applications.

Submitted 10 October, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

arXiv:1708.06963 [pdf, other]

doi 10.1109/ICRA.2013.6630856

Pose Estimation using Local Structure-Specific Shape and Appearance Context

Authors: Anders Glent Buch, Dirk Kraft, Joni-Kristian Kamarainen, Henrik Gordon Petersen, Norbert Krüger

Abstract: We address the problem of estimating the alignment pose between two models using structure-specific local descriptors. Our descriptors are generated using a combination of 2D image data and 3D contextual shape data, resulting in a set of semi-local descriptors containing rich appearance and shape information for both edge and texture structures. This is achieved by defining feature space relations which describe the neighborhood of a descriptor. By quantitative evaluations, we show that our descriptors provide high discriminative power compared to state of the art approaches. In addition, we show how to utilize this for the estimation of the alignment pose between two point sets. We present experiments both in controlled and real-life scenarios to validate our approach.

Submitted 23 August, 2017; originally announced August 2017.

Journal ref: 2013 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:1703.05393 [pdf, other]

Convolutional Low-Resolution Fine-Grained Classification

Authors: Dingding Cai, Ke Chen, Yanlin Qian, Joni-Kristian Kämäräinen

Abstract: Successful fine-grained image classification methods learn subtle details between visually similar (sub-)classes, but the problem becomes significantly more challenging if the details are missing due to low resolution. Encouraged by the recent success of Convolutional Neural Network (CNN) architectures in image classification, we propose a novel resolution-aware deep model which combines convolutional image super-resolution and convolutional fine-grained classification into a single model in an end-to-end manner. Extensive experiments on the Stanford Cars and Caltech-UCSD Birds 200-2011 benchmarks demonstrate that the proposed model consistently performs better than a conventional convolutional net on classifying fine-grained object classes in low-resolution images.

Submitted 16 October, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

arXiv:1607.03856 [pdf, other]

Deep Structured-Output Regression Learning for Computational Color Constancy

Authors: Yanlin Qian, Ke Chen, Joni-Kristian Kamarainen, Jarno Nikkanen, Jiri Matas

Abstract: Computational color constancy that requires estimation of illuminant colors of images is a fundamental yet active problem in computer vision, which can be formulated into a regression problem. To learn a robust regressor for color constancy, obtaining meaningful imagery features and capturing latent correlations across output variables play a vital role. In this work, we introduce a novel deep structured-output regression learning framework to achieve both goals simultaneously. By borrowing the power of deep convolutional neural networks (CNN) originally designed for visual recognition, the proposed framework can automatically discover strong features for white balancing over different illumination conditions and learn a multi-output regressor beyond underlying relationships between features and targets to find the complex interdependence of different dimensions of target variables. Experiments on two public benchmarks demonstrate that our method achieves competitive performance in comparison with the state-of-the-art approaches.

Submitted 11 August, 2016; v1 submitted 13 July, 2016; originally announced July 2016.

Showing 1–31 of 31 results for author: Kämäräinen, J