
FOOD: Facial Authentication and Out-of-Distribution Detection with Short-Range FMCW Radar

Authors: Sabri Mustafa Kahya, Boran Hamdi Sivrikaya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: This paper proposes a short-range FMCW radar-based facial authentication and out-of-distribution (OOD) detection framework. Our pipeline jointly estimates the correct classes for the in-distribution (ID) samples and detects the OOD samples to prevent their inaccurate prediction. Our reconstruction-based architecture consists of a main convolutional block with a one-encoder, multi-decoder configuration, and intermediate linear encoder-decoder parts. Together, these elements form an accurate human face classifier and a robust OOD detector. For our dataset, gathered using a 60 GHz short-range FMCW radar, our network achieves an average classification accuracy of 98.07% in identifying in-distribution human faces. As an OOD detector, it achieves an average Area Under the Receiver Operating Characteristic curve (AUROC) of 98.50% and an average False Positive Rate at 95% True Positive Rate (FPR95) of 6.20%. Also, our extensive experiments show that the proposed approach outperforms previous OOD detectors in terms of common OOD detection metrics.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted at ICIP 2024
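AUROC and FPR95 are the two standard OOD detection metrics reported above. A minimal sketch of how they are commonly computed from per-sample detector scores, assuming that a higher score means "more in-distribution"; the function names and score convention are illustrative and not taken from the paper:

```python
def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random ID sample outscores a random OOD sample."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5  # ties count as half a win
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted at the threshold that keeps 95% of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    k = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    threshold = ranked[k - 1]           # k-th highest ID score
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.5, 0.4, 0.65]
print(auroc(id_scores, ood_scores))          # 11 of 12 ID/OOD pairs ranked correctly
print(fpr_at_95_tpr(id_scores, ood_scores))  # 1 of 3 OOD samples passes the threshold
```

The pairwise loop is quadratic; for large score sets a sort-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, would be the usual choice.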

arXiv:2402.17758

ADL4D: Towards A Contextually Rich Dataset for 4D Activities of Daily Living

Authors: Marsil Zakour, Partha Pratim Nath, Ludwig Lohmer, Emre Faik Gökçe, Martin Piccolrovazzi, Constantin Patsch, Yuankai Wu, Rahul Chaudhari, Eckehard Steinbach

Abstract: Hand-Object Interactions (HOIs) are conditioned on spatial and temporal contexts like surrounding objects, previous actions, and future intents (for example, grasping and handover actions vary greatly based on object proximity and trajectory obstruction). However, existing datasets for 4D HOI (3D HOI over time) are limited to one subject interacting with one object only. This restricts the generalization of learning-based HOI methods trained on those datasets. We introduce ADL4D, a dataset of up to two subjects interacting with different sets of objects performing Activities of Daily Living (ADL) like breakfast or lunch preparation activities. The transition between multiple objects to complete a certain task over time introduces a unique context lacking in existing datasets. Our dataset consists of 75 sequences with a total of 1.1M RGB-D frames, hand and object poses, and per-hand fine-grained action annotations. We develop an automatic system for multi-view multi-hand 3D pose annotation capable of tracking hand poses over time. We integrate and test it against publicly available datasets. Finally, we evaluate our dataset on the tasks of Hand Mesh Recovery (HMR) and Hand Action Segmentation (HAS).

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17544

Adapting Learned Image Codecs to Screen Content via Adjustable Transformations

Authors: H. Burak Dogaroglu, A. Burakhan Koyuncu, Atanas Boev, Elena Alshina, Eckehard Steinbach

Abstract: As learned image codecs (LICs) become more prevalent, their low coding efficiency for out-of-distribution data becomes a bottleneck for some applications. To improve the performance of LICs for screen content (SC) images without breaking backwards compatibility, we propose to introduce parameterized and invertible linear transformations into the coding pipeline without changing the underlying baseline codec's operation flow. We design two neural networks to act as prefilters and postfilters in our setup to increase the coding efficiency and help with the recovery from coding artifacts. Our end-to-end trained solution achieves up to 10% bitrate savings on SC compression compared to the baseline LICs while introducing only 1% extra parameters.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures, 2 tables
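Because the added transformations are linear and invertible, the decoder can undo them exactly, so the base codec in between never needs to change. A minimal sketch of that round trip, assuming a fixed per-pixel 3x3 matrix A and offset t as hypothetical stand-ins for the learned parameters; how the paper actually parameterizes the transforms, and how its prefilter/postfilter networks interact with them, is not specified in the abstract:

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def inverse_3x3(m):
    """Invert a 3x3 matrix via its adjugate; fails if the transform is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("transform is not invertible")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[adj[r][k] / det for k in range(3)] for r in range(3)]


def forward_transform(pixel, matrix, offset):
    """Encoder side, before the unchanged base codec: y = A x + t."""
    return [y + t for y, t in zip(mat_vec(matrix, pixel), offset)]


def inverse_transform(pixel, matrix, offset):
    """Decoder side, after the unchanged base codec: x = A^-1 (y - t)."""
    return mat_vec(inverse_3x3(matrix), [y - t for y, t in zip(pixel, offset)])


A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0]]  # hypothetical parameters
t = [0.1, 0.0, -0.1]
x = [0.3, 0.6, 0.9]
y = forward_transform(x, A, t)     # what the base codec would encode
print(inverse_transform(y, A, t))  # recovers x up to floating-point error
```

In a real lossy pipeline the decoded y differs from the encoded one, so the inverse only approximately recovers x; the abstract assigns part of that recovery to the postfilter network.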

arXiv:2312.08894

HAROOD: Human Activity Classification and Out-of-Distribution Detection with Short-Range FMCW Radar

Authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: We propose HAROOD as a short-range FMCW radar-based human activity classifier and out-of-distribution (OOD) detector. It aims to classify human sitting, standing, and walking activities and to detect any other moving or stationary object as OOD. We introduce a two-stage network. The first stage is trained with a novel loss function that includes intermediate reconstruction loss, intermediate contrastive loss, and triplet loss. The second stage uses the first stage's output as its input and is trained with cross-entropy loss. It creates a simple classifier that performs the activity classification. On our dataset collected with a 60 GHz short-range FMCW radar, we achieve an average classification accuracy of 96.51%. Also, we achieve an average AUROC of 95.04% as an OOD detector. Additionally, our extensive evaluations demonstrate the superiority of HAROOD over the state-of-the-art OOD detection methods in terms of standard OOD detection metrics.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted at ICASSP 2024
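Of the three loss terms named for HAROOD's first stage, the triplet loss has the most standard form. A minimal sketch of the common hinge formulation with Euclidean distances; the margin value and distance choice are assumptions, and the abstract does not say how HAROOD weights this term against the reconstruction and contrastive losses:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(a, p) - d(a, n) + margin: zero once the negative is `margin` farther away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(triplets, margin=1.0):
    """Mean loss over (anchor, positive, negative) triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# e.g. anchor/positive: two "walking" embeddings, negative: a "sitting" embedding
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]))  # 0.0, already well separated
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # 1.0, negative too close
```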

arXiv:2310.18099

Enabling Acoustic Audience Feedback in Large Virtual Events

Authors: Tamay Aykut, Markus Hofbauer, Christopher Kuhn, Eckehard Steinbach, Bernd Girod

Abstract: The COVID-19 pandemic shifted many events in our daily lives into the virtual domain. While virtual conference systems provide an alternative to physical meetings, larger events require a muted audience to avoid an accumulation of background noise and distorted audio. However, performing artists strongly rely on the feedback of their audience. We propose a concept for a virtual audience framework that provides all participants with the ambience of a real audience. Audience feedback is collected locally, allowing users to express enthusiasm or discontent by selecting means such as clapping, whistling, booing, and laughter. This feedback is sent as abstract information to a virtual audience server. We broadcast the combined virtual audience feedback information to all participants, and each client can synthesize it into a single acoustic feedback. The synthesis can be done by turning the collective audience feedback into a prompt that is fed to state-of-the-art models such as AudioGen. This way, each user hears a single acoustic feedback sound of the entire virtual event, without having to unmute or risk hearing distorted, unsynchronized feedback.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 4 pages, 2 figures
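The abstract describes a pipeline of local feedback selection, abstract messages to a server, a broadcast of the combined state, and client-side synthesis from a prompt. A minimal sketch of the aggregation and prompt-building steps, in which the event names, intensity thresholds, and prompt wording are all hypothetical; the call to an audio generation model such as AudioGen is deliberately left out:

```python
from collections import Counter

REACTIONS = ("clapping", "whistling", "booing", "laughter")


def aggregate(events):
    """Server side: count abstract feedback events, ignoring unknown ones."""
    return Counter(e for e in events if e in REACTIONS)


def to_prompt(counts, audience_size):
    """Client side: describe the combined feedback as a text prompt for an audio generator."""
    if not counts:
        return "quiet audience"
    parts = []
    for reaction, n in counts.most_common():  # strongest reaction first
        share = n / audience_size
        if share > 0.5:
            intensity = "loud"
        elif share > 0.1:
            intensity = "moderate"
        else:
            intensity = "scattered"
        parts.append(f"{intensity} {reaction}")
    return "large audience, " + ", ".join(parts)


events = ["clapping"] * 60 + ["laughter"] * 20 + ["whistling"] * 5 + ["coughing"]
print(to_prompt(aggregate(events), audience_size=100))
# large audience, loud clapping, moderate laughter, scattered whistling
```

Only the small count summary travels over the network, which is what lets every client hear one synthesized sound instead of many unsynchronized microphone streams.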

arXiv:2310.05600

Care3D: An Active 3D Object Detection Dataset of Real Robotic-Care Environments

Authors: Michael G. Adam, Sebastian Eger, Martin Piccolrovazzi, Maged Iskandar, Joern Vogel, Alexander Dietrich, Seongjien Bien, Jon Skerlj, Abdeldjallil Naceri, Eckehard Steinbach, Alin Albu-Schaeffer, Sami Haddadin, Wolfram Burgard

Abstract: As labor shortages in the health sector increase, the demand for assistive robotics grows. However, the test data needed to develop these robots is scarce, especially for active 3D object detection, where no real data exists at all. This short paper addresses this gap by introducing such an annotated dataset of real environments. The captured environments represent areas that are already in use in robotic health care research. We further provide ground-truth data within one room for assessing SLAM algorithms running directly on a health care robot.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2308.02396

HOOD: Real-Time Human Presence and Out-of-Distribution Detection Using FMCW Radar

Authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: Detecting human presence indoors with millimeter-wave frequency-modulated continuous-wave (FMCW) radar faces challenges from both moving and stationary clutter. This work proposes a robust and real-time capable human presence and out-of-distribution (OOD) detection method using 60 GHz short-range FMCW radar. HOOD solves the human presence and OOD detection problems simultaneously in a single pipeline. Our solution relies on a reconstruction-based architecture and works with radar macro and micro range-Doppler images (RDIs). HOOD aims to accurately detect the presence of humans in the presence or absence of moving and stationary disturbers. Since HOOD is also an OOD detector, it aims to detect moving or stationary clutter as OOD in humans' absence and predicts the current scene's output as "no presence." HOOD performs well in diverse scenarios, demonstrating its effectiveness across different human activities and situations. On our dataset collected with a 60 GHz short-range FMCW radar, we achieve an average AUROC of 94.36%. Additionally, our extensive evaluations and experiments demonstrate that HOOD outperforms state-of-the-art (SOTA) OOD detection methods in terms of common OOD detection metrics. Importantly, HOOD also fits on a Raspberry Pi 3B+ with an ARM Cortex-A53 CPU, which showcases its versatility across different hardware environments. Videos of our human presence detection experiments are available at: https://muskahya.github.io/HOOD

Submitted 26 March, 2024; v1 submitted 24 July, 2023; originally announced August 2023.

Comments: 10 pages, 2 figures, project page: https://muskahya.github.io/HOOD
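HOOD operates on range-Doppler images (RDIs). The abstract does not say how its macro and micro RDIs are formed, so the sketch below shows only the generic two-step transform that turns one frame of FMCW beat samples into an RDI; windowing, clutter removal, and the macro/micro distinction are omitted, and the idealized complex input signal is an assumption:

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform (O(n^2)); adequate for a small illustration."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def range_doppler_image(frame):
    """frame[m][n]: sample n (fast time) of chirp m (slow time).

    Returns magnitudes indexed as rdi[doppler_bin][range_bin].
    """
    # 1) Range FFT: transform each chirp along fast time.
    range_profiles = [dft(chirp) for chirp in frame]
    num_range = len(range_profiles[0])
    # 2) Doppler FFT: transform each range bin along slow time.
    doppler = [dft([profile[r] for profile in range_profiles]) for r in range(num_range)]
    # 3) Transpose to [doppler][range] and keep the magnitude.
    return [[abs(doppler[r][d]) for r in range(num_range)] for d in range(len(frame))]


def peak(rdi):
    """(doppler_bin, range_bin) of the strongest cell."""
    cells = ((d, r) for d in range(len(rdi)) for r in range(len(rdi[0])))
    return max(cells, key=lambda cell: rdi[cell[0]][cell[1]])


# Synthetic beat signal: one target in range bin 2 moving at Doppler bin 1
M, N = 4, 8
frame = [[cmath.exp(2j * cmath.pi * (2 * n / N + 1 * m / M)) for n in range(N)]
         for m in range(M)]
print(peak(range_doppler_image(frame)))  # (1, 2)
```

A practical implementation would use an FFT library and a window function; the naive DFT here only keeps the example dependency-free.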

arXiv:2306.14287

Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression

Authors: A. Burakhan Koyuncu, Panqi Jia, Atanas Boev, Elena Alshina, Eckehard Steinbach

Abstract: Entropy estimation is essential for the performance of learned image compression. It has been demonstrated that a transformer-based entropy model is of critical importance for achieving a high compression ratio, but at the expense of significant computational effort. In this work, we introduce the Efficient Contextformer (eContextformer), a computationally efficient transformer-based autoregressive context model for learned image compression. The eContextformer efficiently fuses the patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling, and introduces a shifted window spatio-channel attention mechanism. We explore better training strategies and architectural designs and introduce additional complexity optimizations. During decoding, the proposed optimization techniques dynamically scale the attention span and cache the previous attention computations, drastically reducing the model and runtime complexity. Compared to the non-parallel approach, our proposal has ~145x lower model complexity and ~210x faster decoding speed, and achieves higher average bit savings on Kodak, CLIC2020, and Tecnick datasets. Additionally, the low complexity of our context model enables online rate-distortion algorithms, which further improve the compression performance. We achieve up to 17% bitrate savings over the intra coding of Versatile Video Coding (VVC) Test Model (VTM) 16.2 and surpass various learning-based compression models.

Submitted 27 February, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted for IEEE TCSVT (14 pages, 10 figures, 9 tables)
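The abstract combines patch-wise, checkered, and channel-wise grouping to parallelize context modeling. A minimal sketch of how checkered (checkerboard) and channel-wise grouping alone turn a serial scan into a few parallel decoding steps; patch-wise grouping, the shifted-window attention, and eContextformer's actual group sizes are not modeled, and the function names are illustrative:

```python
def checkerboard_parity(h, w):
    """0 marks anchor positions (decoded first), 1 marks non-anchors."""
    return [[(i + j) % 2 for j in range(w)] for i in range(h)]


def decoding_schedule(num_channel_groups, h, w):
    """Parallel decoding steps; each step is a set of (group, i, j) latents decoded together.

    Every latent in a step can be decoded at once, because its context (earlier
    channel groups and, for non-anchors, the anchors of its own group) is already known.
    """
    parity = checkerboard_parity(h, w)
    steps = []
    for g in range(num_channel_groups):
        for p in (0, 1):  # anchors first, then non-anchors
            steps.append({(g, i, j) for i in range(h) for j in range(w) if parity[i][j] == p})
    return steps


steps = decoding_schedule(num_channel_groups=2, h=4, w=4)
print(len(steps))  # 4 parallel steps
print(2 * 4 * 4)   # versus 32 steps for a fully serial scan over the same latents
```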

arXiv:2303.10195

Remote Task-oriented Grasp Area Teaching By Non-Experts through Interactive Segmentation and Few-Shot Learning

Authors: Furkan Kaynar, Sudarshan Rajagopalan, Shaobo Zhou, Eckehard Steinbach

Abstract: A robot operating in unstructured environments must be able to discriminate between different grasping styles depending on the prospective manipulation task. A system that allows learning from remote non-expert demonstrations can feasibly extend the cognitive skills of a robot for task-oriented grasping. We propose a novel two-step framework towards this aim. The first step involves grasp area estimation by segmentation. We receive grasp area demonstrations for a new task via interactive segmentation, and learn from these few demonstrations to estimate the required grasp area on an unseen scene for the given task. The second step is autonomous grasp estimation in the segmented region. To train the segmentation network for few-shot learning, we built a grasp area segmentation (GAS) dataset with 10089 images grouped into 1121 segmentation tasks. We use an efficient meta-learning algorithm to train for few-shot adaptation. Experimental evaluation showed that our method successfully detects the correct grasp area on the respective objects in unseen test scenes and effectively allows remote teaching of new grasp strategies by non-experts.

Submitted 17 March, 2023; originally announced March 2023.

Comments: Presented at the AAAI Workshop on Artificial Intelligence for User-Centric Assistance in at-Home Tasks (2023)

arXiv:2303.09933

Large-Scale Collaborative Writing: Technical Challenges and Recommendations

Authors: Markus Hofbauer, Christoph Bachhuber, Christopher Kuhn, Sebastian Schwarz, Bart Kroon, Eckehard Steinbach

Abstract: Collaborative writing is essential for teams that create documents together. Creating documents in large-scale collaborations is a challenging task that requires an efficient workflow. The design of such a workflow has received comparatively little attention. Conventional solutions such as working on a single Microsoft Word document or a shared online document are still widely used. In this paper, we propose a new workflow consisting of a combination of the lightweight markup language AsciiDoc together with the state-of-the-art version control system Git. The proposed process makes use of well-established workflows in the field of software development that have grown over decades. We present a detailed comparison of the proposed markup + Git workflow to Word and Word for the Web as the most prominent examples of conventional approaches. We argue that the proposed approach provides significant benefits regarding scalability, flexibility, and structuring of most collaborative writing tasks, both in academia and industry.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 14 pages, 2 figures, 1 table

arXiv:2303.06232

MCROOD: Multi-Class Radar Out-Of-Distribution Detection

Authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: Out-of-distribution (OOD) detection has recently received special attention due to its critical role in safely deploying modern deep learning (DL) architectures. This work proposes a reconstruction-based multi-class OOD detector that operates on radar range-Doppler images (RDIs). The detector aims to classify any moving object other than a person sitting, standing, or walking as OOD. We also provide a simple yet effective pre-processing technique to detect minor human body movements like breathing. This idea, called the respiration detector (RESPD), eases OOD detection, especially for the human sitting and standing classes. On our dataset collected with a 60 GHz short-range FMCW radar, we achieve AUROCs of 97.45%, 92.13%, and 96.58% for the sitting, standing, and walking classes, respectively. We perform extensive experiments and show that our method outperforms state-of-the-art (SOTA) OOD detection methods. Also, our pipeline performs 24 times faster than the second-best method and is very suitable for real-time processing.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted at ICASSP 2023
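The abstract names RESPD but does not describe its internals, so the sketch below is not RESPD. It is a generic spectral heuristic for spotting breathing-like motion in a slow-time radar signal (for example, the phase or magnitude of one range bin across chirps), with an assumed respiration band of 0.1 to 0.5 Hz (6 to 30 breaths per minute), an assumed sampling rate, and an assumed decision threshold:

```python
import cmath
import math


def band_power_ratio(signal, fs, low_hz=0.1, high_hz=0.5):
    """Share of non-DC spectral power that falls inside [low_hz, high_hz]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # drop the DC term (static reflections)
    total = in_band = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        freq = k * fs / n
        coeff = sum(centered[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def looks_like_breathing(signal, fs, ratio_threshold=0.6):
    """Hypothetical rule: most of the motion energy sits in the respiration band."""
    return band_power_ratio(signal, fs) >= ratio_threshold


fs = 10.0                          # slow-time samples per second (assumed)
t = [k / fs for k in range(200)]   # 20 s observation window
chest = [math.sin(2 * math.pi * 0.25 * s) for s in t]  # 15 breaths per minute
fan = [math.sin(2 * math.pi * 2.0 * s) for s in t]     # faster periodic motion
print(looks_like_breathing(chest, fs), looks_like_breathing(fan, fs))  # True False
```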

arXiv:2302.14192

Reconstruction-based Out-of-Distribution Detection for Short-Range FMCW Radar

Authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach

Abstract: Out-of-distribution (OOD) detection has recently drawn attention due to its critical role in the safe deployment of modern neural network architectures in real-world applications. OOD detectors aim to distinguish samples that lie outside the training distribution in order to avoid the overconfident predictions of machine learning models on OOD data. Existing detectors, which mainly rely on the logit, intermediate feature space, softmax score, or reconstruction loss, manage to produce promising results. However, most of these methods are developed for the image domain. In this study, we propose a novel reconstruction-based OOD detector to operate on the radar domain. Our method exploits an autoencoder (AE) and its latent representation to detect the OOD samples. We propose two scores incorporating the patch-based reconstruction loss and the energy value calculated from the latent representations of each patch. We achieve an AUROC of 90.72% on our dataset collected using a 60 GHz short-range FMCW radar. The experiments demonstrate that, in terms of AUROC and AUPR, our method outperforms the baseline (AE) and the other state-of-the-art methods. Also, thanks to its model size of 641 kB, our detector is suitable for embedded usage.

Submitted 15 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at EUSIPCO 2023
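The abstract combines a patch-based reconstruction loss with an energy value computed from each patch's latent representation. A minimal sketch of both ingredients, assuming the widely used energy score E(z) = -T·logsumexp(z/T) over a latent vector z; how the paper actually defines its two scores, and the max/mean combination and weight below, are assumptions:

```python
import math


def patches(image, size):
    """Split a 2D list into non-overlapping size x size patches (ragged edges dropped)."""
    h, w = len(image), len(image[0])
    return [[row[j:j + size] for row in image[i:i + size]]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]


def patch_mse(original, reconstruction, size):
    """Mean squared reconstruction error of every patch."""
    losses = []
    for p, q in zip(patches(original, size), patches(reconstruction, size)):
        diffs = [(a - b) ** 2 for rp, rq in zip(p, q) for a, b in zip(rp, rq)]
        losses.append(sum(diffs) / len(diffs))
    return losses


def energy(latent, temperature=1.0):
    """Energy E(z) = -T * logsumexp(z / T), computed with the max-shift trick."""
    scaled = [v / temperature for v in latent]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(v - m) for v in scaled)))


def ood_score(original, reconstruction, patch_latents, size, weight=1.0):
    """Hypothetical combination: worst patch error plus weighted mean patch energy."""
    losses = patch_mse(original, reconstruction, size)
    energies = [energy(z) for z in patch_latents]
    return max(losses) + weight * sum(energies) / len(energies)


rdi = [[0.0] * 4 for _ in range(4)]
recon = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
print(patch_mse(rdi, recon, size=2))  # [1.0, 0.0, 0.0, 0.0]
```

Scoring per patch keeps a small, localized reconstruction failure from being averaged away over the whole input.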

arXiv:2207.03720

doi 10.1109/ICIP46576.2022.9897588

Bounding Box Disparity: 3D Metrics for Object Detection With Full Degree of Freedom

Authors: Michael G. Adam, Martin Piccolrovazzi, Sebastian Eger, Eckehard Steinbach

Abstract: The most popular evaluation metric for object detection in 2D images is Intersection over Union (IoU). Existing implementations of the IoU metric for 3D object detection usually neglect one or more degrees of freedom. In this paper, we first derive the analytic solution of IoU for three-dimensional bounding boxes. As a second contribution, a closed-form solution of the volume-to-volume distance is derived. Finally, the Bounding Box Disparity is proposed as a combined positive continuous metric. We provide open-source implementations of the three metrics as standalone Python functions, as well as extensions to the Open3D library and as ROS nodes.

Submitted 11 November, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 4 pages + 1 page of references, 4 figures, Best Paper Award First Runner-Up at ICIP 2022
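For reference, IoU is intersection volume divided by union volume. The sketch below computes it only for axis-aligned boxes, i.e. it neglects all three rotational degrees of freedom, which is exactly the kind of simplification the paper addresses; it illustrates the definition, not the paper's analytic full-DoF solution or the Bounding Box Disparity:

```python
def volume(box):
    """Volume of an axis-aligned box ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo, hi = box
    result = 1.0
    for k in range(3):
        result *= max(0.0, hi[k] - lo[k])  # an empty extent gives zero volume
    return result


def iou_3d_axis_aligned(a, b):
    """IoU = intersection volume / union volume, ignoring all rotations."""
    lo = tuple(max(a[0][k], b[0][k]) for k in range(3))
    hi = tuple(min(a[1][k], b[1][k]) for k in range(3))
    inter = volume((lo, hi))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


a = ((0, 0, 0), (2, 2, 2))
b = ((1, 1, 1), (3, 3, 3))
print(iou_3d_axis_aligned(a, b))  # 1 / (8 + 8 - 1) = 1/15
```

For rotated boxes the intersection is a general convex polyhedron, which is why a closed-form treatment of every degree of freedom is needed.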

arXiv:2203.02452

doi 10.1007/978-3-031-19800-7_26

Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

Authors: A. Burakhan Koyuncu, Han Gao, Atanas Boev, Georgii Gaikov, Elena Alshina, Eckehard Steinbach

Abstract: Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adaptivity. Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.

Submitted 20 July, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted at ECCV 2022; 31 pages (14 main paper + References + 13 Appendix)
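One way to realize spatio-channel attention is to treat each channel segment at each spatial position as its own token, so the model needs an autoregressive coding order over (position, segment) pairs and a causal mask that hides not-yet-decoded tokens. A minimal sketch of two plausible orderings and the corresponding mask; the segment count, the orderings, and the function names are illustrative assumptions, not the paper's configuration:

```python
def coding_order(h, w, num_segments, channel_first=False):
    """Autoregressive order of (i, j, segment) tokens in an h x w latent."""
    positions = [(i, j) for i in range(h) for j in range(w)]  # raster scan
    if channel_first:
        # all positions of segment 0, then all positions of segment 1, ...
        return [(i, j, s) for s in range(num_segments) for (i, j) in positions]
    # all segments of position 0, then all segments of position 1, ...
    return [(i, j, s) for (i, j) in positions for s in range(num_segments)]


def causal_mask(order):
    """mask[a][b] is True if token a may attend to token b (strictly earlier in order)."""
    n = len(order)
    return [[b < a for b in range(n)] for a in range(n)]


print(coding_order(1, 2, 2))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
print(coding_order(1, 2, 2, channel_first=True))
# [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)]
```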

arXiv:1909.11226

Minimal Work: A Grasp Quality Metric for Deformable Hollow Objects

Authors: Jingyi Xu, Michael Danielczuk, Jeff Ichnowski, Jeffrey Mahler, Eckehard Steinbach, Ken Goldberg

Abstract: Robot grasping of deformable hollow objects such as plastic bottles and cups is challenging, as the grasp should resist disturbances while minimally deforming the object so as not to damage it or dislodge liquids. We propose minimal work as a novel grasp quality metric that combines wrench resistance and object deformation. We introduce an efficient algorithm to compute the work required to resist an external wrench for a manipulation task by solving a linear program. The algorithm first computes the minimum required grasp force and an estimation of the gripper jaw displacements based on the object deformability at different locations measured with physical experiments. The work done by the jaws is the product of the grasp force and the displacements. The grasp quality metric is computed based on the required work under perturbations of grasp poses to address uncertainties in actuation. We collect 460 physical grasps with a UR5 robot and a Robotiq gripper. Physical experiments suggest the minimal work quality metric reaches 74.2% balanced accuracy and is up to 24.2% higher than classical wrench-based quality metrics, where the balanced accuracy is the raw accuracy normalized by the number of successful and failed real-world grasps.

Submitted 24 September, 2019; originally announced September 2019.
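Two quantities in the abstract can be made concrete. The work done by the jaws is grasp force times displacement, and balanced accuracy averages the accuracy over successful and failed grasps so that an imbalanced set of real-world outcomes cannot inflate the score. A minimal sketch, assuming a single scalar grasp force shared by the jaws and summing over jaws (both assumptions); the linear program that finds the minimum force is not shown:

```python
def jaw_work(grasp_force, displacements):
    """Work as grasp force times jaw displacement, summed over the jaws (assumed)."""
    return sum(grasp_force * d for d in displacements)


def balanced_accuracy(predicted, actual):
    """Mean of the accuracies on successful (True) and failed (False) grasps."""
    pairs = list(zip(predicted, actual))
    pos = sum(1 for _, a in pairs if a)
    neg = len(pairs) - pos
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return 0.5 * (tp / pos + tn / neg)


print(jaw_work(10.0, [0.001, 0.002]))  # N times m, roughly 0.03 J
pred = [True, True, False, False, True]
real = [True, False, False, False, True]
print(balanced_accuracy(pred, real))   # 0.5 * (2/2 + 2/3), while raw accuracy is 4/5
```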

arXiv:1909.06885

6DLS: Modeling Nonplanar Frictional Surface Contacts for Grasping using 6D Limit Surfaces

Authors: Jingyi Xu, Tamay Aykut, Daolin Ma, Eckehard Steinbach

Abstract: Robot grasping with deformable gripper jaws results in nonplanar surface contacts if the jaws deform to the nonplanar local geometry of an object. The frictional force and torque that can be transmitted through a nonplanar surface contact are both three-dimensional, resulting in a six-dimensional frictional wrench (6DFW). Applying traditional planar contact models to such contacts leads to over-conservative results as the models do not consider the nonplanar surface geometry and only compute a three-dimensional subset of the 6DFW. To address this issue, we derive the 6DFW for nonplanar surfaces by combining concepts of differential geometry and Coulomb friction. We also propose two 6D limit surface (6DLS) models, generalized from well-known three-dimensional LS (3DLS) models, which describe the friction-motion constraints for a contact. We evaluate the 6DLS models by fitting them to the 6DFW samples obtained from six parametric surfaces and 2,932 meshed contacts from finite element method simulations of 24 rigid objects. We further present an algorithm to predict multicontact grasp success by building a grasp wrench space with the 6DLS model of each contact. To evaluate the algorithm, we collected 1,035 physical grasps of ten 3D-printed objects with a KUKA robot and a deformable parallel-jaw gripper. In our experiments, the algorithm achieves 66.8% precision, a metric inversely related to false positive predictions, and 76.9% recall, a metric inversely related to false negative predictions. The 6DLS models increase recall by up to 26.1% over 3DLS models with similar precision.

Submitted 27 March, 2021; v1 submitted 15 September, 2019; originally announced September 2019.
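One common 3DLS model approximates the limit surface with an ellipsoid: a frictional wrench w can be transmitted without sliding if w^T A w <= 1. As a rough illustration only, the sketch below naively extends that idea to six axes with a diagonal A; the semi-axis values are made up, and the paper's 6DLS models are fitted to 6DFW samples rather than defined this way:

```python
def ls_value(wrench, semi_axes):
    """Quadratic form w^T A w with a diagonal A = diag(1 / semi_axes^2)."""
    if len(wrench) != 6 or len(semi_axes) != 6:
        raise ValueError("expected a 6D frictional wrench: 3 forces and 3 torques")
    return sum((w / s) ** 2 for w, s in zip(wrench, semi_axes))


def inside_limit_surface(wrench, semi_axes):
    """True if the contact can transmit the wrench without slipping under this model."""
    return ls_value(wrench, semi_axes) <= 1.0


axes = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0]  # hypothetical max forces (N) and torques (N*m)
print(inside_limit_surface([5.0, 0, 0, 0.5, 0, 0], axes))  # 0.25 + 0.25 -> True
print(inside_limit_surface([8.0, 0, 0, 0.8, 0, 0], axes))  # 0.64 + 0.64 -> False
```

A diagonal form ignores coupling between force and torque axes, which a fitted model would generally capture.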

arXiv:1906.08575

Probabilistic Tile Visibility-Based Server-Side Rate Adaptation for Adaptive 360-Degree Video Streaming

Authors: Junni Zou, Chenglin Li, Chengming Liu, Qin Yang, Hongkai Xiong, Eckehard Steinbach

Abstract: In this paper, we study the server-side rate adaptation problem for streaming tile-based adaptive 360-degree videos to multiple users who are competing for transmission resources at the network bottleneck. Specifically, we develop a convolutional neural network (CNN)-based viewpoint prediction model to capture the nonlinear relationship between the future and historical viewpoints. A Laplace distribution model is utilized to characterize the probability distribution of the prediction error. Given the predicted viewpoint, we then map the viewport in the spherical space into its corresponding planar projection in the 2-D plane, and further derive the visibility probability of each tile based on the planar projection and the prediction error probability. According to the visibility probability, tiles are classified as viewport, marginal, and invisible tiles. The server-side tile rate allocation problem for multiple users is then formulated as a non-linear discrete optimization problem to minimize the overall received video distortion of all users and the quality difference between the viewport and marginal tiles of each user, subject to the transmission capacity constraints and users' specific viewport requirements. We develop a steepest descent algorithm to solve this non-linear discrete optimization problem, initializing the feasible starting point in accordance with the optimal solution of its continuous relaxation. Extensive experimental results show that the proposed algorithm can achieve a near-optimal solution and outperforms the existing rate adaptation schemes for tile-based adaptive 360-degree video streaming.

Submitted 20 June, 2019; originally announced June 2019.

Comments: 33 pages (single column and double space) with 15 figures, submitted to IEEE Journal of Selected Topics in Signal Processing
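The tile-visibility step in the Zou et al. abstract can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not from the paper: only the yaw angle is modelled (no 2-D planar projection, no 360-degree wrap-around), the prediction error is Laplace with location 0 and scale `b`, and the thresholds in the hypothetical `classify_tile` helper are arbitrary placeholders. A tile counts as visible when the true viewport, centred at the predicted yaw plus the error, overlaps the tile's yaw span.

```python
import math


def laplace_cdf(x: float, mu: float, b: float) -> float:
    """CDF of a Laplace distribution with location mu and scale b > 0."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def tile_visibility(tile_lo: float, tile_hi: float, predicted_yaw: float,
                    half_fov: float, b: float) -> float:
    """Probability that a tile spanning [tile_lo, tile_hi] degrees of yaw
    overlaps the viewport, assuming true yaw = predicted_yaw + Laplace(0, b).

    1-D simplification: no pitch, no planar projection, no wrap-around.
    """
    # A viewport [c - half_fov, c + half_fov] overlaps the tile exactly when
    # its centre c lies in [tile_lo - half_fov, tile_hi + half_fov].
    lo = tile_lo - half_fov
    hi = tile_hi + half_fov
    return laplace_cdf(hi, predicted_yaw, b) - laplace_cdf(lo, predicted_yaw, b)


def classify_tile(p: float, viewport_thr: float = 0.9,
                  marginal_thr: float = 0.1) -> str:
    """Hypothetical thresholds; the paper's classification rule may differ."""
    if p >= viewport_thr:
        return "viewport"
    if p >= marginal_thr:
        return "marginal"
    return "invisible"


if __name__ == "__main__":
    # 30-degree tiles, predicted yaw 0, 90-degree field of view, b = 10 degrees.
    for lo in range(-180, 180, 30):
        p = tile_visibility(lo, lo + 30, predicted_yaw=0.0, half_fov=45.0, b=10.0)
        print(f"[{lo:4d}, {lo + 30:4d}]  p={p:.3f}  {classify_tile(p)}")
```

With these settings the tile spanning [0, 30] degrees is classified as viewport, [60, 90] as marginal (probability about 0.11), and [90, 120] as invisible. The per-tile probabilities are what a rate allocator could then weight when deciding how many bits each tile receives.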

arXiv:1805.09447

MAVI: A Research Platform for Telepresence and Teleoperation

Authors: Mojtaba Karimi, Tamay Aykut, Eckehard Steinbach

Abstract: One of the goals of telepresence is to be able to perform daily tasks remotely. A key requirement for this is a robust and reliable mobile robotic platform. Ideally, such a platform should support 360-degree stereoscopic vision and semi-autonomous telemanipulation. In this technical report, we present our latest work on designing a telepresence mobile robot platform called MAVI. MAVI is a low-cost, robust, and extendable platform for research and educational purposes, especially for machine vision and human interaction in telepresence setups. The MAVI platform offers a balance between modularity, capabilities, accessibility, cost, and an open-source software framework. With a range of sensors, such as an Inertial Measurement Unit (IMU), a 360-degree laser rangefinder, ultrasonic proximity sensors, and force sensors, along with smart actuation in the form of omnidirectional holonomic locomotion, a high-load cylindrical manipulator, and an actuated stereoscopic Pan-Tilt-Roll Unit (PTRU), MAVI can not only provide basic feedback from its surroundings but also interact with the remote environment in multiple ways. The software architecture of MAVI is based on the Robot Operating System (ROS), which allows easy integration of state-of-the-art software packages.

Submitted 23 May, 2018; originally announced May 2018.

arXiv:1804.02077

doi 10.1109/LRA.2018.2792681

Noise-resistant Deep Learning for Object Classification in 3D Point Clouds Using a Point Pair Descriptor

Authors: Dmytro Bobkov, Sili Chen, Ruiqing Jian, Muhammad Iqbal, Eckehard Steinbach

Abstract: Object retrieval and classification in point cloud data are challenged by noise, irregular sampling density, and occlusion. To address this issue, we propose a point pair descriptor that is robust to noise and occlusion and achieves high retrieval accuracy. We further show how the proposed descriptor can be used in a 4D convolutional neural network for the task of object classification. We propose a novel 4D convolutional layer that is able to learn class-specific clusters in the descriptor histograms. Finally, we provide experimental validation on three benchmark datasets, which confirms the superiority of the proposed approach.

Submitted 5 April, 2018; originally announced April 2018.

Comments: 8 pages

Journal ref: IEEE Robotics and Automation Letters 2018, Volume 3, Issue 2
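To give a concrete picture of a point pair descriptor histogram, the following Python sketch computes the widely used four-component point pair feature (pair distance, plus the angles between each normal and the connecting vector and between the two normals) for every ordered pair of points, and bins it into a 4D histogram with NumPy. This feature is a stand-in chosen for illustration; the descriptor of Bobkov et al. may define its components, binning, and normalization differently. The function names and the `bins`/`max_dist` parameters are hypothetical.

```python
import numpy as np


def angle_between(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Row-wise angle in [0, pi] between vectors u and v."""
    # Small epsilon guards against zero-length vectors (duplicate points).
    denom = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-12
    cos = np.sum(u * v, axis=-1) / denom
    return np.arccos(np.clip(cos, -1.0, 1.0))


def point_pair_features(points: np.ndarray, normals: np.ndarray) -> np.ndarray:
    """(distance, angle(n_i, d), angle(n_j, d), angle(n_i, n_j)) for every
    ordered pair i != j, where d = p_j - p_i. Shape: (N * (N - 1), 4)."""
    i, j = np.nonzero(~np.eye(len(points), dtype=bool))
    d = points[j] - points[i]
    return np.stack([
        np.linalg.norm(d, axis=1),
        angle_between(normals[i], d),
        angle_between(normals[j], d),
        angle_between(normals[i], normals[j]),
    ], axis=1)


def pair_histogram(points: np.ndarray, normals: np.ndarray,
                   bins: int = 5, max_dist: float = 1.0) -> np.ndarray:
    """Normalized 4D histogram of pair features. Pairs farther apart than
    max_dist fall outside the range and are ignored by histogramdd."""
    feats = point_pair_features(points, normals)
    ranges = [(0.0, max_dist), (0.0, np.pi), (0.0, np.pi), (0.0, np.pi)]
    hist, _ = np.histogramdd(feats, bins=bins, range=ranges)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((50, 3))            # toy point cloud in the unit cube
    nrm = rng.normal(size=(50, 3))       # toy (unnormalized) normals
    print(pair_histogram(pts, nrm, bins=6, max_dist=2.0).shape)
```

The resulting array has one axis per feature component, which is the kind of 4D tensor a network with 4D convolutions could take as input. The all-pairs loop is quadratic in the number of points, so a practical pipeline would typically subsample pairs.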

arXiv:1705.05613

Toward QoE-Driven Dynamic Control Scheme Switching for Time-Delayed Teleoperation Systems: A Dedicated Case Study

Authors: Xiao Xu, Qian Liu, Eckehard Steinbach

Abstract: Networked teleoperation with haptic feedback is a prime example of the emerging Tactile Internet, which requires careful orchestration of haptic communication and control. One major challenge in this context is how to maximize the user's quality of experience (QoE) while simultaneously ensuring the stability of the global control loop in the presence of communication delay. In this paper, we propose a dynamic control scheme switching strategy for teleoperation systems, which maximizes the QoE under time-varying communication delay. To validate the feasibility of the proposed approach, we perform a dedicated case study for a virtual teleoperation environment consisting of a one-dimensional spring-damper system, and conduct extensive subjective tests under various delay conditions for two control schemes: (1) teleoperation with the time-domain passivity approach (TDPA), which is highly delay-sensitive but supports highly dynamic interaction between the operator and a potentially quickly changing remote environment; (2) model-mediated teleoperation (MMT), which tolerates relatively large communication delays but is unsuitable for quickly changing, highly dynamic remote environments. For both schemes, we use recently proposed extensions that incorporate perceptual data reduction to reduce the required packet rate between the operator and the teleoperator.
One key contribution of this paper lies in the exploration of the intrinsic relationship among QoE, communication delay, and the control schemes, which provides fundamental guidance not only for this research but also for the future joint optimization of communication and control in time-delayed teleoperation systems.

Submitted 16 May, 2017; originally announced May 2017.

Comments: 6 pages, 9 figures, conference
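The Xu et al. abstract does not state the switching rule itself, which the paper derives from its subjective QoE tests. As a hedged sketch only, a delay-driven switch between TDPA and MMT could look like the Python below. The two thresholds are hypothetical placeholders, and the hysteresis band (separate up and down thresholds instead of a single one) is a design choice added here to avoid rapid toggling when the delay hovers near one value.

```python
from dataclasses import dataclass


@dataclass
class SchemeSwitcher:
    """Switch between TDPA (low delay) and MMT (high delay) using hysteresis.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    to_mmt_ms: float = 60.0   # leave TDPA once the delay rises above this
    to_tdpa_ms: float = 40.0  # return to TDPA once the delay falls below this
    scheme: str = "TDPA"

    def __post_init__(self) -> None:
        if self.to_tdpa_ms >= self.to_mmt_ms:
            raise ValueError("to_tdpa_ms must be below to_mmt_ms")

    def update(self, delay_ms: float) -> str:
        """Feed one delay measurement and return the scheme to use next."""
        if self.scheme == "TDPA" and delay_ms > self.to_mmt_ms:
            self.scheme = "MMT"
        elif self.scheme == "MMT" and delay_ms < self.to_tdpa_ms:
            self.scheme = "TDPA"
        return self.scheme


if __name__ == "__main__":
    switcher = SchemeSwitcher()
    print([switcher.update(d) for d in [10, 50, 70, 50, 30]])
```

For the delay sequence 10, 50, 70, 50, 30 ms this yields TDPA, TDPA, MMT, MMT, TDPA: a 50 ms delay keeps TDPA on the way up but keeps MMT on the way down, which is the hysteresis at work. A QoE-driven rule would replace the fixed thresholds with the crossover points of measured QoE-versus-delay curves for each scheme.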

arXiv:1512.06658

Deep Learning for Surface Material Classification Using Haptic And Visual Information

Authors: Haitian Zheng, Lu Fang, Mengqi Ji, Matti Strese, Yigitcan Ozer, Eckehard Steinbach

Abstract: When a user scratches a hand-held rigid tool across an object surface, an acceleration signal can be captured that carries relevant information about the surface. More importantly, such a haptic signal is complementary to the visual appearance of the surface, which suggests combining both modalities to recognize the surface material. In this paper, we present a novel deep learning method for the surface material classification problem based on a Fully Convolutional Network (FCN), which takes as input the aforementioned acceleration signal and a corresponding image of the surface texture. Compared to previous surface material classification solutions, which rely on carefully designed hand-crafted domain-specific features, our method automatically extracts discriminative features using deep learning. Experiments performed on the TUM surface material database demonstrate that our method achieves state-of-the-art classification accuracy robustly and efficiently.

Submitted 1 May, 2016; v1 submitted 21 December, 2015; originally announced December 2015.

Comments: 8 pages, under review as a paper at Transactions on Multimedia

arXiv:1510.01134

doi 10.1109/ICIP.2016.7532735

A System for Precise End-to-End Delay Measurements in Video Communication

Authors: Christoph Bachhuber, Eckehard Steinbach

Abstract: Low-delay video transmission is becoming increasingly important. Delay-critical, video-enabled applications range from teleoperation scenarios, such as controlling drones or telesurgery, to autonomous control through computer vision algorithms applied to real-time video. To judge the quality of the video transmission in such a system, it is important to be able to precisely measure the end-to-end (E2E) delay of the transmitted video. We present a low-complexity system that automatically takes pairwise independent measurements of E2E delay. The precision can be well below one millisecond and is mainly limited by the sampling rate of the measurement system. In our implementation, we achieve a precision of 0.5 milliseconds with a sampling rate of 2 kHz.

Submitted 22 August, 2016; v1 submitted 5 October, 2015; originally announced October 2015.

Comments: 5 pages, 4 figures, IEEE International Conference on Image Processing (ICIP 2016), Phoenix, AZ, USA, 2016
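The 0.5 ms precision quoted in the Bachhuber and Steinbach abstract follows from the sampling period: at 2 kHz one sample lasts 1/2000 s = 0.5 ms, so a delay obtained by counting samples is quantized in 0.5 ms steps. The Python sketch below shows that arithmetic. It assumes, without this being stated in the abstract, that one channel records when a light source in the camera's view is switched on and a second channel records a light sensor placed on the display; the function name and the fixed threshold are hypothetical.

```python
import numpy as np


def e2e_delay_ms(trigger, sensor, fs_hz: float, threshold: float = 0.5) -> float:
    """Delay between the first sample where `trigger` exceeds `threshold`
    and the first later sample where `sensor` exceeds it.

    The result is quantized to the sampling period, 1000 / fs_hz ms.
    """
    trigger = np.asarray(trigger, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    t_idx = np.flatnonzero(trigger > threshold)
    if t_idx.size == 0:
        raise ValueError("no trigger event found")
    start = t_idx[0]
    s_idx = np.flatnonzero(sensor[start:] > threshold)
    if s_idx.size == 0:
        raise ValueError("no sensor response after the trigger")
    # s_idx[0] is already counted from the trigger sample.
    return float(s_idx[0]) * 1000.0 / fs_hz


if __name__ == "__main__":
    fs = 2000.0                  # 2 kHz, as in the abstract
    trig = np.zeros(1000)
    trig[100:] = 1.0             # light switched on at sample 100
    sens = np.zeros(1000)
    sens[260:] = 1.0             # display sensor responds at sample 260
    print(e2e_delay_ms(trig, sens, fs))  # 160 samples -> 80.0 ms
    print(1000.0 / fs)                   # resolution: 0.5 ms
```

Repeating this for successive on/off events gives a series of independent delay samples; raising `fs_hz` shrinks the quantization step proportionally.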

arXiv:1506.08316

Keypoint Encoding for Improved Feature Extraction from Compressed Video at Low Bitrates

Authors: Jianshu Chao, Eckehard Steinbach

Abstract: In many mobile visual analysis applications, compressed video is transmitted over a communication network and analyzed by a server. Typical processing steps performed at the server include keypoint detection, descriptor calculation, and feature matching. Video compression has been shown to have an adverse effect on feature-matching performance. The negative impact of compression can be reduced by using the keypoints extracted from the uncompressed video to calculate descriptors from the compressed video. Based on this observation, we propose to provide these keypoints to the server as side information and to extract only the descriptors from the compressed video. First, we introduce four different frame types for keypoint encoding to address different types of changes in video content. These frame types represent a new scene, the same scene, a slowly changing scene, or a rapidly moving scene and are determined by comparing features between successive video frames. Then, we propose Intra, Skip and Inter modes of encoding the keypoints for different frame types. For example, keypoints for new scenes are encoded using the Intra mode, and keypoints for unchanged scenes are skipped. As a result, the bitrate of the side information related to keypoint encoding is significantly reduced. Finally, we present pairwise matching and image retrieval experiments conducted to evaluate the performance of the proposed approach using the Stanford mobile augmented reality dataset and 720p format videos.
The results show that the proposed approach offers significantly improved feature matching and image retrieval performance at a given bitrate.

Submitted 4 March, 2016; v1 submitted 27 June, 2015; originally announced June 2015.
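The frame-type and mode decision described in the Chao and Steinbach abstract can be sketched in Python as follows. Only two mappings come from the abstract (new scene to Intra, unchanged scene to Skip). The classification thresholds, the mapping of the slowly changing and rapidly moving types, and the Inter mode as a residual with respect to the previous frame's matched keypoints are hypothetical assumptions for illustration, not the paper's encoder.

```python
from enum import Enum

import numpy as np


class FrameType(Enum):
    NEW_SCENE = 1
    SAME_SCENE = 2
    SLOW_CHANGE = 3
    RAPID_MOTION = 4


# NEW_SCENE -> Intra and SAME_SCENE -> Skip follow the abstract;
# the other two entries are hypothetical choices for this sketch.
MODE = {
    FrameType.NEW_SCENE: "Intra",
    FrameType.SAME_SCENE: "Skip",
    FrameType.SLOW_CHANGE: "Inter",
    FrameType.RAPID_MOTION: "Intra",
}


def classify_frame(match_ratio: float, mean_shift_px: float,
                   new_thr: float = 0.3, still_thr: float = 1.0,
                   rapid_thr: float = 20.0) -> FrameType:
    """Hypothetical rule using the share of keypoints matched to the previous
    frame and the mean displacement (in pixels) of the matched keypoints."""
    if match_ratio < new_thr:
        return FrameType.NEW_SCENE
    if mean_shift_px < still_thr:
        return FrameType.SAME_SCENE
    if mean_shift_px < rapid_thr:
        return FrameType.SLOW_CHANGE
    return FrameType.RAPID_MOTION


def encode_keypoints(curr: np.ndarray, prev: np.ndarray, frame_type: FrameType):
    """Return (mode, payload) for (N, 2) keypoint coordinates.

    Inter assumes curr[k] is matched to prev[k]; a decoder would add the
    residuals back onto its copy of prev.
    """
    mode = MODE[frame_type]
    if mode == "Skip":
        return mode, None  # decoder keeps the previous keypoints
    if mode == "Intra":
        return mode, np.round(curr).astype(np.int32)  # absolute positions
    return mode, np.round(curr - prev).astype(np.int32)  # small residuals


if __name__ == "__main__":
    prev = np.array([[10.0, 20.0], [30.0, 40.0]])
    curr = prev + 3.0  # every keypoint moved by 3 pixels
    ft = classify_frame(match_ratio=0.9, mean_shift_px=3.0)
    print(ft, encode_keypoints(curr, prev, ft))
```

The bitrate saving comes from the Skip mode sending nothing and the Inter mode sending small residuals, which an entropy coder can represent more compactly than absolute coordinates.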

Showing 1–23 of 23 results for author: Steinbach, E