Search | arXiv e-print repository

Real2Code: Reconstruct Articulated Objects via Code Generation

Authors: Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song

Abstract: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as co… ▽ More We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as code. By leveraging pre-trained vision and language models, our approach scales elegantly with the number of articulated parts, and generalizes from synthetic training data to real world objects in unstructured environments. Experimental results demonstrate that Real2Code significantly outperforms previous state-of-the-art in reconstruction accuracy, and is the first approach to extrapolate beyond objects' structural complexity in the training set, and reconstructs objects with up to 10 articulated parts. When incorporated with a stereo reconstruction model, Real2Code also generalizes to real world objects from a handful of multi-view RGB images, without the need for depth or camera information. △ Less

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2404.12524 [pdf, other]

DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects

Authors: Dominik Bauer, Zhenjia Xu, Shuran Song

Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict these topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a… ▽ More Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict these topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a denoising autoencoder represents deformable objects of varying topology as sets of latent codes. Second, a visual predictive model performs autoregressive set prediction to determine long-horizon geometrical deformation and topological changes purely in latent space. Given a partial initial state and desired manipulation trajectories, it infers all resulting object geometries and topologies at each step. DoughNet thereby allows to plan robotic manipulation; selecting a suited tool, its pose and opening width to recreate robot- or human-made goals. Our experiments in simulated and real environments show that DoughNet is able to significantly outperform related approaches that consider deformation only as geometrical change. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Under review. 17 pages, 14 figures

arXiv:2404.08371 [pdf, other]

Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids

Authors: Fabian Böhm, Daniel Bauer, Nils Kohl, Christie Alappat, Dominik Thönnes, Marcus Mohr, Harald Köstler, Ulrich Rüde

Abstract: This paper introduces a code generator designed for node-level optimized, extreme-scalable, matrix-free finite element operators on hybrid tetrahedral grids. It optimizes the local evaluation of bilinear forms through various techniques including tabulation, relocation of loop invariants, and inter-element vectorization - implemented as transformations of an abstract syntax tree. A key contributio… ▽ More This paper introduces a code generator designed for node-level optimized, extreme-scalable, matrix-free finite element operators on hybrid tetrahedral grids. It optimizes the local evaluation of bilinear forms through various techniques including tabulation, relocation of loop invariants, and inter-element vectorization - implemented as transformations of an abstract syntax tree. A key contribution is the development, analysis, and generation of efficient loop patterns that leverage the local structure of the underlying tetrahedral grid. These significantly enhance cache locality and arithmetic intensity, mitigating bandwidth-pressure associated with compute-sparse, low-order operators. The paper demonstrates the generator's capabilities through a comprehensive educational cycle of performance analysis, bottleneck identification, and emission of dedicated optimizations. For three differential operators ($-Δ$, $-\nabla \cdot (k(\mathbf{x})\, \nabla\,)$, $α(\mathbf{x})\, \mathbf{curl}\ \mathbf{curl} + β(\mathbf{x}) $), we determine the set of most effective optimizations. Applied by the generator, they result in speed-ups of up to 58$\times$ compared to reference implementations. Detailed node-level performance analysis yields matrix-free operators with a throughput of 1.3 to 2.1 GDoF/s, achieving up to 62% peak performance on a 36-core Intel Ice Lake socket. Finally, the solution of the curl-curl problem with more than a trillion ($ 10^{12}$) degrees of freedom on 21504 processes in less than 50 seconds demonstrates the generated operators' performance and extreme-scalability as part of a full multigrid solver. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Comments: 22 pages

MSC Class: 65F50; 65N30; 65N55; 65Y20; 65F10

arXiv:2308.01792 [pdf, other]

Fundamental Data Structures for Matrix-Free Finite Elements on Hybrid Tetrahedral Grids

Authors: Nils Kohl, Daniel Bauer, Fabian Böhm, Ulrich Rüde

Abstract: This paper presents efficient data structures for the implementation of matrix-free finite element methods on block-structured, hybrid tetrahedral grids. It provides a complete categorization of all geometric sub-objects that emerge from the regular refinement of the unstructured, tetrahedral coarse grid and describes efficient iteration patterns and analytical linearization functions for the mapp… ▽ More This paper presents efficient data structures for the implementation of matrix-free finite element methods on block-structured, hybrid tetrahedral grids. It provides a complete categorization of all geometric sub-objects that emerge from the regular refinement of the unstructured, tetrahedral coarse grid and describes efficient iteration patterns and analytical linearization functions for the map** of coefficients to memory addresses. This foundation enables the implementation of fast, extreme-scalable, matrix-free, iterative solvers, and in particular geometric multigrid methods by design. Their application to the variable-coefficient Stokes system subject to an enriched Galerkin discretization and to the curl-curl problem discretized with Nédélec edge elements showcases the flexibility of the implementation. Eventually, the solution of a curl-curl problem with $1.6 \cdot 10^{11}$ (more than one hundred billion) unknowns on more than $32000$ processes with a matrix-free full multigrid solver demonstrates its extreme-scalability. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: 21 pages

arXiv:2307.15671 [pdf, other]

TrackAgent: 6D Object Tracking via Reinforcement Learning

Authors: Konstantin Röhrl, Dominik Bauer, Timothy Patten, Markus Vincze

Abstract: Tracking an object's 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptionally complex or… ▽ More Tracking an object's 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptionally complex or rely on heuristic motion models. In comparison, we propose to simplify object tracking to a reinforced point cloud (depth only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, compared to the large datasets of diverse RGBD sequences required in previous works. We incorporate temporal frame-to-frame registration with object-based recovery by frame-to-model refinement using a reinforcement learning (RL) agent that jointly solves for both objectives. We also show that the RL agent's uncertainty and a rendering-based mask propagation are effective reinitialization triggers. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: International Conference on Computer Vision Systems (ICVS) 2023

arXiv:2307.12172 [pdf, ps, other]

Challenges for Monocular 6D Object Pose Estimation in Robotics

Authors: Stefan Thalhammer, Dominik Bauer, Peter Hönig, Jean-Baptiste Weibel, José García-Rodríguez, Markus Vincze

Abstract: Object pose estimation is a core perception task that enables, for example, object gras** and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state… ▽ More Object pose estimation is a core perception task that enables, for example, object gras** and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved. △ Less

Submitted 22 July, 2023; originally announced July 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.11827

arXiv:2306.09922 [pdf, ps, other]

Learning to Summarize and Answer Questions about a Virtual Robot's Past Actions

Authors: Chad DeChant, Iretiayo Akinola, Daniel Bauer

Abstract: When robots perform long action sequences, users will want to easily and reliably find out what they have done. We therefore demonstrate the task of learning to summarize and answer questions about a robot agent's past actions using natural language alone. A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-cent… ▽ More When robots perform long action sequences, users will want to easily and reliably find out what they have done. We therefore demonstrate the task of learning to summarize and answer questions about a robot agent's past actions using natural language alone. A single system with a large language model at its core is trained to both summarize and answer questions about action sequences given ego-centric video frames of a virtual robot and a question prompt. To enable training of question answering, we develop a method to automatically generate English-language questions and answers about objects, actions, and the temporal order in which actions occurred during episodes of robot action in the virtual environment. Training one model to both summarize and answer questions enables zero-shot transfer of representations of objects learned through question answering to improved action summarization. % involving objects not seen in training to summarize. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.04784 [pdf, other]

Designing Anthropomorphic Soft Hands through Interaction

Authors: Pragna Mannam, Kenneth Shaw, Dominik Bauer, Jean Oh, Deepak Pathak, Nancy Pollard

Abstract: Modeling and simulating soft robot hands can aid in design iteration for complex and high degree-of-freedom (DoF) morphologies. This can be further supplemented by iterating on the design based on its performance in real world manipulation tasks. However, iterating in the real world requires an approach that allows us to test new designs quickly at low costs. In this paper, we leverage rapid proto… ▽ More Modeling and simulating soft robot hands can aid in design iteration for complex and high degree-of-freedom (DoF) morphologies. This can be further supplemented by iterating on the design based on its performance in real world manipulation tasks. However, iterating in the real world requires an approach that allows us to test new designs quickly at low costs. In this paper, we leverage rapid prototy** of the hand using 3D-printing, and utilize teleoperation to evaluate the hand in real world manipulation tasks. Using this method, we design a 3D-printed 16-DoF dexterous anthropomorphic soft hand (DASH) and iteratively improve its design over five iterations. Rapid prototy** techniques such as 3D-printing allow us to directly evaluate the fabricated hand without modeling it in simulation. We show that the design improves over five design iterations through evaluating the hand's performance in 30 real-world teleoperated manipulation tasks. Testing over 900 demonstrations shows that our final version of DASH can solve 19 of the 30 tasks compared to Allegro, a popular rigid hand in the market, which can only solve 7 tasks. We open-source our CAD models as well as the teleoperated dataset for further study. △ Less

Submitted 14 March, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Journal ref: 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids)

arXiv:2304.07338 [pdf, other]

Photon Field Networks for Dynamic Real-Time Volumetric Global Illumination

Authors: David Bauer, Qi Wu, Kwan-Liu Ma

Abstract: Volume data is commonly found in many scientific disciplines, like medicine, physics, and biology. Experts rely on robust scientific visualization techniques to extract valuable insights from the data. Recent years have shown path tracing to be the preferred approach for volumetric rendering, given its high levels of realism. However, real-time volumetric path tracing often suffers from stochastic… ▽ More Volume data is commonly found in many scientific disciplines, like medicine, physics, and biology. Experts rely on robust scientific visualization techniques to extract valuable insights from the data. Recent years have shown path tracing to be the preferred approach for volumetric rendering, given its high levels of realism. However, real-time volumetric path tracing often suffers from stochastic noise and long convergence times, limiting interactive exploration. In this paper, we present a novel method to enable real-time global illumination for volume data visualization. We develop Photon Field Networks -- a phase-function-aware, multi-light neural representation of indirect volumetric global illumination. The fields are trained on multi-phase photon caches that we compute a priori. Training can be done within seconds, after which the fields can be used in various rendering tasks. To showcase their potential, we develop a custom neural path tracer, with which our photon fields achieve interactive framerates even on large datasets. We conduct in-depth evaluations of the method's performance, including visual quality, stochastic noise, inference and rendering speeds, and accuracy regarding illumination and phase function awareness. Results are compared to ray marching, path tracing and photon map**. Our findings show that Photon Field Networks can faithfully represent indirect global illumination across the phase spectrum while exhibiting less stochastic noise and rendering at a significantly faster rate than traditional methods. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.04188 [pdf, other]

HyperINR: A Fast and Predictive Hypernetwork for Implicit Neural Representations via Knowledge Distillation

Authors: Qi Wu, David Bauer, Yuyang Chen, Kwan-Liu Ma

Abstract: Implicit Neural Representations (INRs) have recently exhibited immense potential in the field of scientific visualization for both data generation and visualization tasks. However, these representations often consist of large multi-layer perceptrons (MLPs), necessitating millions of operations for a single forward pass, consequently hindering interactive visual exploration. While reducing the size… ▽ More Implicit Neural Representations (INRs) have recently exhibited immense potential in the field of scientific visualization for both data generation and visualization tasks. However, these representations often consist of large multi-layer perceptrons (MLPs), necessitating millions of operations for a single forward pass, consequently hindering interactive visual exploration. While reducing the size of the MLPs and employing efficient parametric encoding schemes can alleviate this issue, it compromises generalizability for unseen parameters, rendering it unsuitable for tasks such as temporal super-resolution. In this paper, we introduce HyperINR, a novel hypernetwork architecture capable of directly predicting the weights for a compact INR. By harnessing an ensemble of multiresolution hash encoding units in unison, the resulting INR attains state-of-the-art inference performance (up to 100x higher inference bandwidth) and can support interactive photo-realistic volume visualization. Additionally, by incorporating knowledge distillation, exceptional data and visualization generation quality is achieved, making our method valuable for real-time parameter exploration. We validate the effectiveness of the HyperINR architecture through a comprehensive ablation study. We showcase the versatility of HyperINR across three distinct scientific domains: novel view synthesis, temporal super-resolution of volume data, and volume rendering with dynamic global shadows. By simultaneously achieving efficiency and generalizability, HyperINR paves the way for applying INR in a wider array of scientific visualization applications. △ Less

Submitted 9 April, 2023; originally announced April 2023.

arXiv:2210.07270 [pdf, other]

Multi-Task Learning for Joint Semantic Role and Proto-Role Labeling

Authors: Aashish Arora, Harshitha Malireddi, Daniel Bauer, Asad Sayeed, Yuval Marton

Abstract: We put forward an end-to-end multi-step machine learning model which jointly labels semantic roles and the proto-roles of Dowty (1991), given a sentence and the predicates therein. Our best architecture first learns argument spans followed by learning the argument's syntactic heads. This information is shared with the next steps for predicting the semantic roles and proto-roles. We also experiment… ▽ More We put forward an end-to-end multi-step machine learning model which jointly labels semantic roles and the proto-roles of Dowty (1991), given a sentence and the predicates therein. Our best architecture first learns argument spans followed by learning the argument's syntactic heads. This information is shared with the next steps for predicting the semantic roles and proto-roles. We also experiment with transfer learning from argument and head prediction to role and proto-role labeling. We compare using static and contextual embeddings for words, arguments, and sentences. Unlike previous work, our model does not require pre-training or fine-tuning on additional tasks, beyond using off-the-shelf (static or contextual) embeddings and supervision. It also does not require argument spans, their semantic roles, and/or their gold syntactic heads as additional input, because it learns to predict all these during training. Our multi-task learning model raises the state-of-the-art predictions for most proto-roles. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: 10 pages including references. 2 figures. First 2 authors contributed significantly

arXiv:2209.09965 [pdf, other]

FoVolNet: Fast Volume Rendering using Foveated Deep Neural Networks

Authors: David Bauer, Qi Wu, Kwan-Liu Ma

Abstract: Volume data is found in many important scientific and engineering applications. Rendering this data for visualization at high quality and interactive rates for demanding applications such as virtual reality is still not easily achievable even using professional-grade hardware. We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization. We develop a cos… ▽ More Volume data is found in many important scientific and engineering applications. Rendering this data for visualization at high quality and interactive rates for demanding applications such as virtual reality is still not easily achievable even using professional-grade hardware. We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization. We develop a cost-effective foveated rendering pipeline that sparsely samples a volume around a focal point and reconstructs the full-frame using a deep neural network. Foveated rendering is a technique that prioritizes rendering computations around the user's focal point. This approach leverages properties of the human visual system, thereby saving computational resources when rendering data in the periphery of the user's field of vision. Our reconstruction network combines direct and kernel prediction methods to produce fast, stable, and perceptually convincing output. With a slim design and the use of quantization, our method outperforms state-of-the-art neural reconstruction techniques in both end-to-end frame times and visual quality. We conduct extensive evaluations of the system's rendering performance, inference speed, and perceptual properties, and we provide comparisons to competing neural image reconstruction techniques. Our test results show that FoVolNet consistently achieves significant time saving over conventional rendering while preserving perceptual quality. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: To appear at IEEE VIS 2022 and later TVCG

arXiv:2208.04683 [pdf, other]

Applying data technologies to combat AMR: current status, challenges, and opportunities on the way forward

Authors: Leonid Chindelevitch, Elita Jauneikaite, Nicole E. Wheeler, Kasim Allel, Bede Yaw Ansiri-Asafoakaa, Wireko A. Awuah, Denis C. Bauer, Stephan Beisken, Kara Fan, Gary Grant, Michael Graz, Yara Khalaf, Veranja Liyanapathirana, Carlos Montefusco-Pereira, Lawrence Mugisha, Atharv Naik, Sylvia Nanono, Anthony Nguyen, Timothy Rawson, Kessendri Reddy, Juliana M. Ruzante, Anneke Schmider, Roman Stocker, Leonhardt Unruh, Daniel Waruingi , et al. (2 additional authors not shown)

Abstract: Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in the morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Nume… ▽ More Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in the morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Numerous interventions have been proposed to control the development of AMR and mitigate the risks posed by its spread. This paper reviews key aspects of bacterial AMR management and control which make essential use of data technologies such as artificial intelligence, machine learning, and mathematical and statistical modelling, fields that have seen rapid developments in this century. Although data technologies have become an integral part of biomedical research, their impact on AMR management has remained modest. We outline the use of data technologies to combat AMR, detailing recent advancements in four complementary categories: surveillance, prevention, diagnosis, and treatment. We provide an overview on current AMR control approaches using data technologies within biomedical research, clinical practice, and in the "One Health" context. We discuss the potential impact and challenges wider implementation of data technologies is facing in high-income as well as in low- and middle-income countries, and recommend concrete actions needed to allow these technologies to be more readily integrated within the healthcare and public health sectors. △ Less

Submitted 11 August, 2022; v1 submitted 5 July, 2022; originally announced August 2022.

Comments: 65 pages, 3 figures

ACM Class: I.2.1; J.3

arXiv:2207.11620 [pdf, other]

Interactive Volume Visualization via Multi-Resolution Hash Encoding based Neural Representation

Authors: Qi Wu, David Bauer, Michael J. Doyle, Kwan-Liu Ma

Abstract: Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framewo… ▽ More Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and a well-designed rendering algorithm with macro-cell acceleration, we can interactively ray trace volumetric neural representations (10-60fps). Our neural representations are also high-fidelity (PSNR > 30dB) and compact (10-1000x smaller). Additionally, we show that it is possible to fit the entire training step inside a rendering loop and skip the pre-training process completely. To support extreme-scale volume data, we also develop an efficient out-of-core training strategy, which allows our volumetric neural representation training to potentially scale up to terascale using only an NVIDIA RTX 3090 workstation. △ Less

Submitted 29 June, 2023; v1 submitted 23 July, 2022; originally announced July 2022.

Comments: There is a supplementary video for this manuscript, which can be accessed via this link: https://drive.google.com/file/d/17wSgIm_VsoeGhfyZwMpOnCYy2Mj3ydGv/view?usp=sharing

arXiv:2203.06671 [pdf, other]

Summarizing a virtual robot's past actions in natural language

Authors: Chad DeChant, Daniel Bauer

Abstract: We propose and demonstrate the task of giving natural language summaries of the actions of a robotic agent in a virtual environment. We explain why such a task is important, what makes it difficult, and discuss how it might be addressed. To encourage others to work on this, we show how a popular existing dataset that matches robot actions with natural language descriptions designed for an instruct… ▽ More We propose and demonstrate the task of giving natural language summaries of the actions of a robotic agent in a virtual environment. We explain why such a task is important, what makes it difficult, and discuss how it might be addressed. To encourage others to work on this, we show how a popular existing dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner. We provide quantitative and qualitative evaluations of our results, which can serve as a baseline for future work. △ Less

Submitted 13 March, 2022; originally announced March 2022.

Comments: 12 pages, 3 figures

arXiv:2201.05230 [pdf, other]

NLP in Human Rights Research -- Extracting Knowledge Graphs About Police and Army Units and Their Commanders

Authors: Daniel Bauer, Tom Longley, Yueen Ma, Tony Wilson

Abstract: In this working paper we explore the use of an NLP system to assist the work of Security Force Monitor (SFM). SFM creates data about the organizational structure, command personnel and operations of police, army and other security forces, which assists human rights researchers, journalists and litigators in their work to help identify and bring to account specific units and personnel alleged to ha… ▽ More In this working paper we explore the use of an NLP system to assist the work of Security Force Monitor (SFM). SFM creates data about the organizational structure, command personnel and operations of police, army and other security forces, which assists human rights researchers, journalists and litigators in their work to help identify and bring to account specific units and personnel alleged to have committed abuses of human rights and international criminal law. This working paper presents an NLP system that extracts from English language news reports the names of security force units and the biographical details of their personnel, and infers the formal relationship between them. Published alongside this working paper are the system's code and training dataset. We find that the experimental NLP system performs the task at a fair to good level. Its performance is sufficient to justify further development into a live workflow that will give insight into whether its performance translates into savings in time and resource that would make it an effective technical intervention. △ Less

Submitted 13 January, 2022; originally announced January 2022.

Comments: Equal contributions. for associated text corpus see https://github.com/security-force-monitor/nlp_starter_dataset

arXiv:2201.00239 [pdf, other]

SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

Authors: Dominik Bauer, Timothy Patten, Markus Vincze

Abstract: Observational noise, inaccurate segmentation and ambiguity due to symmetry and occlusion lead to inaccurate object pose estimates. While depth- and RGB-based pose refinement approaches increase the accuracy of the resulting pose estimates, they are susceptible to ambiguity in the observation as they consider visual alignment. We propose to leverage the fact that we often observe static, rigid scen… ▽ More Observational noise, inaccurate segmentation and ambiguity due to symmetry and occlusion lead to inaccurate object pose estimates. While depth- and RGB-based pose refinement approaches increase the accuracy of the resulting pose estimates, they are susceptible to ambiguity in the observation as they consider visual alignment. We propose to leverage the fact that we often observe static, rigid scenes. Thus, the objects therein need to be under physically plausible poses. We show that considering plausibility reduces ambiguity and, in consequence, allows poses to be more accurately predicted in cluttered environments. To this end, we extend a recent RL-based registration approach towards iterative refinement of object poses. Experiments on the LINEMOD and YCB-VIDEO datasets demonstrate the state-of-the-art performance of our depth-based refinement approach. △ Less

Submitted 1 January, 2022; originally announced January 2022.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022

arXiv:2112.10272 [pdf, other]

A Multi-Layout Design for Immersive Visualization of Network Data

Authors: David Bauer, Chengbo Zheng, Oh-Hyun Kwon, Kwan-Liu Ma

Abstract: Visualization plays a vital role in making sense of complex network data. Recent studies have shown the potential of using extended reality (XR) for the immersive exploration of networks. The additional depth cues offered by XR help users perform better in certain tasks when compared to using traditional desktop setups. However, prior works on immersive network visualization rely on mostly static… ▽ More Visualization plays a vital role in making sense of complex network data. Recent studies have shown the potential of using extended reality (XR) for the immersive exploration of networks. The additional depth cues offered by XR help users perform better in certain tasks when compared to using traditional desktop setups. However, prior works on immersive network visualization rely on mostly static graph layouts to present the data to the user. This poses a problem since there is no optimal layout for all possible tasks. The choice of layout heavily depends on the type of network and the task at hand. We introduce a multi-layout approach that allows users to effectively explore hierarchical network data in immersive space. The resulting system leverages different layout techniques and interactions to efficiently use the available space in VR and provide an optimal view of the data depending on the task and the level of detail required to solve it. To evaluate our approach, we have conducted a user study comparing it against the state of the art for immersive network visualization. Participants performed tasks at varying spatial scopes. The results show that our approach outperforms the baseline in spatially focused scenarios as well as when the whole network needs to be considered. △ Less

Submitted 26 January, 2023; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: 13 pages, 6 figures, this manuscript is currently under revision

arXiv:2111.01869 [pdf, other]

Towards Very Low-Cost Iterative Prototy** for Fully Printable Dexterous Soft Robotic Hands

Authors: Dominik Bauer, Cornelia Bauer, Arjun Lakshmipathy, Roberto Shu, Nancy S. Pollard

Abstract: The design and fabrication of soft robot hands is still a time-consuming and difficult process. Advances in rapid prototy** have accelerated the fabrication process significantly while introducing new complexities into the design process. In this work, we present an approach that utilizes novel low-cost fabrication techniques in conjunction with design tools hel** soft hand designers to system… ▽ More The design and fabrication of soft robot hands is still a time-consuming and difficult process. Advances in rapid prototy** have accelerated the fabrication process significantly while introducing new complexities into the design process. In this work, we present an approach that utilizes novel low-cost fabrication techniques in conjunction with design tools hel** soft hand designers to systematically take advantage of multi-material 3D printing to create dexterous soft robotic hands. While very low cost and lightweight, we show that generated designs are highly durable, surprisingly strong, and capable of dexterous gras**. △ Less

Submitted 16 April, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.15532 [pdf, other]

Contact Transfer: A Direct, User-Driven Method for Human to Robot Transfer of Grasps and Manipulations

Authors: Arjun Lakshmipathy, Dominik Bauer, Cornelia Bauer, Nancy S. Pollard

Abstract: We present a novel method for the direct transfer of grasps and manipulations between objects and hands through utilization of contact areas. Our method fully preserves contact shapes, and in contrast to existing techniques, is not dependent on grasp families, requires no model training or grasp sampling, makes no assumptions about manipulator morphology or kinematics, and allows user control over… ▽ More We present a novel method for the direct transfer of grasps and manipulations between objects and hands through utilization of contact areas. Our method fully preserves contact shapes, and in contrast to existing techniques, is not dependent on grasp families, requires no model training or grasp sampling, makes no assumptions about manipulator morphology or kinematics, and allows user control over both transfer parameters and solution optimization. Despite these accommodations, we show that our method is capable of synthesizing kinematically feasible whole hand poses in seconds even for poor initializations or hard to reach contacts. We additionally highlight the method's benefits in both response to design alterations as well as fast approximation over in-hand manipulation sequences. Finally, we demonstrate a solution generated by our method on a physical, custom designed prosthetic hand. △ Less

Submitted 1 June, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

arXiv:2109.09500 [pdf, other]

Deep Learning-Based Estimation and Goodness-of-Fit for Large-Scale Confirmatory Item Factor Analysis

Authors: Christopher J. Urban, Daniel J. Bauer

Abstract: We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle constraints on loadings and factor correlations.… ▽ More We investigate novel parameter estimation and goodness-of-fit (GOF) assessment methods for large-scale confirmatory item factor analysis (IFA) with many respondents, items, and latent factors. For parameter estimation, we extend Urban and Bauer's (2021) deep learning algorithm for exploratory IFA to the confirmatory setting by showing how to handle constraints on loadings and factor correlations. For GOF assessment, we explore simulation-based tests and indices that extend the classifier two-sample test (C2ST), a method that tests whether a deep neural network can distinguish between observed data and synthetic data sampled from a fitted IFA model. Proposed extensions include a test of approximate fit wherein the user specifies what percentage of observed and synthetic data should be distinguishable as well as a relative fit index (RFI) that is similar in spirit to the RFIs used in structural equation modeling. Via simulation studies, we show that: (1) the confirmatory extension of Urban and Bauer's (2021) algorithm obtains comparable estimates to a state-of-the-art estimation procedure in less time; (2) C2ST-based GOF tests control the empirical type I error rate and detect when the latent dimensionality is misspecified; and (3) the sampling distribution of the C2ST-based RFI depends on the sample size. △ Less

Submitted 15 March, 2023; v1 submitted 20 September, 2021; originally announced September 2021.

arXiv:2103.15231 [pdf, other]

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

Authors: Dominik Bauer, Timothy Patten, Markus Vincze

Abstract: Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation. Classical registration methods generalize well to novel domains but fail when given a noisy observation or a bad initialization. Learning-based methods, in contrast, are more robust but lack in generalization capacity. We propose to consider iter… ▽ More Point cloud registration is a common step in many 3D computer vision tasks such as object pose estimation, where a 3D model is aligned to an observation. Classical registration methods generalize well to novel domains but fail when given a noisy observation or a bad initialization. Learning-based methods, in contrast, are more robust but lack in generalization capacity. We propose to consider iterative point cloud registration as a reinforcement learning task and, to this end, present a novel registration agent (ReAgent). We employ imitation learning to initialize its discrete registration policy based on a steady expert policy. Integration with policy optimization, based on our proposed alignment reward, further improves the agent's registration performance. We compare our approach to classical and learning-based registration methods on both ModelNet40 (synthetic) and ScanObjectNN (real data) and show that our ReAgent achieves state-of-the-art accuracy. The lightweight architecture of the agent, moreover, enables reduced inference time as compared to related approaches. In addition, we apply our method to the object pose estimation task on real data (LINEMOD), outperforming state-of-the-art pose refinement approaches. △ Less

Submitted 28 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021

arXiv:2012.02111 [pdf, other]

Deep Inverse Sensor Models as Priors for evidential Occupancy Map**

Authors: Daniel Bauer, Lars Kuhnert, Lutz Eckstein

Abstract: With the recent boost in autonomous driving, increased attention has been paid on radars as an input for occupancy map**. Besides their many benefits, the inference of occupied space based on radar detections is notoriously difficult because of the data sparsity and the environment dependent noise (e.g. multipath reflections). Recently, deep learning-based inverse sensor models, from here on cal… ▽ More With the recent boost in autonomous driving, increased attention has been paid on radars as an input for occupancy map**. Besides their many benefits, the inference of occupied space based on radar detections is notoriously difficult because of the data sparsity and the environment dependent noise (e.g. multipath reflections). Recently, deep learning-based inverse sensor models, from here on called deep ISMs, have been shown to improve over their geometric counterparts in retrieving occupancy information. Nevertheless, these methods perform a data-driven interpolation which has to be verified later on in the presence of measurements. In this work, we describe a novel approach to integrate deep ISMs together with geometric ISMs into the evidential occupancy map** framework. Our method leverages both the capabilities of the data-driven approach to initialize cells not yet observable for the geometric model effectively enhancing the perception field and convergence speed, while at the same time use the precision of the geometric ISM to converge to sharp boundaries. We further define a lower limit on the deep ISM estimate's certainty together with analytical proofs of convergence which we use to distinguish cells that are solely allocated by the deep ISM from cells already verified using the geometric approach. △ Less

Submitted 2 December, 2020; originally announced December 2020.

arXiv:2001.07859 [pdf, other]

A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis

Authors: Christopher J. Urban, Daniel J. Bauer

Abstract: Marginal maximum likelihood (MML) estimation is the preferred approach to fitting item response theory models in psychometrics due to the MML estimator's consistency, normality, and efficiency as the sample size tends to infinity. However, state-of-the-art MML estimation procedures such as the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm as well as approximate MML estimation procedures such… ▽ More Marginal maximum likelihood (MML) estimation is the preferred approach to fitting item response theory models in psychometrics due to the MML estimator's consistency, normality, and efficiency as the sample size tends to infinity. However, state-of-the-art MML estimation procedures such as the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm as well as approximate MML estimation procedures such as variational inference (VI) are computationally time-consuming when the sample size and the number of latent factors are very large. In this work, we investigate a deep learning-based VI algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors. The proposed approach applies a deep artificial neural network model called an importance-weighted autoencoder (IWAE) for exploratory IFA. The IWAE approximates the MML estimator using an importance sampling technique wherein increasing the number of importance-weighted (IW) samples drawn during fitting improves the approximation, typically at the cost of decreased computational efficiency. We provide a real data application that recovers results aligning with psychological theory across random starts. Via simulation studies, we show that the IWAE yields more accurate estimates as either the sample size or the number of IW samples increases (although factor correlation and intercepts estimates exhibit some bias) and obtains similar results to MH-RM in less time. Our simulations also suggest that the proposed approach performs similarly to and is potentially faster than constrained joint maximum likelihood estimation, a fast procedure that is consistent when the sample size and the number of items simultaneously tend to infinity. △ Less

Submitted 4 February, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

Comments: 30 pages; 12 figures; accepted for publication in Psychometrika

arXiv:1909.05730 [pdf, other]

VeREFINE: Integrating Object Pose Verification with Physics-guided Iterative Refinement

Authors: Dominik Bauer, Timothy Patten, Markus Vincze

Abstract: Accurate and robust object pose estimation for robotics applications requires verification and refinement steps. In this work, we propose to integrate hypotheses verification with object pose refinement guided by physics simulation. This allows the physical plausibility of individual object pose estimates and the stability of the estimated scene to be considered in a unified optimization. The prop… ▽ More Accurate and robust object pose estimation for robotics applications requires verification and refinement steps. In this work, we propose to integrate hypotheses verification with object pose refinement guided by physics simulation. This allows the physical plausibility of individual object pose estimates and the stability of the estimated scene to be considered in a unified optimization. The proposed method is able to adapt to scenes of multiple objects and efficiently focuses on refining the most promising object poses in multi-hypotheses scenarios. We call this integrated approach VeREFINE and evaluate it on three datasets with varying scene complexity. The generality of the approach is shown by using three state-of-the-art pose estimators and three baseline refiners. Results show improvements over all baselines and on all datasets. Furthermore, our approach is applied in real-world gras** experiments and outperforms competing methods in terms of grasp success rate. Code is publicly available at github.com/dornik/verefine. △ Less

Submitted 18 May, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: Revised version

arXiv:1904.00842 [pdf, other]

Deep, spatially coherent Inverse Sensor Models with Uncertainty Incorporation using the evidential Framework

Authors: Daniel Bauer, Lars Kuhnert, Lutz Eckstein

Abstract: To perform high speed tasks, sensors of autonomous cars have to provide as much information in as few time steps as possible. However, radars, one of the sensor modalities autonomous cars heavily rely on, often only provide sparse, noisy detections. These have to be accumulated over time to reach a high enough confidence about the static parts of the environment. For radars, the state is typically… ▽ More To perform high speed tasks, sensors of autonomous cars have to provide as much information in as few time steps as possible. However, radars, one of the sensor modalities autonomous cars heavily rely on, often only provide sparse, noisy detections. These have to be accumulated over time to reach a high enough confidence about the static parts of the environment. For radars, the state is typically estimated by accumulating inverse detection models (IDMs). We employ the recently proposed evidential convolutional neural networks which, in contrast to IDMs, compute dense, spatially coherent inference of the environment state. Moreover, these networks are able to incorporate sensor noise in a principled way which we further extend to also incorporate model uncertainty. We present experimental results that show This makes it possible to obtain a denser environment perception in fewer time steps. △ Less

Submitted 29 March, 2019; originally announced April 2019.

Comments: Submitted for Intelligent Vehicle Symposium 2019

arXiv:1903.12467 [pdf, other]

Deep, spatially coherent Occupancy Maps based on Radar Measurements

Authors: Daniel Bauer, Lars Kuhnert, Lutz Eckstein

Abstract: One essential step to realize modern driver assistance technology is the accurate knowledge about the location of static objects in the environment. In this work, we use artificial neural networks to predict the occupation state of a whole scene in an end-to-end manner. This stands in contrast to the traditional approach of accumulating each detection's influence on the occupancy state and allows… ▽ More One essential step to realize modern driver assistance technology is the accurate knowledge about the location of static objects in the environment. In this work, we use artificial neural networks to predict the occupation state of a whole scene in an end-to-end manner. This stands in contrast to the traditional approach of accumulating each detection's influence on the occupancy state and allows to learn spatial priors which can be used to interpolate the environment's occupancy state. We show that these priors make our method suitable to predict dense occupancy estimations from sparse, highly uncertain inputs, as given by automotive radars, even for complex urban scenarios. Furthermore, we demonstrate that these estimations can be used for large-scale map** applications. △ Less

Submitted 29 March, 2019; originally announced March 2019.

Comments: Submitted for Automotive Meets Electronics 2019

arXiv:1602.09076 [pdf, other]

doi 10.1109/TITS.2016.2565643

Personalized and situation-aware multimodal route recommendations: the FAVOUR algorithm

Authors: Paolo Campigotto, Christian Rudloff, Maximilian Leodolter, Dietmar Bauer

Abstract: Route choice in multimodal networks shows a considerable variation between different individuals as well as the current situational context. Personalization of recommendation algorithms are already common in many areas, e.g., online retail. However, most online routing applications still provide shortest distance or shortest travel-time routes only, neglecting individual preferences as well as the… ▽ More Route choice in multimodal networks shows a considerable variation between different individuals as well as the current situational context. Personalization of recommendation algorithms are already common in many areas, e.g., online retail. However, most online routing applications still provide shortest distance or shortest travel-time routes only, neglecting individual preferences as well as the current situation. Both aspects are of particular importance in a multimodal setting as attractivity of some transportation modes such as biking crucially depends on personal characteristics and exogenous factors like the weather. This paper introduces the FAVourite rOUte Recommendation (FAVOUR) approach to provide personalized, situation-aware route proposals based on three steps: first, at the initialization stage, the user provides limited information (home location, work place, mobility options, sociodemographics) used to select one out of a small number of initial profiles. Second, based on this information, a stated preference survey is designed in order to sharpen the profile. In this step a mass preference prior is used to encode the prior knowledge on preferences from the class identified in step one. And third, subsequently the profile is continuously updated during usage of the routing services. The last two steps use Bayesian learning techniques in order to incorporate information from all contributing individuals. The FAVOUR approach is presented in detail and tested on a small number of survey participants. The experimental results on this real-world dataset show that FAVOUR generates better-quality recommendations w.r.t. alternative learning algorithms from the literature. In particular the definition of the mass preference prior for initialization of step two is shown to provide better predictions than a number of alternatives from the literature. △ Less

Submitted 29 February, 2016; originally announced February 2016.

Comments: 12 pages, 6 figures, 1 table. Submitted to IEEE Transactions on Intelligent Transportation Systems journal for publication

arXiv:1207.1115 [pdf, other]

Inferring land use from mobile phone activity

Authors: Jameson L. Toole, Michael Ulm, Dietmar Bauer, Marta C. Gonzalez

Abstract: Understanding the spatiotemporal distribution of people within a city is crucial to many planning applications. Obtaining data to create required knowledge, currently involves costly survey methods. At the same time ubiquitous mobile sensors from personal GPS devices to mobile phones are collecting massive amounts of data on urban systems. The locations, communications, and activities of millions… ▽ More Understanding the spatiotemporal distribution of people within a city is crucial to many planning applications. Obtaining data to create required knowledge, currently involves costly survey methods. At the same time ubiquitous mobile sensors from personal GPS devices to mobile phones are collecting massive amounts of data on urban systems. The locations, communications, and activities of millions of people are recorded and stored by new information technologies. This work utilizes novel dynamic data, generated by mobile phone users, to measure spatiotemporal changes in population. In the process, we identify the relationship between land use and dynamic population over the course of a typical week. A machine learning classification algorithm is used to identify clusters of locations with similar zoned uses and mobile phone activity patterns. It is shown that the mobile phone data is capable of delivering useful information on actual land use that supplements zoning regulations. △ Less

Submitted 3 July, 2012; originally announced July 2012.

Comments: To be presented at ACM UrbComp2012

ACM Class: H.2.8

Showing 1–29 of 29 results for author: Bauer, D