Search | arXiv e-print repository

Gradient Obfuscation Checklist Test Gives a False Sense of Security

Authors: Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

Abstract: One popular group of defense techniques against adversarial attacks is based on injecting stochastic noise into the network. The main source of robustness of such stochastic defenses, however, is often the obfuscation of the gradients, offering a false sense of security. Since most of the popular adversarial attacks are optimization-based, obfuscated gradients reduce their attacking ability, while the model is still susceptible to stronger or specifically tailored adversarial attacks. Recently, five characteristics have been identified, which are commonly observed when the improvement in robustness is mainly caused by gradient obfuscation. It has since become a trend to use these five characteristics as a sufficient test, to determine whether or not gradient obfuscation is the main source of robustness. However, these characteristics do not perfectly characterize all existing cases of gradient obfuscation, and therefore cannot serve as a basis for a conclusive test. In this work, we present a counterexample, showing this test is not sufficient for concluding that gradient obfuscation is not the main cause of improvements in robustness.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2204.04018 [pdf, other]

Longitudinal wall shear stress evaluation using centerline projection approach in the numerical simulations of the patient-based carotid artery

Authors: Kevin Richter, Tristan Probst, Anna Hundertmark, Pepe Eulzer, Kai Lawonn

Abstract: In this numerical study areas of the carotid bifurcation and of a distal stenosis in the internal carotid artery are closely observed to evaluate the patient's current risks of ischemic stroke. An indicator for the vessel wall defects is the stress the blood is exerting on the surrounding vessel tissue, expressed standardly by the amplitude of the wall shear stress vector (WSS) and its oscillatory shear index. In contrast, our orientation-based shear evaluation detects negative shear stresses corresponding with reversal flow appearing in low shear areas. In our investigations of the longitudinal component of the wall shear vector, tangential vectors aligned longitudinally with the vessel are necessary. However, as a result of stenosed regions and imaging segmentation techniques from patients' CTA scans, the geometry model's mesh is non-smooth on its surface areas and the automatically generated tangential vector field is discontinuous and multi-directional, making an interpretation of the orientation-based risk indicators unreliable. We improve the evaluation of longitudinal shear stress by applying the projection of the vessel's center-line to the surface to construct a smooth tangential field aligned longitudinally with the vessel. We validate our approach for the longitudinal WSS component and the corresponding oscillatory index by comparing them to results obtained using automatically generated tangents in both rigid and elastic vessel modeling as well as to amplitude-based indicators.
The major benefit of our WSS evaluation based on its longitudinal component for the cardiovascular risk assessment is the detection of negative WSS indicating persistent reversal flow. This is impossible in the case of the amplitude-based WSS.

Submitted 22 September, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

MSC Class: 76M10

arXiv:2203.13812 [pdf, other]

Spatially Multi-conditional Image Generation

Authors: Ritika Chakraborty, Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

Abstract: In most scenarios, conditional image generation can be thought of as an inversion of the image understanding process. Since generic image understanding involves solving multiple tasks, it is natural to aim at generating images via multi-conditioning. However, multi-conditional image generation is a very challenging problem due to the heterogeneity and the sparsity of the (in practice) available conditioning labels. In this work, we propose a novel neural architecture to address the problem of heterogeneity and sparsity of the spatially multi-conditional labels. Our choice of spatial conditioning, such as by semantics and depth, is driven by the promise it holds for better control of the image generation process. The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens to merge them in a learned homogeneous space of labels. The merged labels are then used for image generation via conditional generative adversarial training. In this process, the sparsity of the labels is handled by simply dropping the input tokens corresponding to the missing labels at the desired locations, thanks to the proposed pixel-wise operating architecture. Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and compared baselines. The source code will be made publicly available.

Submitted 14 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2112.15111 [pdf, other]

Improving the Behaviour of Vision Transformers with Token-consistent Stochastic Layers

Authors: Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

Abstract: We introduce token-consistent stochastic layers in vision transformers, without causing any severe drop in performance. The added stochasticity improves network calibration, robustness and strengthens privacy. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations preserve the topological structure, formed by the set of tokens passing through the shared multilayer perceptron. This operation encourages the learning of the recognition task to rely on the topological structures of the tokens, instead of their values, which in turn offers the desired robustness and privacy of the visual features. The effectiveness of the token-consistent stochasticity is demonstrated on three different applications, namely, network calibration, adversarial robustness, and feature privacy, by boosting the performance of the respective established baselines.

Submitted 14 July, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Comments: This article is under consideration at the Computer Vision and Image Understanding journal

arXiv:2111.02881 [pdf, other]

Defining Gaze Patterns for Process Model Literacy -- Exploring Visual Routines in Process Models with Diverse Mappings

Authors: Michael Winter, Heiko Neumann, Rüdiger Pryss, Thomas Probst, Manfred Reichert

Abstract: Process models depict crucial artifacts for organizations regarding documentation, communication, and collaboration. The proper comprehension of such models is essential for an effective application. An important aspect in process model literacy is the question of how the information presented in process models is extracted and processed by the human visual system. For such visuospatial tasks, the visual system deploys a set of elemental operations, from whose compositions different visual routines are produced. This paper provides insights from an exploratory eye tracking study, in which visual routines during process model comprehension were contemplated. More specifically, n = 29 participants were asked to comprehend n = 18 process models expressed in the Business Process Model and Notation 2.0 reflecting diverse mappings (i.e., straight, upward, downward) and complexity levels. The performance measures indicated that even less complex process models pose a challenge regarding their comprehension. The upward mapping confronted participants' attention with more challenges, whereas the downward mapping was comprehended more effectively. Based on recorded eye movements, three gaze patterns applied during model comprehension were derived. Thereupon, we defined a general model which identifies visual routines and corresponding elemental operations during process model comprehension. Finally, implications for practice as well as research and directions for future work are discussed in this paper.

Submitted 30 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2107.02030 [pdf, other]

Are Non-Experts Able to Comprehend Business Process Models -- Study Insights Involving Novices and Experts

Authors: Michael Winter, Rüdiger Pryss, Thomas Probst, Winfried Schlee, Miles Tallon, Ulrich Frick, Manfred Reichert

Abstract: The comprehension of business process models is crucial for enterprises. Prior research has shown that children as well as adolescents perceive and interpret graphical representations in a different manner compared to grown-ups. To evaluate this, observations in the context of business process models are presented in this paper obtained from a study on visual literacy in cultural education. We demonstrate that adolescents without expertise in process model comprehension are able to correctly interpret business process models expressed in terms of BPMN 2.0. In a comprehensive study, n = 205 learners (i.e., pupils at the age of 15) needed to answer questions related to process models they were confronted with, reflecting different levels of complexity. In addition, process models were created with varying styles of element labels. Study results indicate that an abstract description (i.e., using only alphabetic letters) of process models is understood more easily compared to concrete or pseudo descriptions. As a benchmark, results are compared with the ones of modeling experts (n = 40). Amongst others, study findings suggest using abstract descriptions in order to introduce novices to process modeling notations. With the obtained insights, we highlight that process models can be properly comprehended by novices.

Submitted 6 July, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

arXiv:2106.03386 [pdf, other]

Corona Health -- A Study- and Sensor-based Mobile App Platform Exploring Aspects of the COVID-19 Pandemic

Authors: Felix Beierle, Johannes Schobel, Carsten Vogel, Johannes Allgaier, Lena Mulansky, Fabian Haug, Julian Haug, Winfried Schlee, Marc Holfelder, Michael Stach, Marc Schickler, Harald Baumeister, Caroline Cohrdes, Jürgen Deckert, Lorenz Deserno, Johanna-Sophie Edler, Felizitas A. Eichner, Helmut Greger, Grit Hein, Peter Heuschmann, Dennis John, Hans A. Kestler, Dagmar Krefting, Berthold Langguth, Patrick Meybohm , et al. (7 additional authors not shown)

Abstract: Physical and mental well-being during the COVID-19 pandemic is typically assessed via surveys, which might make it difficult to conduct longitudinal studies and might lead to data suffering from recall bias. Ecological momentary assessment (EMA) driven smartphone apps can help alleviate such issues, allowing for in situ recordings. Implementing such an app is not trivial, necessitates strict regulatory and legal requirements, and requires short development cycles to appropriately react to abrupt changes in the pandemic. Based on an existing app framework, we developed Corona Health, an app that serves as a platform for deploying questionnaire-based studies in combination with recordings of mobile sensors. In this paper, we present the technical details of Corona Health and provide first insights into the collected data. Through collaborative efforts from experts from public health, medicine, psychology, and computer science, we released Corona Health publicly on Google Play and the Apple App Store (in July, 2020) in 8 languages and attracted 7,290 installations so far. Currently, five studies related to physical and mental well-being are deployed and 17,241 questionnaires have been filled out. Corona Health proves to be a viable tool for conducting research related to the COVID-19 pandemic and can serve as a blueprint for future EMA-based studies.
The data we collected will substantially improve our knowledge on mental and physical health states, traits and trajectories as well as its risk and protective factors over the course of the COVID-19 pandemic and its diverse prevention measures.

Submitted 6 July, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2105.10926 [pdf, other]

Rethinking Global Context in Crowd Counting

Authors: Guolei Sun, Yun Liu, Thomas Probst, Danda Pani Paudel, Nikola Popovic, Luc Van Gool

Abstract: This paper investigates the role of global context for crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence, to facilitate information exchange with tokens corresponding to image patches throughout transformer layers. Due to the fact that transformers do not explicitly model the tried-and-true channel-wise interactions, we propose a token-attention module (TAM) to recalibrate encoded features through channel-wise attention informed by the context token. Beyond that, it is adopted to predict the total person count of the image through regression-token module (RTM). Extensive experiments on various datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU, demonstrate that the proposed context extraction techniques can significantly improve the performance over the baselines.

Submitted 25 November, 2023; v1 submitted 23 May, 2021; originally announced May 2021.

Comments: Accepted by Machine Intelligence Research (MIR)

Report number: DOI: 10.1007/s11633-023-1475-z

arXiv:2102.10475 [pdf, ps, other]

Open-Ended Automatic Programming Through Combinatorial Evolution

Authors: Sebastian Fix, Thomas Probst, Oliver Ruggli, Thomas Hanne, Patrik Christen

Abstract: Combinatorial evolution - the creation of new things through the combination of existing things - can be a powerful way to evolve rather than design technical objects such as electronic circuits. Intriguingly, this seems to be an ongoing and thus open-ended process creating novelty with increasing complexity. Here, we employ combinatorial evolution in software development. While current approaches such as genetic programming are efficient in solving particular problems, they all converge towards a solution and do not create anything new afterwards. Combinatorial evolution of complex systems such as languages and technology is considered open-ended. Therefore, open-ended automatic programming might be possible through combinatorial evolution. We implemented a computer program simulating combinatorial evolution of code blocks stored in a database to make them available for combining. Automatic programming in the sense of algorithm-based code generation is achieved by evaluating regular expressions. We found that reserved keywords of a programming language are suitable for defining the basic code blocks at the beginning of the simulation. We also found that placeholders can be used to combine code blocks and that code complexity can be described in terms of the importance to the programming language.
As in a previous combinatorial evolution simulation of electronic circuits, complexity increased from simple keywords and special characters to more complex variable declarations, class definitions, methods, and classes containing methods and variable declarations. Combinatorial evolution, therefore, seems to be a promising approach for open-ended automatic programming.

Submitted 22 November, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

Comments: 12 pages, 2 tables

arXiv:2012.15680 [pdf, other]

Unsupervised Monocular Depth Reconstruction of Non-Rigid Scenes

Authors: Ayça Takmaz, Danda Pani Paudel, Thomas Probst, Ajad Chhatkuli, Martin R. Oswald, Luc Van Gool

Abstract: Monocular depth reconstruction of complex and dynamic scenes is a highly challenging problem. While for rigid scenes learning-based methods have been offering promising results even in unsupervised cases, there exists little to no literature addressing the same for dynamic and deformable scenes. In this work, we present an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without explicitly modelling the camera motion. Using dense correspondences, we derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points. In this process, the dense depth map is learned implicitly using the as-rigid-as-possible hypothesis. Our method provides promising results, demonstrating its capability of reconstructing 3D from challenging videos of non-rigid scenes. Furthermore, the proposed method also provides unsupervised motion segmentation results as an auxiliary output.

Submitted 28 October, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

arXiv:2012.09030 [pdf, other]

CompositeTasking: Understanding Images by Spatial Composition of Tasks

Authors: Nikola Popovic, Danda Pani Paudel, Thomas Probst, Guolei Sun, Luc Van Gool

Abstract: We define the concept of CompositeTasking as the fusion of multiple, spatially distributed tasks, for various aspects of image understanding. Learning to perform spatially distributed tasks is motivated by the frequent availability of only sparse labels across tasks, and the desire for a compact multi-tasking network. To facilitate CompositeTasking, we introduce a novel task conditioning model -- a single encoder-decoder network that performs multiple, spatially varying tasks at once. The proposed network takes an image and a set of pixel-wise dense task requests as inputs, and performs the requested prediction task for each pixel. Moreover, we also learn the composition of tasks that needs to be performed according to some CompositeTasking rules, which includes the decision of where to apply which task. It not only offers us a compact network for multi-tasking, but also allows for task-editing. Another strength of the proposed method is demonstrated by only having to supply sparse supervision per task. The obtained results are on par with our baselines that use dense supervision and a multi-headed multi-tasking design. The source code will be made publicly available at www.github.com/nikola3794/composite-tasking.

Submitted 17 June, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

arXiv:1909.12034 [pdf, other]

Convex Relaxations for Consensus and Non-Minimal Problems in 3D Vision

Authors: Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool

Abstract: In this paper, we formulate a generic non-minimal solver using the existing tools of Polynomial Optimization Problems (POP) from computational algebraic geometry. The proposed method exploits the well-known Shor's or Lasserre's relaxations, whose theoretical aspects are also discussed. Notably, we further exploit the POP formulation of the non-minimal solver for the generic consensus maximization problems in 3D vision. Our framework is simple and straightforward to implement, which is also supported by three diverse applications in 3D vision, namely rigid body transformation estimation, Non-Rigid Structure-from-Motion (NRSfM), and camera autocalibration. In all three cases, both non-minimal and consensus maximization are tested, which are also compared against the state-of-the-art methods. Our results are competitive with the compared methods, and are also coherent with our theoretical analysis. The main contribution of this paper is the claim that a good approximate solution for many polynomial problems involved in 3D vision can be obtained using the existing theory of numerical computational algebra. This claim leads us to reason about why many relaxed methods in 3D vision behave so well, and also allows us to offer a generic relaxed solver in a rather straightforward way. We further show that the convex relaxation of these polynomials can easily be used for maximizing consensus in a deterministic manner. We support our claim using several experiments for the aforementioned three diverse problems in 3D vision.

Submitted 26 September, 2019; originally announced September 2019.

Comments: Accepted to ICCV'19

arXiv:1907.10695 [pdf, other]

Dual Grid Net: hand mesh vertex regression from single depth maps

Authors: Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao

Abstract: We present a method for recovering the dense 3D surface of the hand by regressing the vertex coordinates of a mesh model from a single depth map. To this end, we use a two-stage 2D fully convolutional network architecture. In the first stage, the network estimates a dense correspondence field for every pixel on the depth map or image grid to the mesh grid. In the second stage, we design a differentiable operator to map features learned from the previous stage and regress a 3D coordinate map on the mesh grid. Finally, we sample from the mesh grid to recover the mesh vertices, and fit an articulated template mesh to them in closed form. During inference, the network can predict all the mesh vertices, transformation matrices for every joint and the joint coordinates in a single forward pass. When given supervision on the sparse key-point coordinates, our method achieves state-of-the-art accuracy on the NYU dataset for key point localization while recovering mesh vertices and a dense correspondence map. Our framework can also be learned through self-supervision by minimizing a set of data fitting and kinematic prior terms. With a multi-camera rig during training to resolve self-occlusion, it can perform competitively with strongly supervised methods without any human annotation.

Submitted 24 July, 2019; originally announced July 2019.

arXiv:1812.03795 [pdf, other]

Mapping, Localization and Path Planning for Image-based Navigation using Visual Features and Map

Authors: Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Thomas Probst, Luc Van Gool

Abstract: Building on progress in feature representations for image retrieval, image-based localization has seen a surge of research interest. Image-based localization has the advantage of being inexpensive and efficient, often avoiding the use of 3D metric maps altogether. That said, the need to maintain a large number of reference images as an effective support of localization in a scene, nonetheless calls for them to be organized in a map structure of some kind. The problem of localization often arises as part of a navigation process. We are, therefore, interested in summarizing the reference images as a set of landmarks, which meet the requirements for image-based navigation. A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: map construction and self-localization. These requirements are then exploited for compact map representation and accurate self-localization, using the framework of a network flow problem. During this process, we formulate the map construction and self-localization problems as convex quadratic and second-order cone programs, respectively. We evaluate our methods on publicly available indoor and outdoor datasets, where they outperform existing methods significantly.

Submitted 11 July, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: CVPR 2019, for implementation see https://github.com/janinethoma

arXiv:1808.04181 [pdf, other]

Incremental Non-Rigid Structure-from-Motion with Unknown Focal Length

Authors: Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool

Abstract: The perspective camera and the isometric surface prior have recently gathered increased attention for Non-Rigid Structure-from-Motion (NRSfM). Despite the recent progress, several challenges remain, particularly the computational complexity and the unknown camera focal length. In this paper we present a method for incremental Non-Rigid Structure-from-Motion (NRSfM) with the perspective camera model and the isometric surface prior with unknown focal length. In the template-based case, we provide a method to estimate four parameters of the camera intrinsics. For the template-less scenario of NRSfM, we propose a method to upgrade reconstructions obtained for one focal length to another based on local rigidity and the so-called Maximum Depth Heuristics (MDH). On its basis we propose a method to simultaneously recover the focal length and the non-rigid shapes. We further solve the problem of incorporating a large number of points and adding more views in MDH-based NRSfM and efficiently solve them with Second-Order Cone Programming (SOCP). This does not require any shape initialization and produces results orders of magnitude faster than many methods. We provide evaluations on standard sequences with ground-truth and qualitative reconstructions on challenging YouTube videos. These evaluations show that our method performs better in both speed and accuracy than the state of the art.

Submitted 13 August, 2018; originally announced August 2018.

Comments: ECCV 2018

arXiv:1807.01963 [pdf, other]

Model-free Consensus Maximization for Non-Rigid Shapes

Authors: Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool

Abstract: Many computer vision methods use consensus maximization to relate measurements containing outliers with the correct transformation model. In the context of rigid shapes, this is typically done using Random Sampling and Consensus (RANSAC) by estimating an analytical model that agrees with the largest number of measurements (inliers). However, small parameter models may not be always available. In this paper, we formulate the model-free consensus maximization as an Integer Program in a graph using 'rules' on measurements. We then provide a method to solve it optimally using the Branch and Bound (BnB) paradigm. We focus its application on non-rigid shapes, where we apply the method to remove outlier 3D correspondences and achieve performance superior to the state of the art. Our method works with outlier ratio as high as 80%. We further derive a similar formulation for 3D template to image matching, achieving similar or better performance compared to the state of the art.

Submitted 13 August, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: ECCV 2018

arXiv:1807.01515 [pdf, other]

doi 10.1016/j.procs.2018.07.139

Context Data Categories and Privacy Model for Mobile Data Collection Apps

Authors: Felix Beierle, Vinh Thuy Tran, Mathias Allemand, Patrick Neff, Winfried Schlee, Thomas Probst, Rüdiger Pryss, Johannes Zimmermann

Abstract: Context-aware applications stemming from diverse fields like mobile health, recommender systems, and mobile commerce potentially benefit from knowing aspects of the user's personality. As filling out personality questionnaires is tedious, we propose the prediction of the user's personality from smartphone sensor and usage data. In order to collect data for researching the relationship between smartphone data and personality, we developed the Android app TYDR (Track Your Daily Routine) which tracks smartphone data and utilizes psychometric personality questionnaires. With TYDR, we track a larger variety of smartphone data than similar existing apps, including metadata on notifications, photos taken, and music played back by the user. For the development of TYDR, we introduce a general context data model consisting of four categories that focus on the user's different types of interactions with the smartphone: physical conditions and activity, device status and usage, core functions usage, and app usage. On top of this, we develop the privacy model PM-MoDaC specifically for apps related to the collection of mobile data, consisting of nine proposed privacy measures. We present the implementation of all of those measures in TYDR. Although the utilization of the user's personality based on the usage of his or her smartphone is a challenging endeavor, it seems to be a promising approach for various types of context-aware mobile applications.

Submitted 4 July, 2018; originally announced July 2018.

Comments: Accepted for publication at the 15th International Conference on Mobile Systems and Pervasive Computing (MobiSPC 2018)

arXiv:1803.06720 [pdf, other]

doi 10.1145/3197231.3197235

TYDR - Track Your Daily Routine. Android App for Tracking Smartphone Sensor and Usage Data

Authors: Felix Beierle, Vinh Thuy Tran, Mathias Allemand, Patrick Neff, Winfried Schlee, Thomas Probst, Rüdiger Pryss, Johannes Zimmermann

Abstract: We present the Android app TYDR (Track Your Daily Routine) which tracks smartphone sensor and usage data and utilizes standardized psychometric personality questionnaires. With the app, we aim at collecting data for researching correlations between the tracked smartphone data and the user's personality in order to predict personality from smartphone data. In this paper, we highlight our approaches in addressing the challenges in developing such an app. We optimize the tracking of sensor data by assessing the trade-off of size of data and battery consumption and granularity of the stored information. Our user interface is designed to incentivize users to install the app and fill out questionnaires. TYDR processes and visualizes the tracked sensor and usage data as well as the results of the personality questionnaires. When developing an app that will be used in psychological studies, requirements posed by ethics commissions / institutional review boards and data protection officials have to be met. We detail our approaches concerning those requirements regarding the anonymized storing of user data, informing the users about the data collection, and enabling an opt-out option. We present our process for anonymized data storing while still being able to identify individual users who successfully completed a psychological study with the app.

Submitted 18 March, 2018; originally announced March 2018.

Comments: Accepted for publication at the 5th IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft '18)

arXiv:1711.08996 [pdf, other]

Dense 3D Regression for Hand Pose Estimation

Authors: Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao

Abstract: We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via multi-task network cascades, which are trained end-to-end. The pixel-wise estimations can be directly translated into a vote casting scheme. A variant of mean shift is then used to aggregate local votes while enforcing consensus between the estimated 3D pose and the pixel-wise 2D and 3D estimations by design. Our method is efficient and highly accurate. On the MSRA and NYU hand datasets, our method outperforms all previous state-of-the-art approaches by a large margin. On the ICVL hand dataset, our method achieves similar accuracy compared to the currently proposed nearly saturated result and outperforms various other proposed methods. Code is available online at https://github.com/melonwan/denseReg.

Submitted 24 November, 2017; originally announced November 2017.

arXiv:1709.05665 [pdf, other]

Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery

Authors: Thomas Probst, Kevis-Kokitsi Maninis, Ajad Chhatkuli, Mouloud Ourak, Emmanuel Vander Poorten, Luc Van Gool

Abstract: Computer vision and robotics are being increasingly applied in medical interventions. Especially in interventions where extreme precision is required they could make a difference. One such application is robot-assisted retinal microsurgery. In recent works, such interventions are conducted under a stereo-microscope, and with a robot-controlled surgical tool. The complementarity of computer vision and robotics has however not yet been fully exploited. In order to improve the robot control we are interested in 3D reconstruction of the anatomy and in automatic tool localization using a stereo microscope. In this paper, we solve this problem for the first time using a single pipeline, starting from uncalibrated cameras to reach metric 3D reconstruction and registration, in retinal microsurgery. The key ingredients of our method are: (a) surgical tool landmark detection, and (b) 3D reconstruction with the stereo microscope, using the detected landmarks. To address the former, we propose a novel deep learning method that detects and recognizes keypoints in high definition images at higher than real-time speed. We use the detected 2D keypoints along with their corresponding 3D coordinates obtained from the robot sensors to calibrate the stereo microscope using an affine projection model. We design an online 3D reconstruction pipeline that makes use of smoothness constraints and performs robot-to-camera registration. The entire pipeline is extensively validated on open-sky porcine eye sequences. Quantitative and qualitative results are presented for all steps.

Submitted 20 November, 2017; v1 submitted 17 September, 2017; originally announced September 2017.

Comments: Accepted in Robotics and Automation Letters (RA-L). Project page: http://www.vision.ee.ethz.ch/~kmaninis/keypoints2stereo/index.html

arXiv:1702.03431 [pdf, other]

Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation

Authors: Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao

Abstract: State-of-the-art methods for 3D hand pose estimation from depth images require large amounts of annotated training data. We propose to model the statistical relationships of 3D hand poses and corresponding depth images using two deep generative models with a shared latent space. By design, our architecture allows for learning from unlabeled image data in a semi-supervised manner. Assuming a one-to-one mapping between a pose and a depth map, any given point in the shared latent space can be projected into both a hand pose and a corresponding depth map. Regressing the hand pose can then be done by learning a discriminator to estimate the posterior of the latent pose given some depth maps. To improve generalization and to better exploit unlabeled depth maps, we jointly train a generator and a discriminator. At each iteration, the generator is updated with the back-propagated gradient from the discriminator to synthesize realistic depth maps of the articulated hand, while the discriminator benefits from an augmented training set of synthesized and unlabeled samples. The proposed discriminator network architecture is highly efficient and runs at 90 FPS on the CPU with accuracies comparable to or better than the state of the art on 3 publicly available benchmarks.

Submitted 18 July, 2017; v1 submitted 11 February, 2017; originally announced February 2017.

Comments: 10 pages, 5 figures, accepted in CVPR 2017

arXiv:1612.05877 [pdf, other]

Deep Learning on Lie Groups for Skeleton-based Action Recognition

Authors: Zhiwu Huang, Chengde Wan, Thomas Probst, Luc Van Gool

Abstract: In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

Submitted 11 April, 2017; v1 submitted 18 December, 2016; originally announced December 2016.

Comments: Accepted to CVPR 2017

Showing 1–22 of 22 results for author: Probst, T