Search | arXiv e-print repository

Brownian geometry

Abstract: We present different continuous models of random geometry that have been introduced and studied in the recent years. In particular, we consider the Brownian map, which is the universal scaling limit of large planar maps in the Gromov-Hausdorff sense, and the Brownian disk, which appears as the scaling limit of planar maps with a boundary. We discuss the construction of these models, and we emphasize the role played by Brownian motion indexed by the Brownian tree.

Submitted 5 October, 2018; originally announced October 2018.

Comments: Survey paper written for the 21st Takagi lectures

MSC Class: 60D05

arXiv:1807.09169 [pdf, other]

Convolutional Simplex Projection Network (CSPN) for Weakly Supervised Semantic Segmentation

Authors: Rania Briq, Michael Moeller, Juergen Gall

Abstract: Weakly supervised semantic segmentation has been a subject of increased interest due to the scarcity of fully annotated images. We introduce a new approach for solving weakly supervised semantic segmentation with deep Convolutional Neural Networks (CNNs). The method introduces a novel layer which applies simplex projection on the output of a neural network using area constraints of class objects. The proposed method is general and can be seamlessly integrated into any CNN architecture. Moreover, the projection layer allows strongly supervised models to be adapted to weakly supervised models effortlessly by substituting ground truth labels. Our experiments have shown that applying such an operation on the output of a CNN improves the accuracy of semantic segmentation in a weakly supervised setting with image-level labels.

Submitted 24 July, 2018; originally announced July 2018.

Comments: BMVC 2018
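
The abstract above refers to simplex projection. As a minimal sketch of that general operation, the following uses the standard sort-based Euclidean projection onto the probability simplex. It is an assumption-laden illustration, not the paper's layer: the function name `project_onto_simplex` and the `radius` parameter are hypothetical, and the area constraints mentioned in the abstract are not modeled.

```python
def project_onto_simplex(v, radius=1.0):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = radius}.

    Standard sort-based algorithm; illustrative only, not the CSPN layer.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    u = sorted(v, reverse=True)  # entries in descending order
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - radius) / k
        # The condition holds for a prefix of the sorted entries;
        # the last candidate that satisfies it is the shift to apply.
        if u_k - candidate > 0:
            theta = candidate
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    scores = [0.3, 1.2, -0.4]  # hypothetical per-class scores for one pixel
    print(project_onto_simplex(scores))
```

The result is non-negative and sums to `radius`, so it can be read as a probability vector over classes.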

arXiv:1806.08296 [pdf, other]

Are good local minima wide in sparse recovery?

Authors: Michael Moeller, Otmar Loffeld, Juergen Gall, Felix Krahmer

Abstract: The idea of compressed sensing is to exploit representations in suitable (overcomplete) dictionaries that allow signals to be recovered far beyond the Nyquist rate, provided that they admit a sparse representation in the respective dictionary. The latter gives rise to the sparse recovery problem of finding the best sparse linear approximation of given data in a given generating system. In this paper we analyze the iterative hard thresholding (IHT) algorithm as one of the most popular greedy methods for solving the sparse recovery problem, and demonstrate that systematically perturbing the IHT algorithm by adding noise to intermediate iterates yields improved results. Further improvements can be obtained by entirely rephrasing the problem as a parametric deep-learning-type optimization problem. By introducing perturbations via dropout, we significantly outperform the classical IHT algorithm, obtaining $3$ to $6$ times lower average objective errors.

Submitted 21 June, 2018; originally announced June 2018.
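
For readers unfamiliar with the algorithm named in the abstract above, here is a minimal sketch of classical iterative hard thresholding in pure Python. The optional noise term only illustrates the idea of perturbing intermediate iterates. Where the noise is injected, the parameter names (`step`, `noise_std`, `seed`) and the Gaussian noise model are all assumptions, not details taken from the paper.

```python
import random


def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht(A, y, s, step, iterations=100, noise_std=0.0, seed=0):
    """Approximately minimise ||y - A x||^2 subject to x having at most s nonzeros.

    A is a list of rows. noise_std > 0 adds Gaussian noise to each
    intermediate iterate before thresholding (an illustrative assumption).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iterations):
        # Residual r = y - A x
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step z = x + step * A^T r
        z = [x[j] + step * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        if noise_std > 0.0:
            z = [z_j + rng.gauss(0.0, noise_std) for z_j in z]
        x = hard_threshold(z, s)
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(iht(identity, [0.0, 2.0, 0.0], s=1, step=1.0))
```

With `noise_std=0.0` this reduces to the unperturbed iteration; the step size must be chosen small enough for the given matrix for the iteration to be stable.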

arXiv:1806.07754 [pdf, other]

Spatio-Temporal Channel Correlation Networks for Action Classification

Authors: Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool

Abstract: The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most of the traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNext and ResNet, we improved the performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to current state-of-the-art architectures outperforms the state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. The other issue in training 3D CNNs is that they are trained from scratch with a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D CNNs is completely ignored. Another contribution in this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by fine-tuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.

Submitted 7 February, 2019; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: Accepted in ECCV 2018. arXiv admin note: substantial text overlap with arXiv:1711.08200

arXiv:1805.09521 [pdf, other]

AVID: Adversarial Visual Irregularity Detection

Authors: Mohammad Sabokrou, Masoud Pourreza, Mohsen Fayyaz, Rahim Entezari, Mahmood Fathy, Jürgen Gall, Ehsan Adeli

Abstract: Real-time detection of irregularities in visual data is invaluable in many prospective applications including surveillance, patient monitoring systems, etc. With the surge of deep learning methods in recent years, researchers have tried a wide spectrum of methods for different applications. However, for the case of irregularity or anomaly detection in videos, training an end-to-end model is still an open challenge, since often irregularity is not well-defined and there are not enough irregular samples to use during training. In this paper, inspired by the success of generative adversarial networks (GANs) for training deep models in unsupervised or self-supervised settings, we propose an end-to-end deep network for detection and fine localization of irregularities in videos (and images). Our proposed architecture is composed of two networks, which are trained to compete with each other while collaborating to find the irregularity. One network works as a pixel-level irregularity Inpainter (I), and the other works as a patch-level Detector (D). After an adversarial self-supervised training, in which I tries to fool D into accepting its inpainted output as regular (normal), the two networks collaborate to detect and fine-segment the irregularity in any given testing video. Our results on three different datasets show that our method can outperform the state-of-the-art and fine-segment the irregularity.

Submitted 17 July, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1805.06875 [pdf, other]

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

Authors: Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall

Abstract: Video learning is an important task in computer vision and has experienced increasing interest over the recent years. Since even a small amount of videos easily comprises several million frames, methods that do not rely on a frame-level annotation are of special importance. In this work, we propose a novel learning algorithm with a Viterbi-based loss that allows for online and incremental learning of weakly annotated video data. We moreover show that explicit context and length modeling leads to huge improvements in video segmentation and labeling tasks and include these models in our framework. On several action segmentation benchmarks, we obtain an improvement of up to 10% compared to current state-of-the-art methods.

Submitted 17 May, 2018; originally announced May 2018.

Comments: CVPR 2018

arXiv:1805.04596 [pdf, other]

Joint Flow: Temporal Flow Fields for Multi Person Tracking

Authors: Andreas Doering, Umar Iqbal, Juergen Gall

Abstract: In this work we propose an online multi person pose tracking approach which works on two consecutive frames $I_{t-1}$ and $I_t$. The general formulation of our temporal network allows any multi person pose estimation approach to be used as the spatial network. From the spatial network we extract image features and pose features for both frames. These features serve as input for our temporal model that predicts Temporal Flow Fields (TFF). These TFF are vector fields which indicate the direction in which each body joint is going to move from frame $I_{t-1}$ to frame $I_t$. This novel representation allows us to formulate a similarity measure of detected joints. These similarities are used as binary potentials in a bipartite graph optimization problem in order to perform tracking of multiple poses. We show that these TFF can be learned by a relatively small CNN whilst achieving state-of-the-art multi person pose tracking results.

Submitted 20 July, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: Accepted to BMVC

arXiv:1804.09534 [pdf, other]

Hand Pose Estimation via Latent 2.5D Heatmap Regression

Authors: Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, Jan Kautz

Abstract: Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimates, which are ambiguous given only an RGB image. In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation. Our new representation estimates pose up to a scaling factor, which can be estimated additionally if a prior of the hand size is given. We implicitly learn depth maps and heatmap distributions with a novel CNN architecture. Our system achieves the state-of-the-art estimation of 2D and 3D hand pose on several challenging datasets in presence of severe occlusions.

Submitted 25 April, 2018; originally announced April 2018.

arXiv:1804.03550 [pdf, other]

Two Stream 3D Semantic Scene Completion

Authors: Martin Garbade, Yueh-Tung Chen, Johann Sawatzky, Juergen Gall

Abstract: Inferring the 3D geometry and the semantic meaning of surfaces, which are occluded, is a very challenging task. Recently, a first end-to-end learning approach has been proposed that completes a scene from a single depth image. The approach voxelizes the scene and predicts for each voxel if it is occupied and, if it is occupied, the semantic class label. In this work, we propose a two stream approach that leverages depth information and semantic information, which is inferred from the RGB image, for this task. The approach constructs an incomplete 3D semantic tensor, which uses a compact three-channel encoding for the inferred semantic information, and uses a 3D CNN to infer the complete 3D semantic tensor. In our experimental evaluation, we show that the proposed two stream approach substantially outperforms the state-of-the-art for semantic scene completion.

Submitted 15 May, 2019; v1 submitted 10 April, 2018; originally announced April 2018.

arXiv:1804.00892 [pdf, other]

When will you do what? - Anticipating Temporal Occurrences of Activities

Authors: Yazan Abu Farha, Alexander Richard, Juergen Gall

Abstract: Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very recent future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerably large number of future actions and their durations. Both a CNN and an RNN are trained to learn future video labels based on previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a huge number of different actions and can even deal with noisy or erroneous input information.

Submitted 3 April, 2018; originally announced April 2018.

Comments: CVPR 2018

arXiv:1803.07152 [pdf, other]

Exploring the predictability of range-based volatility estimators using RNNs

Authors: Gábor Petneházi, József Gáll

Abstract: We investigate the predictability of several range-based stock volatility estimators, and compare them to the standard close-to-close estimator which is most commonly acknowledged as the volatility. The patterns of volatility changes are analyzed using LSTM recurrent neural networks, which are a state-of-the-art method of sequence learning. We implement the analysis on all current constituents of the Dow Jones Industrial Average index, and report averaged evaluation results. We find that changes in the values of range-based estimators are more predictable than those of the estimator using daily closing values only.

Submitted 19 March, 2018; originally announced March 2018.
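
To make the distinction in the abstract above concrete, the sketch below contrasts one well-known range-based estimator (Parkinson's high/low estimator) with the close-to-close estimator. Which range-based estimators the paper actually uses is not stated in the abstract, so the choice of Parkinson here is an assumption, as are the function names and the per-period (non-annualised) scaling.

```python
import math


def parkinson_volatility(highs, lows):
    """Parkinson range-based volatility from per-period high and low prices."""
    if len(highs) != len(lows) or not highs:
        raise ValueError("highs and lows must be non-empty and equally long")
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4.0 * len(highs) * math.log(2.0)))


def close_to_close_volatility(closes):
    """Sample standard deviation of log returns between consecutive closes."""
    if len(closes) < 3:
        raise ValueError("need at least three closing prices")
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    highs = [101.5, 102.0, 103.2]
    lows = [99.8, 100.4, 101.1]
    closes = [100.0, 101.0, 100.5, 102.0]
    print(parkinson_volatility(highs, lows))
    print(close_to_close_volatility(closes))
```

The range-based estimator uses the intraday high and low of each period, while the close-to-close estimator only sees one price per period.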

arXiv:1802.02091 [pdf, other]

Structural Recurrent Neural Network (SRNN) for Group Activity Analysis

Authors: Sovan Biswas, Juergen Gall

Abstract: A group of persons can be analyzed at various semantic levels such as individual actions, their interactions, and the activity of the entire group. In this paper, we propose a structural recurrent neural network (SRNN) that uses a series of interconnected RNNs to jointly capture the actions of individuals, their interactions, as well as the group activity. While previous structural recurrent neural networks assumed that the number of nodes and edges is constant, we use a grid pooling layer to address the fact that the number of individuals in a group can vary. We evaluate two variants of the structural recurrent neural network on the Volleyball Dataset.

Submitted 6 February, 2018; originally announced February 2018.

Comments: Accepted in WACV 2018

arXiv:1711.03874 [pdf]

Material Classification in the Wild: Do Synthesized Training Data Generalise Better than Real-World Training Data?

Authors: Grigorios Kalliatakis, Anca Sticlaru, George Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall, Klaus D. McDonald-Maier

Abstract: We question the dominant role of real-world training images in the field of material classification by investigating whether synthesized data can generalise more effectively than real-world data. Experimental results on three challenging real-world material databases show that the best performing pre-trained convolutional neural network (CNN) architectures can achieve up to 91.03% mean average precision when classifying materials in cross-dataset scenarios. We demonstrate that synthesized data achieve an improvement on mean average precision when used as training data and in conjunction with pre-trained CNN architectures, which spans from ~5% to ~19% across three widely used material databases of real-world images.

Submitted 9 November, 2017; originally announced November 2017.

Comments: accepted for publication in VISAPP 2018. arXiv admin note: text overlap with arXiv:1703.04101

arXiv:1710.10000 [pdf, other]

PoseTrack: A Benchmark for Human Pose Estimation and Tracking

Authors: Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, Bernt Schiele

Abstract: Human poses and motions are important cues for analysis of videos with people and there is strong evidence that representations based on body pose are highly effective for a variety of tasks such as activity recognition, content retrieval and social signal processing. In this work, we aim to further advance the state of the art by establishing "PoseTrack", a new large-scale benchmark for video-based human pose estimation and articulated tracking, and bringing together the community of researchers working on visual human analysis. The benchmark encompasses three competition tracks focusing on i) single-frame multi-person pose estimation, ii) multi-person pose estimation in videos, and iii) multi-person articulated tracking. To facilitate the benchmark and challenge we collect, annotate and release a new large-scale benchmark dataset that features videos with multiple people labeled with person tracks and articulated pose. A centralized evaluation server is provided to allow participants to evaluate on a held-out test set. We envision that the proposed benchmark will stimulate productive research both by providing a large and representative training dataset as well as providing a platform to objectively evaluate and compare the proposed methods. The benchmark is freely accessible at https://posetrack.net.

Submitted 10 April, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

Comments: www.posetrack.net

arXiv:1710.02990 [pdf, other]

Separating cycles and isoperimetric inequalities in the uniform infinite planar quadrangulation

Authors: Jean-Francois Le Gall, Thomas Lehéricy

Abstract: We study geometric properties of the infinite random lattice called the uniform infinite planar quadrangulation or UIPQ. We establish a precise form of a conjecture of Krikun stating that the minimal size of a cycle that separates the ball of radius $R$ centered at the root vertex from infinity grows linearly in $R$. As a consequence, we derive certain isoperimetric bounds showing that the boundary size of any connected set $A$ consisting of a finite union of faces of the UIPQ and containing the root vertex is bounded below by a (random) constant times $|A|^{1/4}(\log|A|)^{-(3/4)-\delta}$, where the volume $|A|$ is the number of faces in $A$.

Submitted 11 June, 2018; v1 submitted 9 October, 2017; originally announced October 2017.

Comments: Revised version, 47 pages, to appear in the Annals of Probability

MSC Class: 05C80; 60D05

arXiv:1708.01749 [pdf, other]

doi 10.1109/ICCV.2017.253

SurfaceNet: An End-to-end 3D Neural Network for Multiview Stereopsis

Authors: Mengqi Ji, Juergen Gall, Haitian Zheng, Yebin Liu, Lu Fang

Abstract: This paper proposes an end-to-end learning framework for multiview stereopsis. We term the network SurfaceNet. It takes a set of images and their corresponding camera parameters as input and directly infers the 3D model. The key advantage of the framework is that both photo-consistency as well as geometric relations of the surface structure can be directly learned for the purpose of multiview stereopsis in an end-to-end fashion. SurfaceNet is a fully 3D convolutional network which is achieved by encoding the camera parameters together with the images in a 3D voxel representation. We evaluate SurfaceNet on the large-scale DTU benchmark.

Submitted 5 August, 2017; originally announced August 2017.

Comments: 2017 iccv poster

Journal ref: 2017 ICCV

arXiv:1707.02850 [pdf, other]

Adaptive Binarization for Weakly Supervised Affordance Segmentation

Authors: Johann Sawatzky, Juergen Gall

Abstract: The concept of affordance is important to understand the relevance of object parts for a certain functional interaction. Affordance types generalize across object categories and are not mutually exclusive. This makes the segmentation of affordance regions of objects in images a difficult task. In this work, we build on an iterative approach that learns a convolutional neural network for affordance segmentation from sparse keypoints. During this process, the predictions of the network need to be binarized. In this work, we propose an adaptive approach for binarization and estimate the parameters for initialization by approximated cross validation. We evaluate our approach on two affordance datasets where our approach outperforms the state-of-the-art for weakly supervised affordance segmentation.

Submitted 10 July, 2017; originally announced July 2017.

arXiv:1706.08807 [pdf, ps, other]

Recurrent Residual Learning for Action Recognition

Authors: Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

Abstract: Action recognition is a fundamental problem in computer vision with a lot of potential applications such as video surveillance, human computer interaction, and robot learning. Given pre-segmented videos, the task is to recognize actions happening within videos. Historically, hand crafted video features were used to address the task of action recognition. With the success of Deep ConvNets as an image analysis method, a lot of extensions of standard ConvNets were proposed to process variable length video data. In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition. The approach extends ResNet, a state-of-the-art model for image classification. While the original formulation of ResNet aims at learning spatial residuals in its layers, we extend the approach by introducing recurrent connections that allow the network to learn a spatio-temporal residual. In contrast to fully recurrent networks, our temporal connections only allow a limited range of preceding frames to contribute to the output for the current frame, enabling efficient training and inference as well as limiting the temporal context to a reasonable local range around each frame. On a large-scale action recognition dataset, we show that our model improves over both the standard ResNet architecture and a ResNet extended by a fully recurrent layer.

Submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.00699 [pdf, other]

Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints

Authors: Alexander Richard, Hilde Kuehne, Juergen Gall

Abstract: Action detection and temporal segmentation of actions in videos are topics of increasing interest. While fully supervised systems have gained much attention lately, full annotation of each action within the video is costly and impractical for large amounts of video data. Thus, weakly supervised action detection and temporal segmentation methods are of great importance. While most works in this area assume an ordered sequence of occurring actions to be given, our approach only uses a set of actions. Such action sets provide much less supervision since neither action ordering nor the number of action occurrences are known. In exchange, they can be easily obtained, for instance, from meta-tags, while ordered sequences still require human annotation. We introduce a system that automatically learns to temporally segment and label actions in a video, where the only supervision that is used are action sets. An evaluation on three datasets shows that our method still achieves good results although the amount of supervision is significantly smaller than for other related methods.

Submitted 17 May, 2018; v1 submitted 2 June, 2017; originally announced June 2017.

Comments: CVPR 2018

arXiv:1705.02883 [pdf, other]

A Dual-Source Approach for 3D Human Pose Estimation from a Single Image

Authors: Umar Iqbal, Andreas Doering, Hashim Yasin, Björn Krüger, Andreas Weber, Juergen Gall

Abstract: In this work we address the challenging problem of 3D human pose estimation from single images. Recent approaches learn deep neural networks to regress 3D pose directly from images. One major challenge for such methods, however, is the collection of training data. Specifically, collecting large amounts of training data containing unconstrained images annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources. The first source consists of accurate 3D motion capture data, and the second source consists of unconstrained images with annotated 2D poses. To integrate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient 3D pose retrieval. To this end, we first convert the motion capture data into a normalized 2D pose space, and separately learn a 2D pose estimation model from the image data. During inference, we estimate the 2D pose and efficiently retrieve the nearest 3D poses. We then jointly estimate a mapping from the 3D pose space to the image and reconstruct the 3D pose. We provide a comprehensive evaluation of the proposed method and experimentally demonstrate the effectiveness of our approach, even when the skeleton structures of the two sources differ substantially.

Submitted 6 September, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: under consideration at Computer Vision and Image Understanding. Extended version of CVPR-2016 paper, arXiv:1509.06720

arXiv:1704.08987 [pdf, other]

Brownian disks and the Brownian snake

Authors: Jean-François Le Gall

Abstract: We provide a new construction of the Brownian disks, which have been defined by Bettinelli and Miermont as scaling limits of quadrangulations with a boundary when the boundary size tends to infinity. Our method is very similar to the construction of the Brownian map, but it makes use of the positive excursion measure of the Brownian snake which has been introduced recently. This excursion measure involves a random continuous tree whose vertices are assigned nonnegative labels, which correspond to distances from the boundary in our approach to the Brownian disk. We provide several applications of our construction. In particular, we prove that the uniform measure on the boundary can be obtained as the limit of the suitably normalized volume measure on a small tubular neighborhood of the boundary. We also prove that connected components of the complement of the Brownian net are Brownian disks, as it was suggested in the recent work of Miller and Sheffield. Finally, we show that connected components of the complement of balls centered at the distinguished point of the Brownian map are independent Brownian disks, conditionally on their volumes and perimeters.

Submitted 20 October, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

Comments: Revised version taking account of the referee's comments - 87 pages

MSC Class: 60D05

arXiv:1704.00529 [pdf, other]

doi 10.1109/ICCV.2015.90

3D Object Reconstruction from Hand-Object Interactions

Authors: Dimitrios Tzionas, Juergen Gall

Abstract: Recent advances have enabled 3d object reconstruction approaches using a single off-the-shelf RGB-D camera. Although these approaches are successful for a wide range of object classes, they rely on stable and distinctive geometric or texture features. Many objects like mechanical parts, toys, household or decorative articles, however, are textureless and characterized by minimalistic shapes that are simple and symmetric. Existing in-hand scanning systems and 3d reconstruction techniques fail for such symmetric objects in the absence of highly distinctive features. In this work, we show that extracting 3d hand motion for in-hand scanning effectively facilitates the reconstruction of even featureless and highly symmetric objects and we present an approach that fuses the rich additional information of hands into a 3d reconstruction pipeline, significantly contributing to the state-of-the-art of in-hand scanning.

Submitted 3 April, 2017; originally announced April 2017.

Comments: International Conference on Computer Vision (ICCV) 2015, http://files.is.tue.mpg.de/dtzionas/In-Hand-Scanning

arXiv:1704.00515 [pdf, other]

doi 10.1007/978-3-319-11752-2_22

Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points

Authors: Dimitrios Tzionas, Abhilash Srikantha, Pablo Aponte, Juergen Gall

Abstract: Hand motion capture has been an active research topic in recent years, following the success of full-body pose tracking. Despite similarities, hand tracking proves to be more challenging, characterized by a higher dimensionality, severe occlusions and self-similarity between fingers. For this reason, most approaches rely on strong assumptions, like hands in isolation or expensive multi-camera systems, that limit the practical use. In this work, we propose a framework for hand tracking that can capture the motion of two interacting hands using only a single, inexpensive RGB-D camera. Our approach combines a generative model with collision detection and discriminatively learned salient points. We quantitatively evaluate our approach on 14 new sequences with challenging interactions.

Submitted 3 April, 2017; originally announced April 2017.

Comments: German Conference on Pattern Recognition (GCPR) 2014, http://files.is.tue.mpg.de/dtzionas/GCPR_2014.html

arXiv:1704.00492 [pdf, other]

doi 10.1007/978-3-642-40602-7_14

A Comparison of Directional Distances for Hand Pose Estimation

Authors: Dimitrios Tzionas, Juergen Gall

Abstract: Benchmarking methods for 3d hand tracking is still an open problem due to the difficulty of acquiring ground truth data. We introduce a new dataset and benchmarking protocol that is insensitive to the accumulative error of other protocols. To this end, we create testing frame pairs of increasing difficulty and measure the pose estimation error separately for each of them. This approach gives new insights and makes it possible to accurately study the performance of each feature or method without employing a full tracking pipeline. Following this protocol, we evaluate various directional distances in the context of silhouette-based 3d hand tracking, expressed as special cases of a generalized Chamfer distance form. An appropriate parameter setup is proposed for each of them, and a comparative study reveals the best performing method in this context.

Submitted 3 April, 2017; originally announced April 2017.

Comments: German Conference on Pattern Recognition (GCPR) 2013, http://files.is.tue.mpg.de/dtzionas/GCPR_2013.html

arXiv:1703.08132 [pdf, other]

Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

Authors: Alexander Richard, Hilde Kuehne, Juergen Gall

Abstract: We present an approach for weakly supervised learning of human actions. Given a set of videos and an ordered list of the occurring actions, the goal is to infer start and end frames of the related action classes within the video and to train the respective action classifiers without any need for hand labeled frame boundaries. To address this task, we propose a combination of a discriminative representation of subactions, modeled by a recurrent neural network, and a coarse probabilistic model to allow for a temporal alignment and inference over long sequences. While this system alone already generates good results, we show that the performance can be further improved by approximating the number of subactions to the characteristics of the different action classes. To this end, we adapt the number of subaction classes by iterating realignment and reestimation during training. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood extended dataset, showing a competitive performance on various weak learning tasks such as temporal action segmentation and action alignment.

Submitted 9 October, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

arXiv:1703.08089 [pdf, other]

A Bag-of-Words Equivalent Recurrent Neural Network for Action Recognition

Authors: Alexander Richard, Juergen Gall

Abstract: The traditional bag-of-words approach has found a wide range of applications in computer vision. The standard pipeline consists of a generation of a visual vocabulary, a quantization of the features into histograms of visual words, and a classification step for which usually a support vector machine in combination with a non-linear kernel is used. Given large amounts of data, however, the model suffers from a lack of discriminative power. This applies particularly to action recognition, where the vast amount of video features needs to be subsampled for unsupervised visual vocabulary generation. Moreover, the kernel computation can be very expensive on large datasets. In this work, we propose a recurrent neural network that is equivalent to the traditional bag-of-words approach but enables the application of discriminative training. The model further allows the kernel computation to be incorporated into the neural network directly, solving the complexity issue and allowing the complete classification system to be represented within a single network. We evaluate our method on four recent action recognition benchmarks and show that the conventional model as well as sparse coding methods are outperformed.

Submitted 23 March, 2017; originally announced March 2017.

arXiv:1703.04103 [pdf, other]

Detection of Human Rights Violations in Images: Can Convolutional Neural Networks help?

Authors: Grigorios Kalliatakis, Shoaib Ehsan, Maria Fasli, Ales Leonardis, Juergen Gall, Klaus D. McDonald-Maier

Abstract: After setting the performance benchmarks for image, video, speech and audio processing, deep convolutional networks have been core to the greatest advances in image recognition tasks in recent times. This raises the question of whether there is any benefit in targeting these remarkable deep architectures with the unattempted task of recognising human rights violations through digital images. Under this perspective, we introduce a new, well-sampled human rights-centric dataset called Human Rights Understanding (HRUN). We conduct a rigorous evaluation on a common ground by combining this dataset with different state-of-the-art deep convolutional architectures in order to achieve recognition of human rights violations. Experimental results on the HRUN dataset have shown that the best performing CNN architectures can achieve up to 88.10% mean average precision. Additionally, our experiments demonstrate that increasing the size of the training samples is crucial for achieving an improvement on mean average precision principally when utilising very deep networks.

Submitted 16 March, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: In Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP 2017), 8 pages

arXiv:1703.04101 [pdf, other]

Evaluating Deep Convolutional Neural Networks for Material Classification

Authors: Grigorios Kalliatakis, Georgios Stamatiadis, Shoaib Ehsan, Ales Leonardis, Juergen Gall, Anca Sticlaru, Klaus D. McDonald-Maier

Abstract: Determining the material category of a surface from an image is a demanding task in perception that is drawing increasing attention. Following the recent remarkable results achieved for image classification and object detection utilising Convolutional Neural Networks (CNNs), we empirically study material classification of everyday objects employing these techniques. More specifically, we conduct a rigorous evaluation of how state-of-the-art CNN architectures compare on a common ground over widely used material databases. Experimental results on three challenging material databases show that the best performing CNN architectures can achieve up to 94.99% mean average precision when classifying materials.

Submitted 16 March, 2017; v1 submitted 12 March, 2017; originally announced March 2017.

Comments: In Proceedings of the 12th International Conference on Computer Vision Theory and Applications (VISAPP 2017), 7 pages

arXiv:1611.07727 [pdf, other]

PoseTrack: Joint Multi-Person Pose Estimation and Tracking

Authors: Umar Iqbal, Anton Milan, Juergen Gall

Abstract: In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires solving the problem of person association over time in addition to the pose estimation for each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging "Multi-Person PoseTrack" dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset.

Submitted 7 April, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

Comments: Accepted to CVPR 2017

arXiv:1610.02237 [pdf, other]

Weakly supervised learning of actions from transcripts

Authors: Hilde Kuehne, Alexander Richard, Juergen Gall

Abstract: We present an approach for weakly supervised learning of human actions from video transcriptions. Our system is based on the idea that, given a sequence of input data and a transcript, i.e. a list of the order the actions occur in the video, it is possible to infer the actions within the video stream, and thus, learn the related action models without the need for any frame-based annotation. Starting from the transcript information at hand, we split the given data sequences uniformly based on the number of expected actions. We then learn action models for each class by maximizing the probability that the training video sequences are generated by the action models given the sequence order as defined by the transcripts. The learned model can be used to temporally segment an unseen video with or without transcript. We evaluate our approach on four distinct activity datasets, namely Hollywood Extended, MPII Cooking, Breakfast and CRIM13. We show that our system is able to align the scripted actions with the video data and that the learned models localize and classify actions competitively in comparison to models trained with full supervision, i.e. with frame level annotations, and that they outperform any current state-of-the-art approach for aligning transcripts with video data.

Submitted 19 June, 2017; v1 submitted 7 October, 2016; originally announced October 2016.

Comments: 33 pages, 9 figures, to appear in CVIU

arXiv:1609.01371 [pdf, other]

Reconstructing Articulated Rigged Models from RGB-D Videos

Authors: Dimitrios Tzionas, Juergen Gall

Abstract: Although commercial and open-source software exists to reconstruct a static object from a sequence recorded with an RGB-D sensor, there is a lack of tools that build rigged models of articulated objects that deform realistically and can be used for tracking or animation. In this work, we fill this gap and propose a method that creates a fully rigged model of an articulated object from depth data of a single sensor. To this end, we combine deformable mesh tracking, motion segmentation based on spectral clustering and skeletonization based on mean curvature flow. The fully rigged model then consists of a watertight mesh, embedded skeleton, and skinning weights.

Submitted 9 September, 2016; v1 submitted 5 September, 2016; originally announced September 2016.

Comments: Accepted for publication - European Conference on Computer Vision Workshops 2016 (ECCVW'16) - Workshop on Recovering 6D Object Pose (R6D'16)

arXiv:1608.08526 [pdf, other]

Multi-Person Pose Estimation with Local Joint-to-Person Associations

Authors: Umar Iqbal, Juergen Gall

Abstract: Despite the recent success of neural networks for human pose estimation, current approaches are limited to pose estimation of a single person and cannot handle humans in groups or crowds. In this work, we propose a method that estimates the poses of multiple persons in an image in which a person can be occluded by another person or might be truncated. To this end, we consider multi-person pose estimation as a joint-to-person association problem. We construct a fully connected graph from a set of detected joint candidates in an image and resolve the joint-to-person association and outlier detection using integer linear programming. Since solving joint-to-person association jointly for all persons in an image is an NP-hard problem and even approximations are expensive, we solve the problem locally for each person. On the challenging MPII Human Pose Dataset for multiple persons, our approach achieves the accuracy of a state-of-the-art method, but it is 6,000 to 19,000 times faster.

Submitted 31 August, 2016; v1 submitted 30 August, 2016; originally announced August 2016.

Comments: Accepted to European Conference on Computer Vision (ECCV) Workshops, Crowd Understanding, 2016

arXiv:1605.07601 [pdf, other]

Subordination of trees and the Brownian map

Authors: Jean-François Le Gall

Abstract: We discuss subordination of random compact R-trees. We focus on the case of the Brownian tree, where the subordination function is given by the past maximum process of Brownian motion indexed by the tree. In that particular case, the subordinate tree is identified as a stable Lévy tree with index 3/2. As a more precise alternative formulation, we show that the maximum process of the Brownian snake is a time change of the height process coding the Lévy tree. We then apply our results to properties of the Brownian map. In particular, we recover, in a more precise form, a recent result of Miller and Sheffield identifying the metric net associated with the Brownian map.

Submitted 24 May, 2016; originally announced May 2016.

Comments: 39 pages

MSC Class: 60J80; 60J65

arXiv:1605.02964 [pdf, other]

Weakly Supervised Learning of Affordances

Authors: Abhilash Srikantha, Juergen Gall

Abstract: Localizing functional regions of objects or affordances is an important aspect of scene understanding. In this work, we cast the problem of affordance segmentation as that of semantic image segmentation. In order to explore various levels of supervision, we introduce a pixel-annotated affordance dataset of 3090 images containing 9916 object instances with rich contextual information in terms of human-object interactions. We use a deep convolutional neural network within an expectation maximization framework to take advantage of weakly labeled data like image level annotations or keypoint annotations. We show that a further reduction in supervision is possible with a minimal loss in performance when human pose is used as context.

Submitted 29 July, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

arXiv:1603.04037 [pdf, other]

Pose for Action - Action for Pose

Authors: Umar Iqbal, Martin Garbade, Juergen Gall

Abstract: In this work we propose to utilize information about human actions to improve pose estimation in monocular videos. To this end, we present a pictorial structure model that exploits high-level information about activities to incorporate higher-order part dependencies by modeling action specific appearance models and pose priors. However, instead of using an additional expensive action recognition framework, the action priors are efficiently estimated by our pose estimation framework. This is achieved by starting with a uniform action prior and updating the action prior during pose estimation. We also show that learning the right amount of appearance sharing among action classes improves the pose estimation. We demonstrate the effectiveness of the proposed method on two challenging datasets for pose estimation and action recognition with over 80,000 test images.

Submitted 10 February, 2017; v1 submitted 13 March, 2016; originally announced March 2016.

Comments: Accepted to FG-2017

arXiv:1511.04264 [pdf, other]

First-passage percolation and local modifications of distances in random triangulations

Authors: Nicolas Curien, Jean-François Le Gall

Abstract: We study local modifications of the graph distance in large random triangulations. Our main results show that, in large scales, the modified distance behaves like a deterministic constant $\mathbf{c} \in (0,\infty)$ times the usual graph distance. This applies in particular to the first-passage percolation distance obtained by assigning independent random weights to the edges of the graph. We also consider the graph distance on the dual map, and the first-passage percolation on the dual map with exponential edge weights, which is closely related to the so-called Eden model. In the latter two cases, we are able to compute explicitly the constant $\mathbf{c}$ by using earlier results about asymptotics for the peeling process. In general however, the constant $\mathbf{c}$ is obtained from a subadditivity argument in the infinite half-plane model that describes the asymptotic shape of the triangulation near the boundary of a large ball. Our results apply in particular to the infinite random triangulation known as the UIPT, and show that balls of the UIPT for the modified distance are asymptotically close to balls for the graph distance.

Submitted 13 November, 2015; originally announced November 2015.

arXiv:1509.06720 [pdf, other]

A Dual-Source Approach for 3D Pose Estimation from a Single Image

Authors: Hashim Yasin, Umar Iqbal, Björn Krüger, Andreas Weber, Juergen Gall

Abstract: One major challenge for 3D pose estimation from a single RGB image is the acquisition of sufficient training data. In particular, collecting large amounts of training data that contain unconstrained images and are annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources. The first source consists of images with annotated 2D poses and the second source consists of accurate 3D motion capture data. To integrate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient and robust 3D pose retrieval. In our experiments, we show that our approach achieves state-of-the-art results and is even competitive when the skeleton structures of the two sources differ substantially.

Submitted 27 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

Comments: Accepted to CVPR 2016. The source code and models are publicly available. Title changed from the previous version

arXiv:1509.06616 [pdf, ps, other]

Excursion theory for Brownian motion indexed by the Brownian tree

Authors: Céline Abraham, Jean-François Le Gall

Abstract: We develop an excursion theory for Brownian motion indexed by the Brownian tree, which in many respects is analogous to the classical Itô theory for linear Brownian motion. Each excursion is associated with a connected component of the complement of the zero set of the tree-indexed Brownian motion. Each such connected component is itself a continuous tree, and we introduce a quantity measuring the length of its boundary. The collection of boundary lengths coincides with the collection of jumps of a continuous-state branching process with branching mechanism $\psi(u)=\sqrt{8/3}\,u^{3/2}$. Furthermore, conditionally on the boundary lengths, the different excursions are independent, and we determine their conditional distribution in terms of an excursion measure $\mathbb{M}_0$ which is the analog of the Itô measure of Brownian excursions. We provide various descriptions of the excursion measure $\mathbb{M}_0$, and we also determine several explicit distributions, such as the joint distribution of the boundary length and the mass of an excursion under $\mathbb{M}_0$. We use the Brownian snake as a convenient tool for defining and analysing the excursions of our tree-indexed Brownian motion.

Submitted 12 September, 2018; v1 submitted 22 September, 2015; originally announced September 2015.

Comments: 46 pages, final version with very few minor corrections, to appear in JEMS

MSC Class: 60J68; 60J80; 60J65

arXiv:1509.01947 [pdf, other]

An end-to-end generative framework for video segmentation and recognition

Authors: Hilde Kuehne, Juergen Gall, Thomas Serre

Abstract: We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system is evaluated for both the recognition of complex activities as well as their parsing into action units. Using a variety of video datasets ranging from human cooking activities to animal behaviors, our experiments demonstrate that the resulting architecture outperforms state-of-the-art approaches for larger datasets, i.e. when a sufficient amount of data is available for training structured generative models.

Submitted 17 March, 2016; v1 submitted 7 September, 2015; originally announced September 2015.

Comments: Proc. of IEEE Winter Conference on Applications of Computer Vision (WACV), 2016

arXiv:1508.06073 [pdf, other]

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos

Authors: Hilde Kuehne, Juergen Gall, Thomas Serre

Abstract: As research on action recognition matures, the focus is shifting away from categorizing basic task-oriented actions using hand-segmented video datasets to understanding complex goal-oriented daily human activities in real-world settings. Temporally structured models would seem obvious to tackle this set of problems, but so far, cases where these models have outperformed simpler unstructured bag-of-word types of models are scarce. With the increasing availability of large human activity datasets, combined with the development of novel feature coding techniques that yield more compact representations, it is time to revisit structured generative approaches. Here, we describe an end-to-end generative approach from the encoding of features to the structural modeling of complex human activities by applying Fisher vectors and temporal models for the analysis of video sequences. We systematically evaluate the proposed approach on several available datasets (ADL, MPIICooking, and Breakfast datasets) using a variety of performance metrics. Through extensive system evaluations, we demonstrate that combining compact video representations based on Fisher Vectors with HMM-based modeling yields very significant gains in accuracy and when properly trained with sufficient training samples, structured temporal models outperform unstructured bag-of-word types of models by a large margin on the tested performance metric.

Submitted 17 March, 2016; v1 submitted 25 August, 2015; originally announced August 2015.

Comments: 15 pages, 12 figures

arXiv:1506.02178 [pdf, other]

doi 10.1007/s11263-016-0895-4

Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

Authors: Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, Juergen Gall

Abstract: Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

Submitted 7 March, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

Comments: Accepted for publication by the International Journal of Computer Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14 hand tracking paper with several extensions, additional experiments and details

arXiv:1412.5509 [pdf, other]

Scaling limits for the peeling process on random maps

Authors: Nicolas Curien, Jean-François Le Gall

Abstract: We study the scaling limit of the volume and perimeter of the discovered regions in the Markovian explorations known as peeling processes for infinite random planar maps such as the uniform infinite planar triangulation (UIPT) or quadrangulation (UIPQ). In particular, our results apply to the metric exploration or peeling by layers algorithm, where the discovered regions are (almost) completed balls, or hulls, centered at the root vertex. The scaling limits of the perimeter and volume of hulls can be expressed in terms of the hull process of the Brownian plane studied in our previous work. Other applications include the metric exploration of the dual graph of our infinite random lattices, and first-passage percolation with exponential edge weights on the dual graph, also known as the Eden model or uniform peeling.

Submitted 19 July, 2015; v1 submitted 17 December, 2014; originally announced December 2014.

Comments: New version with additional Theorem 3

arXiv:1409.4026 [pdf, other]

The hull process of the Brownian plane

Authors: Nicolas Curien, Jean-François Le Gall

Abstract: We study the random metric space called the Brownian plane, which is closely related to the Brownian map and is conjectured to be the universal scaling limit of many discrete random lattices such as the uniform infinite planar triangulation. We obtain a number of explicit distributions for the Brownian plane. In particular, we consider, for every $r>0$, the hull of radius $r$, which is obtained by "filling in the holes" in the ball of radius $r$ centered at the root. We introduce a quantity $Z_r$ which is interpreted as the (generalized) length of the boundary of the hull of radius $r$. We identify the law of the process $(Z_r)_{r>0}$ as the time-reversal of a continuous-state branching process starting from $+\infty$ at time $-\infty$ and conditioned to hit $0$ at time $0$, and we give an explicit description of the process of hull volumes given the process $(Z_r)_{r>0}$. We obtain an explicit formula for the Laplace transform of the volume of the hull of radius $r$, and we also determine the conditional distribution of this volume given the length of the boundary. Our proofs involve certain new formulas for super-Brownian motion and the Brownian snake in dimension one, which are of independent interest.

Submitted 14 September, 2014; originally announced September 2014.

Comments: 38 pages

MSC Class: 60D05; 60J68; 60J80

arXiv:1407.0237 [pdf, other]

Bessel processes, the Brownian snake and super-Brownian motion

Authors: Jean-François Le Gall

Abstract: We prove that, both for the Brownian snake and for super-Brownian motion in dimension one, the historical path corresponding to the minimal spatial position is a Bessel process of dimension -5. We also discuss a spine decomposition for the Brownian snake conditioned on the minimizing path.

Submitted 1 July, 2014; originally announced July 2014.

Comments: Submitted to the special volume of Séminaire de Probabilités in memory of Marc Yor

MSC Class: 60J68; 60J80

arXiv:1403.7943 [pdf, other]

Random geometry on the sphere

Authors: Jean-François Le Gall

Abstract: We introduce and study a universal model of random geometry in two dimensions. To this end, we start from a discrete graph drawn on the sphere, which is chosen uniformly at random in a certain class of graphs with a given size $n$, for instance the class of all triangulations of the sphere with $n$ faces. We equip the vertex set of the graph with the usual graph distance rescaled by the factor $n^{-1/4}$. We then prove that the resulting random metric space converges in distribution as $n\to\infty$, in the Gromov-Hausdorff sense, toward a limiting random compact metric space called the Brownian map, which is universal in the sense that it does not depend on the class of graphs chosen initially. The Brownian map is homeomorphic to the sphere, but its Hausdorff dimension is equal to $4$. We obtain detailed information about the structure of geodesics in the Brownian map. We also present the infinite-volume variant of the Brownian map called the Brownian plane, which arises as the scaling limit of the uniform infinite planar quadrangulation. Finally, we discuss certain open problems. This study is motivated in part by the use of random geometry in the physical theory of two-dimensional quantum gravity.

Submitted 31 March, 2014; originally announced March 2014.

Comments: To appear in the Proceedings of ICM 2014, Seoul

MSC Class: 05C80; 60D05

arXiv:1401.7830 [pdf, ps, other]

doi 10.1214/14-AOP947

The range of tree-indexed random walk in low dimensions

Authors: Jean-François Le Gall, Shen Lin

Abstract: We study the range $R_n$ of a random walk on the $d$-dimensional lattice $\mathbb{Z}^d$ indexed by a random tree with $n$ vertices. Under the assumption that the random walk is centered and has finite fourth moments, we prove in dimension $d\leq3$ that $n^{-d/4}R_n$ converges in distribution to the Lebesgue measure of the support of the integrated super-Brownian excursion (ISE). An auxiliary result shows that the suitably rescaled local times of the tree-indexed random walk converge in distribution to the density process of ISE. We obtain similar results for the range of critical branching random walk in $\mathbb{Z}^d$, $d\leq3$. As an intermediate estimate, we get exact asymptotics for the probability that a critical branching random walk starting with a single particle at the origin hits a distant point. The results of the present article complement those derived in higher dimensions in our earlier work.

Submitted 17 November, 2015; v1 submitted 30 January, 2014; originally announced January 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOP947 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOP-AOP947

Journal ref: Annals of Probability 2015, Vol. 43, No. 5, 2701-2728

arXiv:1401.3191 [pdf, ps, other]

Joint ML estimation of all parameters in a discrete time random field HJM type interest rate model

Authors: József Gáll, Gyula Pap, Martien van Zuijlen

Abstract: We consider discrete time Heath-Jarrow-Morton type interest rate models, where the interest rate curves are driven by a geometric spatial autoregression field. Strong consistency and asymptotic normality of the maximum likelihood estimators of the parameters are proved for stable no-arbitrage models containing a general stochastic discounting factor, where explicit form of the ML estimators is not available given a non-i.i.d. sample. The results form the basis of further statistical problems in such models.

Submitted 14 January, 2014; originally announced January 2014.

MSC Class: 62F12; 62P05

arXiv:1312.4859 [pdf, ps, other]

doi 10.1140/epja/i2014-14002-5

High-spin structures of 124-131Te: Competition of proton and neutron pair breakings

Authors: A. Astier, M. -G. Porquet, Ts. Venkova, Ch. Theisen, G. Duchene, F. Azaiez, G. Barreau, D. Curien, I. Deloncle, O. Dorvaux, B. J. P. Gall, M. Houry, R. Lucas, N. Redon, M. Rousseau, O. Stezowski

Abstract: The 124-131Te nuclei have been produced as fission fragments in two fusion reactions induced by heavy-ions (12C + 238U at 90 MeV bombarding energy and 18O + 208Pb at 85 MeV) and studied with the Euroball array. Their high-spin level schemes have been extended to higher excitation energy from the triple gamma-ray coincidence data. The gamma-gamma angular correlations have been analyzed in order to assign spin and parity values to many observed states. Moreover the half-lives of isomeric states have been measured from the delayed coincidences between the fission-fragment detector SAPhIR and Euroball, as well as from the timing information of the Ge detectors. The behaviors of the yrast structures identified in the present work are first discussed in comparison with the general features known in the mass region, particularly the breakings of neutron pairs occupying the nuh11/2 orbit identified in the neighboring Sn nuclei. The experimental level schemes are then compared to shell-model calculations performed in this work. The analysis of the wave functions shows the effects of the proton-pair breaking along the yrast lines of the heavy Te isotopes.

Submitted 17 December, 2013; originally announced December 2013.

Comments: accepted for publication in Eur. Phys. J. A

Journal ref: Eur. Phys. J. A. (2014) 50: 2

arXiv:1308.6762 [pdf, other]

The Brownian cactus II. Upcrossings and local times of super-Brownian motion

Authors: Jean-François Le Gall

Abstract: We study properties of the random metric space called the Brownian map. For every h>0, we consider the connected components of the complement of the open ball of radius h centered at the root, and we let N(h,r) be the number of those connected components that intersect the complement of the ball of radius h+r. We then prove that r^3N(h,r) converges as r tends to 0 to a constant times the density at h of the profile of distances from the root. In terms of the Brownian cactus, this gives asymptotics for the number of vertices at height h that have descendants at height h+r. Our proofs are based on a similar approximation result for local times of super-Brownian motion by upcrossing numbers. Our arguments make a heavy use of the Brownian snake and its special Markov property.

Submitted 30 August, 2013; originally announced August 2013.

Comments: 26 pages

MSC Class: 05C80; 60J68

arXiv:1308.0162 [pdf, ps, other]

doi 10.1103/PhysRevC.88.024321

High-spin structures of 88Kr and 89Rb: Evolution from collective to single-particle behaviors

Authors: A. Astier, M. -G. Porquet, Ts. Venkova, G. Duchene, F. Azaiez, D. Curien, I. Deloncle, O. Dorvaux, B. J. P. Gall, N. Redon, M. Rousseau, O. Stezowski

Abstract: The high-spin states of the two neutron-rich nuclei, 88Kr and 89Rb, have been studied from the 18O + 208Pb fusion-fission reaction. Their level schemes were built from triple gamma-ray coincidence data and gamma-gamma angular correlations were analyzed in order to assign spin and parity values to most of the observed states. The two level schemes evolve from collective structures to single-particle excitations as a function of the excitation energy. Comparison with results of shell-model calculations gives the specific proton and neutron configurations which are involved to generate the angular momentum along the yrast lines.

Submitted 1 August, 2013; originally announced August 2013.

Comments: 12 pages, 9 figures, Physical Review C (2013) in press

Journal ref: Physical Review C 88, 024321 (2013)

Showing 101–150 of 181 results for author: Gall, J