Search | arXiv e-print repository

TT-NF: Tensor Train Neural Fields

Authors: Anton Obukhov, Mikhail Usvyatsov, Christos Sakaridis, Konrad Schindler, Luc Van Gool

Abstract: Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterizat… ▽ More Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Furthermore, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning. △ Less

Submitted 30 September, 2022; originally announced September 2022.

Comments: Preprint, under review

arXiv:2208.01421 [pdf, other]

T4DT: Tensorizing Time for Learning Temporal 3D Visual Data

Authors: Mikhail Usvyatsov, Rafael Ballester-Rippoll, Lina Bashaeva, Konrad Schindler, Gonzalo Ferrer, Ivan Oseledets

Abstract: Unlike 2D raster images, there is no single dominant representation for 3D visual data processing. Different formats like point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties also in 3D. In particular, they offer constant-time random access and are eminently suitable for modern… ▽ More Unlike 2D raster images, there is no single dominant representation for 3D visual data processing. Different formats like point clouds, meshes, or implicit functions each have their strengths and weaknesses. Still, grid representations such as signed distance functions have attractive properties also in 3D. In particular, they offer constant-time random access and are eminently suitable for modern machine learning. Unfortunately, the storage size of a grid grows exponentially with its dimension. Hence they often exceed memory limits even at moderate resolution. This work proposes using low-rank tensor formats, including the Tucker, tensor train, and quantics tensor train decompositions, to compress time-varying 3D data. Our method iteratively computes, voxelizes, and compresses each frame's truncated signed distance function and applies tensor rank truncation to condense all frames into a single, compressed tensor that represents the entire 4D scene. We show that low-rank tensor compression is extremely compact to store and query time-varying signed distance functions. It significantly reduces the memory footprint of 4D scenes while remarkably preserving their geometric quality. Unlike existing, iterative learning-based approaches like DeepSDF and NeRF, our method uses a closed-form algorithm with theoretical guarantees. △ Less

Submitted 5 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

arXiv:2206.11128 [pdf, other]

tntorch: Tensor Network Learning with PyTorch

Authors: Mikhail Usvyatsov, Rafael Ballester-Ripoll, Konrad Schindler

Abstract: We present tntorch, a tensor learning framework that supports multiple decompositions (including Candecomp/Parafac, Tucker, and Tensor Train) under a unified interface. With our library, the user can learn and handle low-rank tensors with automatic differentiation, seamless GPU support, and the convenience of PyTorch's API. Besides decomposition algorithms, tntorch implements differentiable tensor… ▽ More We present tntorch, a tensor learning framework that supports multiple decompositions (including Candecomp/Parafac, Tucker, and Tensor Train) under a unified interface. With our library, the user can learn and handle low-rank tensors with automatic differentiation, seamless GPU support, and the convenience of PyTorch's API. Besides decomposition algorithms, tntorch implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetics, and more. △ Less

Submitted 21 September, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Journal ref: JMLR (2022) 23-208

arXiv:2105.14250 [pdf, other]

Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation

Authors: Mikhail Usvyatsov, Anastasia Makarova, Rafael Ballester-Ripoll, Maxim Rakhuba, Andreas Krause, Konrad Schindler

Abstract: We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross-approximation (CA) to learn the representation through a subset of the original samples. CA is an adaptive sampling algorithm… ▽ More We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross-approximation (CA) to learn the representation through a subset of the original samples. CA is an adaptive sampling algorithm that is native to tensor decompositions and avoids working with the full high-resolution data explicitly. Instead, it actively selects local representative samples that we fetch out-of-core and on-demand. The required number of samples grows only logarithmically with the size of the input. Our implicit representation of the tensor in the network enables processing large grids that could not be otherwise tractable in their uncompressed form. The proposed approach is particularly useful for large-scale multidimensional grid data (e.g., 3D tomography), and for tasks that require context over a large receptive field (e.g., predicting the medical condition of entire organs). The code is available at https://github.com/aelphy/c-pic. △ Less

Submitted 12 November, 2021; v1 submitted 29 May, 2021; originally announced May 2021.

Journal ref: Proc. International Conference on Computer Vision (ICCV) 2021

arXiv:2011.13005 [pdf, other]

PREDATOR: Registration of 3D Point Clouds with Low Overlap

Authors: Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, Konrad Schindler

Abstract: We introduce PREDATOR, a model for pairwise point-cloud registration with deep attention to the overlap region. Different from previous work, our model is specifically designed to handle (also) point-cloud pairs with low overlap. Its key novelty is an overlap-attention block for early information exchange between the latent encodings of the two point clouds. In this way the subsequent decoding of… ▽ More We introduce PREDATOR, a model for pairwise point-cloud registration with deep attention to the overlap region. Different from previous work, our model is specifically designed to handle (also) point-cloud pairs with low overlap. Its key novelty is an overlap-attention block for early information exchange between the latent encodings of the two point clouds. In this way the subsequent decoding of the latent representations into per-point features is conditioned on the respective other point cloud, and thus can predict which points are not only salient, but also lie in the overlap region between the two point clouds. The ability to focus on points that are relevant for matching greatly improves performance: PREDATOR raises the rate of successful registrations by more than 20% in the low-overlap scenario, and also sets a new state of the art for the 3DMatch benchmark with 89% registration recall. △ Less

Submitted 6 August, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: CVPR 2021 (Oral) - Improved performance after fixing GNN bug

arXiv:2002.12819 [pdf, other]

Indoor Scene Recognition in 3D

Authors: Shengyu Huang, Mikhail Usvyatsov, Konrad Schindler

Abstract: Recognising in what type of environment one is located is an important perception task. For instance, for a robot operating in indoors it is helpful to be aware whether it is in a kitchen, a hallway or a bedroom. Existing approaches attempt to classify the scene based on 2D images or 2.5D range images. Here, we study scene recognition from 3D point cloud (or voxel) data, and show that it greatly o… ▽ More Recognising in what type of environment one is located is an important perception task. For instance, for a robot operating in indoors it is helpful to be aware whether it is in a kitchen, a hallway or a bedroom. Existing approaches attempt to classify the scene based on 2D images or 2.5D range images. Here, we study scene recognition from 3D point cloud (or voxel) data, and show that it greatly outperforms methods based on 2D birds-eye views. Moreover, we advocate multi-task learning as a way of improving scene recognition, building on the fact that the scene type is highly correlated with the objects in the scene, and therefore with its semantic segmentation into different object classes. In a series of ablation studies, we show that successful scene recognition is not just the recognition of individual objects unique to some scene type (such as a bathtub), but depends on several different cues, including coarse 3D geometry, colour, and the (implicit) distribution of object categories. Moreover, we demonstrate that surprisingly sparse 3D data is sufficient to classify indoor scenes with good accuracy. △ Less

Submitted 2 July, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: IROS 2020 - Camera Ready

arXiv:1903.06837 [pdf, other]

Visual recognition in the wild by sampling deep similarity functions

Authors: Mikhail Usvyatsov, Konrad Schindler

Abstract: Recognising relevant objects or object states in its environment is a basic capability for an autonomous robot. The dominant approach to object recognition in images and range images is classification by supervised machine learning, nowadays mostly with deep convolutional neural networks (CNNs). This works well for target classes whose variability can be completely covered with training examples.… ▽ More Recognising relevant objects or object states in its environment is a basic capability for an autonomous robot. The dominant approach to object recognition in images and range images is classification by supervised machine learning, nowadays mostly with deep convolutional neural networks (CNNs). This works well for target classes whose variability can be completely covered with training examples. However, a robot moving in the wild, i.e., in an environment that is not known at the time the recognition system is trained, will often face \emph{domain shift}: the training data cannot be assumed to exhaustively cover all the within-class variability that will be encountered in the test data. In that situation, learning is in principle possible, since the training set does capture the defining properties, respectively dissimilarities, of the target classes. But directly training a CNN to predict class probabilities is prone to overfitting to irrelevant correlations between the class labels and the specific subset of the target class that is represented in the training set. We explore the idea to instead learn a Siamese CNN that acts as similarity function between pairs of training examples. Class predictions are then obtained by measuring the similarities between a new test instance and the training samples. We show that the CNN embedding correctly recovers the relative similarities to arbitrary class exemplars in the training set. And that therefore few, randomly picked training exemplars are sufficient to achieve good predictions, making the procedure efficient. △ Less

Submitted 15 March, 2019; originally announced March 2019.

arXiv:1801.10585 [pdf, other]

doi 10.1007/s11263-020-01302-5

Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks

Authors: Timo Hackel, Mikhail Usvyatsov, Silvano Galliani, Jan D. Wegner, Konrad Schindler

Abstract: While CNNs naturally lend themselves to densely sampled data, and sophisticated implementations are available, they lack the ability to efficiently process sparse data. In this work we introduce a suite of tools that exploit sparsity in both the feature maps and the filter weights, and thereby allow for significantly lower memory footprints and computation times than the conventional dense framewo… ▽ More While CNNs naturally lend themselves to densely sampled data, and sophisticated implementations are available, they lack the ability to efficiently process sparse data. In this work we introduce a suite of tools that exploit sparsity in both the feature maps and the filter weights, and thereby allow for significantly lower memory footprints and computation times than the conventional dense framework when processing data with a high degree of sparsity. Our scheme provides (i) an efficient GPU implementation of a convolution layer based on direct, sparse convolution; (ii) a filter step within the convolution layer, which we call attention, that prevents fill-in, i.e., the tendency of convolution to rapidly decrease sparsity, and guarantees an upper bound on the computational resources; and (iii) an adaptation of the back-propagation algorithm, which makes it possible to combine our approach with standard learning frameworks, while still exploiting sparsity in the data and the model. △ Less

Submitted 12 March, 2020; v1 submitted 31 January, 2018; originally announced January 2018.

Comments: Updated to IJCV version

arXiv:1709.08605 [pdf, other]

doi 10.1088/1742-6596/898/3/032048

Muon Trigger for Mobile Phones

Authors: Maxim Borisyak, Michail Usvyatsov, Michael Mulhearn, Chase Shimmin, Andrey Ustyuzhanin

Abstract: The CRAYFIS experiment proposes to use privately owned mobile phones as a ground detector array for Ultra High Energy Cosmic Rays. Upon interacting with Earth's atmosphere, these events produce extensive particle showers which can be detected by cameras on mobile phones. A typical shower contains minimally-ionizing particles such as muons. As these particles interact with CMOS image sensors, they… ▽ More The CRAYFIS experiment proposes to use privately owned mobile phones as a ground detector array for Ultra High Energy Cosmic Rays. Upon interacting with Earth's atmosphere, these events produce extensive particle showers which can be detected by cameras on mobile phones. A typical shower contains minimally-ionizing particles such as muons. As these particles interact with CMOS image sensors, they may leave tracks of faintly-activated pixels that are sometimes hard to distinguish from random detector noise. Triggers that rely on the presence of very bright pixels within an image frame are not efficient in this case. We present a trigger algorithm based on Convolutional Neural Networks which selects images containing such tracks and are evaluated in a lazy manner: the response of each successive layer is computed only if activation of the current layer satisfies a continuation criterion. Usage of neural networks increases the sensitivity considerably comparable with image thresholding, while the lazy evaluation allows for execution of the trigger under the limited computational power of mobile phones. △ Less

Submitted 25 September, 2017; originally announced September 2017.

Showing 1–9 of 9 results for author: Usvyatsov, M