-
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Authors:
Navve Wasserman,
Noam Rotstein,
Roy Ganz,
Ron Kimmel
Abstract:
Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images based on textual instructions without requiring user-provided input masks remains a challenge. We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than its inverse process of adding them (Paint…
▽ More
Image editing has advanced significantly with the introduction of text-conditioned diffusion models. Despite this progress, seamlessly adding objects to images based on textual instructions without requiring user-provided input masks remains a challenge. We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than its inverse process of adding them (Paint), attributed to the utilization of segmentation mask datasets alongside inpainting models that inpaint within these masks. Capitalizing on this realization, by implementing an automated and extensive pipeline, we curate a filtered large-scale image dataset containing pairs of images and their corresponding object-removed versions. Using these pairs, we train a diffusion model to inverse the inpainting process, effectively adding objects into images. Unlike other editing datasets, ours features natural target images instead of synthetic ones; moreover, it maintains consistency between source and target by construction. Additionally, we utilize a large Vision-Language Model to provide detailed descriptions of the removed objects and a Large Language Model to convert these descriptions into diverse, natural-language instructions. We show that the trained model surpasses existing ones both qualitatively and quantitatively, and release the large-scale dataset alongside the trained models for the community.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Authors:
Yaniv Wolf,
Amit Bracha,
Ron Kimmel
Abstract:
The Gaussian splatting for radiance field rendering method has recently emerged as an efficient approach for accurate scene representation. It optimizes the location, size, color, and shape of a cloud of 3D Gaussian elements to visually match, after projection, or splatting, a set of given images taken from various viewing directions. And yet, despite the proximity of Gaussian elements to the shap…
▽ More
The Gaussian splatting for radiance field rendering method has recently emerged as an efficient approach for accurate scene representation. It optimizes the location, size, color, and shape of a cloud of 3D Gaussian elements to visually match, after projection, or splatting, a set of given images taken from various viewing directions. And yet, despite the proximity of Gaussian elements to the shape boundaries, direct surface reconstruction of objects in the scene is a challenge.
We propose a novel approach for surface reconstruction from Gaussian splatting models. Rather than relying on the Gaussian elements' locations as a prior for surface reconstruction, we leverage the superior novel-view synthesis capabilities of 3DGS. To that end, we use the Gaussian splatting model to render pairs of stereo-calibrated novel views from which we extract depth profiles using a stereo matching method. We then combine the extracted RGB-D images into a geometrically consistent surface. The resulting reconstruction is more accurate and shows finer details when compared to other methods for surface reconstruction from Gaussian splatting models, while requiring significantly less compute time compared to other surface reconstruction methods.
We performed extensive testing of the proposed method on in-the-wild scenes, taken by a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the proposed method on the Tanks and Temples benchmark, and it has surpassed the current leading method for surface reconstruction from Gaussian splatting models. Project page: https://gs2mesh.github.io/.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
On Partial Shape Correspondence and Functional Maps
Authors:
Amit Bracha,
Thomas Dagès,
Ron Kimmel
Abstract:
While dealing with matching shapes to their parts, we often apply a tool known as functional maps. The idea is to translate the shape matching problem into ``convenient'' spaces by which matching is performed algebraically by solving a least squares problem. Here, we argue that such formulations, though popular in this field, introduce errors in the estimated match when partiality is invoked. Such…
▽ More
While dealing with matching shapes to their parts, we often apply a tool known as functional maps. The idea is to translate the shape matching problem into ``convenient'' spaces by which matching is performed algebraically by solving a least squares problem. Here, we argue that such formulations, though popular in this field, introduce errors in the estimated match when partiality is invoked. Such errors are unavoidable even for advanced feature extraction networks, and they can be shown to escalate with increasing degrees of shape partiality, adversely affecting the learning capability of such systems. To circumvent these limitations, we propose a novel approach for partial shape matching. Our study of functional maps led us to a novel method that establishes direct correspondence between partial and full shapes through feature matching bypassing the need for functional map intermediate spaces. The Gromov distance between metric spaces leads to the construction of the first part of our loss functions. For regularization we use two options: a term based on the area preserving property of the map**, and a relaxed version that avoids the need to resort to functional maps. The proposed approach shows superior performance on the SHREC'16 dataset, outperforming existing unsupervised methods for partial shape matching. Notably, it achieves state-of-the-art results on the SHREC'16 HOLES benchmark, superior also compared to supervised methods. We demonstrate the benefits of the proposed unsupervised method when applied to a new dataset PFAUST for part-to-full shape correspondence
△ Less
Submitted 13 May, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Authors:
Noam Rotstein,
David Bensaid,
Shaked Brody,
Roy Ganz,
Ron Kimmel
Abstract:
The advent of vision-language pre-training techniques enhanced substantial progress in the development of models for image captioning. However, these models frequently produce generic captions and may omit semantically important image details. This limitation can be traced back to the image-text datasets; while their captions typically offer a general description of image content, they frequently…
▽ More
The advent of vision-language pre-training techniques enhanced substantial progress in the development of models for image captioning. However, these models frequently produce generic captions and may omit semantically important image details. This limitation can be traced back to the image-text datasets; while their captions typically offer a general description of image content, they frequently omit salient details. Considering the magnitude of these datasets, manual reannotation is impractical, emphasizing the need for an automated approach. To address this challenge, we leverage existing captions and explore augmenting them with visual details using "frozen" vision experts including an object detector, an attribute recognizer, and an Optical Character Recognizer (OCR). Our proposed method, FuseCap, fuses the outputs of such vision experts with the original captions using a large language model (LLM), yielding comprehensive image descriptions. We automatically curate a training set of 12M image-enriched caption pairs. These pairs undergo extensive evaluation through both quantitative and qualitative analyses. Subsequently, this data is utilized to train a captioning generation BLIP-based model. This model outperforms current state-of-the-art approaches, producing more precise and detailed descriptions, demonstrating the effectiveness of the proposed data-centric approach. We release this large-scale dataset of enriched image-caption pairs for the community.
△ Less
Submitted 15 November, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Learning Differential Invariants of Planar Curves
Authors:
Roy Velich,
Ron Kimmel
Abstract:
We propose a learning paradigm for the numerical approximation of differential invariants of planar curves. Deep neural-networks' (DNNs) universal approximation properties are utilized to estimate geometric measures. The proposed framework is shown to be a preferable alternative to axiomatic constructions. Specifically, we show that DNNs can learn to overcome instabilities and sampling artifacts a…
▽ More
We propose a learning paradigm for the numerical approximation of differential invariants of planar curves. Deep neural-networks' (DNNs) universal approximation properties are utilized to estimate geometric measures. The proposed framework is shown to be a preferable alternative to axiomatic constructions. Specifically, we show that DNNs can learn to overcome instabilities and sampling artifacts and produce consistent signatures for curves subject to a given group of transformations in the plane. We compare the proposed schemes to alternative state-of-the-art axiomatic constructions of differential invariants. We evaluate our models qualitatively and quantitatively and propose a benchmark dataset to evaluate approximation models of differential invariants of planar curves.
△ Less
Submitted 6 March, 2023;
originally announced March 2023.
-
Partial Shape Similarity via Alignment of Multi-Metric Hamiltonian Spectra
Authors:
David Bensaïd,
Amit Bracha,
Ron Kimmel
Abstract:
Evaluating the similarity of non-rigid shapes with significant partiality is a fundamental task in numerous computer vision applications. Here, we propose a novel axiomatic method to match similar regions across shapes. Matching similar regions is formulated as the alignment of the spectra of operators closely related to the Laplace-Beltrami operator (LBO). The main novelty of the proposed approac…
▽ More
Evaluating the similarity of non-rigid shapes with significant partiality is a fundamental task in numerous computer vision applications. Here, we propose a novel axiomatic method to match similar regions across shapes. Matching similar regions is formulated as the alignment of the spectra of operators closely related to the Laplace-Beltrami operator (LBO). The main novelty of the proposed approach is the consideration of differential operators defined on a manifold with multiple metrics. The choice of a metric relates to fundamental shape properties while considering the same manifold under different metrics can thus be viewed as analyzing the underlying manifold from different perspectives. Specifically, we examine the scale-invariant metric and the corresponding scale-invariant Laplace-Beltrami operator (SI-LBO) along with the regular metric and the regular LBO. We demonstrate that the scale-invariant metric emphasizes the locations of important semantic features in articulated shapes. A truncated spectrum of the SI-LBO consequently better captures locally curved regions and complements the global information encapsulated in the truncated spectrum of the regular LBO. We show that matching these dual spectra outperforms competing axiomatic frameworks when tested on standard benchmarks. We introduced a new dataset and compare the proposed method with the state-of-the-art learning based approach in a cross-database configuration. Specifically, we show that, when trained on one data set and tested on another, the proposed axiomatic approach which does not involve training, outperforms the deep learning alternative.
△ Less
Submitted 6 July, 2022;
originally announced July 2022.
-
Garment Avatars: Realistic Cloth Driving using Pattern Registration
Authors:
Oshri Halimi,
Fabian Prada,
Tuur Stuyck,
Donglai Xiang,
Timur Bagautdinov,
He Wen,
Ron Kimmel,
Takaaki Shiratori,
Chenglei Wu,
Yaser Sheikh
Abstract:
Virtual telepresence is the future of online communication. Clothing is an essential part of a person's identity and self-expression. Yet, ground truth data of registered clothes is currently unavailable in the required resolution and accuracy for training telepresence models for realistic cloth animation. Here, we propose an end-to-end pipeline for building drivable representations for clothing.…
▽ More
Virtual telepresence is the future of online communication. Clothing is an essential part of a person's identity and self-expression. Yet, ground truth data of registered clothes is currently unavailable in the required resolution and accuracy for training telepresence models for realistic cloth animation. Here, we propose an end-to-end pipeline for building drivable representations for clothing. The core of our approach is a multi-view patterned cloth tracking algorithm capable of capturing deformations with high accuracy. We further rely on the high-quality data produced by our tracking method to build a Garment Avatar: an expressive and fully-drivable geometry model for a piece of clothing. The resulting model can be animated using a sparse set of views and produces highly realistic reconstructions which are faithful to the driving signals. We demonstrate the efficacy of our pipeline on a realistic virtual telepresence application, where a garment is being reconstructed from two views, and a user can pick and swap garment design as they wish. In addition, we show a challenging scenario when driven exclusively with body pose, our drivable garment avatar is capable of producing realistic cloth geometry of significantly higher quality than the state-of-the-art.
△ Less
Submitted 7 June, 2022;
originally announced June 2022.
-
Elastica Models for Color Image Regularization
Authors:
Hao Liu,
Xue-Cheng Tai,
Ron Kimmel,
Roland Glowinski
Abstract:
One classical approach to regularize color is to tream them as two dimensional surfaces embedded in a five dimensional spatial-chromatic space. In this case, a natural regularization term arises as the image surface area. Choosing the chromatic coordinates as dominating over the spatial ones, the image spatial coordinates could be thought of as a paramterization of the image surface manifold in a…
▽ More
One classical approach to regularize color is to tream them as two dimensional surfaces embedded in a five dimensional spatial-chromatic space. In this case, a natural regularization term arises as the image surface area. Choosing the chromatic coordinates as dominating over the spatial ones, the image spatial coordinates could be thought of as a paramterization of the image surface manifold in a three dimensional color space. Minimizing the area of the image manifold leads to the Beltrami flow or mean curvature flow of the image surface in the 3D color space, while minimizing the elastica of the image surface yields an additional interesting regularization. Recently, the authors proposed a color elastica model, which minimizes both the surface area and elastica of the image manifold. In this paper, we propose to modify the color elastica and introduce two new models for color image regularization. The revised measures are motivated by the relations between the color elastica model, Euler's elastica model and the total variation model for gray level images. Compared to our previous color elastica model, the new models are direct extensions of Euler's elastica model to color images. The proposed models are nonlinear and challenging to minimize. To overcome this difficulty, two operator-splitting methods are suggested. Specifically, nonlinearities are decoupled by introducing new vector- and matrix-valued variables. Then, the minimization problems are converted to solving initial value problems which are time-discretized by operator splitting. Each subproblem, after splitting either, has a closed-form solution or can be solved efficiently. The effectiveness and advantages of the proposed models are demonstrated by comprehensive experiments. The benefits of incorporating the elastica of the image surface as regularization terms compared to common alternatives are empirically validated.
△ Less
Submitted 24 November, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Deep Signatures -- Learning Invariants of Planar Curves
Authors:
Roy Velich,
Ron Kimmel
Abstract:
We propose a learning paradigm for numerical approximation of differential invariants of planar curves. Deep neural-networks' (DNNs) universal approximation properties are utilized to estimate geometric measures. The proposed framework is shown to be a preferable alternative to axiomatic constructions. Specifically, we show that DNNs can learn to overcome instabilities and sampling artifacts and p…
▽ More
We propose a learning paradigm for numerical approximation of differential invariants of planar curves. Deep neural-networks' (DNNs) universal approximation properties are utilized to estimate geometric measures. The proposed framework is shown to be a preferable alternative to axiomatic constructions. Specifically, we show that DNNs can learn to overcome instabilities and sampling artifacts and produce numerically-stable signatures for curves subject to a given group of transformations in the plane. We compare the proposed schemes to alternative state-of-the-art axiomatic constructions of group invariant arc-lengths and curvatures.
△ Less
Submitted 11 February, 2022;
originally announced February 2022.
-
Depth Refinement for Improved Stereo Reconstruction
Authors:
Amit Bracha,
Noam Rotstein,
David Bensaïd,
Ron Slossberg,
Ron Kimmel
Abstract:
Depth estimation is a cornerstone of a vast number of applications requiring 3D assessment of the environment, such as robotics, augmented reality, and autonomous driving to name a few. One prominent technique for depth estimation is stereo matching which has several advantages: it is considered more accessible than other depth-sensing technologies, can produce dense depth estimates in real-time,…
▽ More
Depth estimation is a cornerstone of a vast number of applications requiring 3D assessment of the environment, such as robotics, augmented reality, and autonomous driving to name a few. One prominent technique for depth estimation is stereo matching which has several advantages: it is considered more accessible than other depth-sensing technologies, can produce dense depth estimates in real-time, and has benefited greatly from the advances of deep learning in recent years. However, current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback. To reconstruct depth, a stereo matching algorithm first estimates the disparity map between the left and right images before applying a geometric triangulation. A simple analysis reveals that the depth error is quadratically proportional to the object's distance. Therefore, constant disparity errors are translated to large depth errors for objects far from the camera. To mitigate this quadratic relation, we propose a simple but effective method that uses a refinement network for depth estimation. We show analytical and empirical results suggesting that the proposed learning procedure reduces this quadratic relation. We evaluate the proposed refinement procedure on well-known benchmarks and datasets, like Sceneflow and KITTI datasets, and demonstrate significant improvements in the depth accuracy metric.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
Unsupervised High-Fidelity Facial Texture Generation and Reconstruction
Authors:
Ron Slossberg,
Ibrahim Jubran,
Ron Kimmel
Abstract:
Many methods have been proposed over the years to tackle the task of facial 3D geometry and texture recovery from a single image. Such methods often fail to provide high-fidelity texture without relying on 3D facial scans during training. In contrast, the complementary task of 3D facial generation has not received as much attention. As opposed to the 2D texture domain, where GANs have proven to pr…
▽ More
Many methods have been proposed over the years to tackle the task of facial 3D geometry and texture recovery from a single image. Such methods often fail to provide high-fidelity texture without relying on 3D facial scans during training. In contrast, the complementary task of 3D facial generation has not received as much attention. As opposed to the 2D texture domain, where GANs have proven to produce highly realistic facial images, the more challenging 3D geometry domain has not yet caught up to the same levels of realism and diversity.
In this paper, we propose a novel unified pipeline for both tasks, generation of both geometry and texture, and recovery of high-fidelity texture. Our texture model is learned, in an unsupervised fashion, from natural images as opposed to scanned texture maps. To the best of our knowledge, this is the first such unified framework independent of scanned textures.
Our novel training pipeline incorporates a pre-trained 2D facial generator coupled with a deep feature manipulation methodology. By applying precise 3DMM fitting, we can seamlessly integrate our modeled textures into synthetically generated background images forming a realistic composition of our textured model with background, hair, teeth, and body. This enables us to apply transfer learning from the domain of 2D image generation, thus, benefiting greatly from the impressive results obtained in this domain.
We provide a comprehensive study on several recent methods comparing our model in generation and reconstruction tasks. As the extensive qualitative, as well as quantitative analysis, demonstrate, we achieve state-of-the-art results for both tasks.
△ Less
Submitted 10 October, 2021;
originally announced October 2021.
-
Multimodal Colored Point Cloud to Image Alignment
Authors:
Noam Rotstein,
Amit Bracha,
Ron Kimmel
Abstract:
Reconstruction of geometric structures from images using supervised learning suffers from limited available amount of accurate data. One type of such data is accurate real-world RGB-D images. A major challenge in acquiring such ground truth data is the accurate alignment between RGB images and the point cloud measured by a depth scanner. To overcome this difficulty, we consider a differential opti…
▽ More
Reconstruction of geometric structures from images using supervised learning suffers from limited available amount of accurate data. One type of such data is accurate real-world RGB-D images. A major challenge in acquiring such ground truth data is the accurate alignment between RGB images and the point cloud measured by a depth scanner. To overcome this difficulty, we consider a differential optimization method that aligns a colored point cloud with a given color image through iterative geometric and color matching. In the proposed framework, the optimization minimizes the photometric difference between the colors of the point cloud and the corresponding colors of the image pixels. Unlike other methods that try to reduce this photometric error, we analyze the computation of the gradient on the image plane and propose a different direct scheme. We assume that the colors produced by the geometric scanner camera and the color camera sensor are different and therefore characterized by different chromatic acquisition properties. Under these multimodal conditions, we find the transformation between the camera image and the point cloud colors. We alternately optimize for aligning the position of the point cloud and matching the different color spaces. The alignments produced by the proposed method are demonstrated on both synthetic data with quantitative evaluation and real scenes with qualitative results.
△ Less
Submitted 12 April, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
U-mesh: Human Correspondence Matching with Mesh Convolutional Networks
Authors:
Benjamin Groisser,
Alon Wolf,
Ron Kimmel
Abstract:
The proliferation of 3D scanning technology has driven a need for methods to interpret geometric data, particularly for human subjects. In this paper we propose an elegant fusion of regression (bottom-up) and generative (top-down) methods to fit a parametric template model to raw scan meshes.
Our first major contribution is an intrinsic convolutional mesh U-net architecture that predicts pointwi…
▽ More
The proliferation of 3D scanning technology has driven a need for methods to interpret geometric data, particularly for human subjects. In this paper we propose an elegant fusion of regression (bottom-up) and generative (top-down) methods to fit a parametric template model to raw scan meshes.
Our first major contribution is an intrinsic convolutional mesh U-net architecture that predicts pointwise correspondence to a template surface. Soft-correspondence is formulated as coordinates in a newly-constructed Cartesian space. Modeling correspondence as Euclidean proximity enables efficient optimization, both for network training and for the next step of the algorithm.
Our second contribution is a generative optimization algorithm that uses the U-net correspondence predictions to guide a parametric Iterative Closest Point registration. By employing pre-trained human surface parametric models we maximally leverage domain-specific prior knowledge.
The pairing of a mesh-convolutional network with generative model fitting enables us to predict correspondence for real human surface scans including occlusions, partialities, and varying genus (e.g. from self-contact). We evaluate the proposed method on the FAUST correspondence challenge where we achieve 20% (33%) improvement over state of the art methods for inter- (intra-) subject correspondence.
△ Less
Submitted 22 August, 2021; v1 submitted 15 August, 2021;
originally announced August 2021.
-
Provably Approximated ICP
Authors:
Ibrahim Jubran,
Alaa Maalouf,
Ron Kimmel,
Dan Feldman
Abstract:
The goal of the \emph{alignment problem} is to align a (given) point cloud $P = \{p_1,\cdots,p_n\}$ to another (observed) point cloud $Q = \{q_1,\cdots,q_n\}$. That is, to compute a rotation matrix $R \in \mathbb{R}^{3 \times 3}$ and a translation vector $t \in \mathbb{R}^{3}$ that minimize the sum of paired distances $\sum_{i=1}^n D(Rp_i-t,q_i)$ for some distance function $D$. A harder version is…
▽ More
The goal of the \emph{alignment problem} is to align a (given) point cloud $P = \{p_1,\cdots,p_n\}$ to another (observed) point cloud $Q = \{q_1,\cdots,q_n\}$. That is, to compute a rotation matrix $R \in \mathbb{R}^{3 \times 3}$ and a translation vector $t \in \mathbb{R}^{3}$ that minimize the sum of paired distances $\sum_{i=1}^n D(Rp_i-t,q_i)$ for some distance function $D$. A harder version is the \emph{registration problem}, where the correspondence is unknown, and the minimum is also over all possible correspondence functions from $P$ to $Q$. Heuristics such as the Iterative Closest Point (ICP) algorithm and its variants were suggested for these problems, but none yield a provable non-trivial approximation for the global optimum.
We prove that there \emph{always} exists a "witness" set of $3$ pairs in $P \times Q$ that, via novel alignment algorithm, defines a constant factor approximation (in the worst case) to this global optimum. We then provide algorithms that recover this witness set and yield the first provable constant factor approximation for the: (i) alignment problem in $O(n)$ expected time, and (ii) registration problem in polynomial time. Such small witness sets exist for many variants including points in $d$-dimensional space, outlier-resistant cost functions, and different correspondence types.
Extensive experimental results on real and synthetic datasets show that our approximation constants are, in practice, close to $1$, and up to x$10$ times smaller than state-of-the-art algorithms.
△ Less
Submitted 10 January, 2021;
originally announced January 2021.
-
Abiotic Stress Prediction from RGB-T Images of Banana Plantlets
Authors:
Sagi Levanon,
Oshry Markovich,
Itamar Gozlan,
Ortal Bakhshian,
Alon Zvirin,
Yaron Honen,
Ron Kimmel
Abstract:
Prediction of stress conditions is important for monitoring plant growth stages, disease detection, and assessment of crop yields. Multi-modal data, acquired from a variety of sensors, offers diverse perspectives and is expected to benefit the prediction process. We present several methods and strategies for abiotic stress prediction in banana plantlets, on a dataset acquired during a two and a ha…
▽ More
Prediction of stress conditions is important for monitoring plant growth stages, disease detection, and assessment of crop yields. Multi-modal data, acquired from a variety of sensors, offers diverse perspectives and is expected to benefit the prediction process. We present several methods and strategies for abiotic stress prediction in banana plantlets, on a dataset acquired during a two and a half weeks period, of plantlets subject to four separate water and fertilizer treatments. The dataset consists of RGB and thermal images, taken once daily of each plant. Results are encouraging, in the sense that neural networks exhibit high prediction rates (over $90\%$ amongst four classes), in cases where there are hardly any noticeable features distinguishing the treatments, much higher than field experts can supply.
△ Less
Submitted 23 November, 2020;
originally announced November 2020.
-
A Color Elastica Model for Vector-Valued Image Regularization
Authors:
Hao Liu,
Xue-Cheng Tai,
Ron Kimmel,
Roland Glowinski
Abstract:
Models related to the Euler's elastica energy have proven to be useful for many applications including image processing. Extending elastica models to color images and multi-channel data is a challenging task, as stable and consistent numerical solvers for these geometric models often involve high order derivatives. Like the single channel Euler's elastica model and the total variation (TV) models,…
▽ More
Models related to the Euler's elastica energy have proven to be useful for many applications including image processing. Extending elastica models to color images and multi-channel data is a challenging task, as stable and consistent numerical solvers for these geometric models often involve high order derivatives. Like the single channel Euler's elastica model and the total variation (TV) models, geometric measures that involve high order derivatives could help when considering image formation models that minimize elastic properties. In the past, the Polyakov action from high energy physics has been successfully applied to color image processing. Here, we introduce an addition to the Polyakov action for color images that minimizes the color manifold curvature. The color image curvature is computed by applying of the Laplace-Beltrami operator to the color image channels. When reduced to gray-scale images, while selecting appropriate scaling between space and color, the proposed model minimizes the Euler's elastica operating on the image level sets. Finding a minimizer for the proposed nonlinear geometric model is a challenge we address in this paper. Specifically, we present an operator-splitting method to minimize the proposed functional. The non-linearity is decoupled by introducing three vector-valued and matrix-valued variables. The problem is then converted into solving for the steady state of an associated initial-value problem. The initial-value problem is time-split into three fractional steps, such that each sub-problem has a closed form solution, or can be solved by fast algorithms. The efficiency and robustness of the proposed method are demonstrated by systematic numerical experiments.
△ Less
Submitted 3 March, 2021; v1 submitted 19 August, 2020;
originally announced August 2020.
-
LIMP: Learning Latent Shape Representations with Metric Preservation Priors
Authors:
Luca Cosmo,
Antonio Norelli,
Oshri Halimi,
Ron Kimmel,
Emanuele Rodolà
Abstract:
In this paper, we advocate the adoption of metric preservation as a powerful prior for learning latent representations of deformable 3D shapes. Key to our construction is the introduction of a geometric distortion criterion, defined directly on the decoded shapes, translating the preservation of the metric on the decoding to the formation of linear paths in the underlying latent space. Our rationa…
▽ More
In this paper, we advocate the adoption of metric preservation as a powerful prior for learning latent representations of deformable 3D shapes. Key to our construction is the introduction of a geometric distortion criterion, defined directly on the decoded shapes, translating the preservation of the metric on the decoding to the formation of linear paths in the underlying latent space. Our rationale lies in the observation that training samples alone are often insufficient to endow generative models with high fidelity, motivating the need for large training datasets. In contrast, metric preservation provides a rigorous way to control the amount of geometric distortion incurring in the construction of the latent space, leading in turn to synthetic samples of higher quality. We further demonstrate, for the first time, the adoption of differentiable intrinsic distances in the backpropagation of a geodesic loss. Our geometric priors are particularly relevant in the presence of scarce training data, where learning any meaningful latent structure can be especially challenging. The effectiveness and potential of our generative model is showcased in applications of style transfer, content generation, and shape completion.
△ Less
Submitted 2 September, 2020; v1 submitted 27 March, 2020;
originally announced March 2020.
-
Do We Need Depth in State-Of-The-Art Face Authentication?
Authors:
Amir Livne,
Alex Bronstein,
Ron Kimmel,
Ziv Aviv,
Shahaf Grofit
Abstract:
Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly c…
▽ More
Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly compute the facial surface or depth map. The raw face stereo images along with the location in the image from which the face is extracted allow the proposed CNN to improve the recognition task while avoiding the need to explicitly handle the geometric structure of the face. This way, we keep the simplicity and cost efficiency of identity authentication from a single image, while enjoying the benefits of geometric data without explicitly reconstructing it. We demonstrate that the suggested method outperforms both existing single-image and explicit depth based methods on large-scale benchmarks, and even capable of recognize spoofing attacks. We also provide an ablation study that shows that the suggested method uses the face locations in the left and right images to encode informative features that improve the overall performance.
△ Less
Submitted 10 November, 2020; v1 submitted 24 March, 2020;
originally announced March 2020.
-
The Whole Is Greater Than the Sum of Its Nonrigid Parts
Authors:
Oshri Halimi,
Ido Imanuel,
Or Litany,
Giovanni Trappolini,
Emanuele Rodolà,
Leonidas Guibas,
Ron Kimmel
Abstract:
According to Aristotle, a philosopher in Ancient Greece, "the whole is greater than the sum of its parts". This observation was adopted to explain human perception by the Gestalt psychology school of thought in the twentieth century. Here, we claim that observing part of an object which was previously acquired as a whole, one could deal with both partial matching and shape completion in a holistic…
▽ More
According to Aristotle, a philosopher in Ancient Greece, "the whole is greater than the sum of its parts". This observation was adopted to explain human perception by the Gestalt psychology school of thought in the twentieth century. Here, we claim that observing part of an object which was previously acquired as a whole, one could deal with both partial matching and shape completion in a holistic manner. More specifically, given the geometry of a full, articulated object in a given pose, as well as a partial scan of the same object in a different pose, we address the problem of matching the part to the whole while simultaneously reconstructing the new pose from its partial observation. Our approach is data-driven, and takes the form of a Siamese autoencoder without the requirement of a consistent vertex labeling at inference time; as such, it can be used on unorganized point clouds as well as on triangle meshes. We demonstrate the practical effectiveness of our model in the applications of single-view deformable shape completion and dense shape correspondence, both on synthetic and real-world geometric data, where we outperform prior work on these tasks by a large margin.
△ Less
Submitted 27 January, 2020;
originally announced January 2020.
-
Bilateral Operators for Functional Maps
Authors:
Gautam Pai,
Mor Joseph-Rivlin,
Ron Kimmel
Abstract:
A majority of shape correspondence frameworks are based on devising pointwise and pairwise constraints on the correspondence map. The functional maps framework allows for formulating these constraints in the spectral domain. In this paper, we develop a functional map framework for the shape correspondence problem by constructing pairwise constraints using point-wise descriptors. Our core observati…
▽ More
A majority of shape correspondence frameworks are based on devising pointwise and pairwise constraints on the correspondence map. The functional maps framework allows for formulating these constraints in the spectral domain. In this paper, we develop a functional map framework for the shape correspondence problem by constructing pairwise constraints using point-wise descriptors. Our core observation is that, every point-wise descriptor allows for the construction a pairwise kernel operator whose low frequency eigenfunctions depict regions of similar descriptor values at various scales of frequency. By aggregating the pairwise information from the descriptor and the intrinsic geometry of the surface encoded in the heat kernel, we construct a hybrid kernel and call it the bilateral operator. Analogous to the edge preserving bilateral filter in image processing, the action of the bilateral operator on a function defined over the manifold yields a descriptor dependent local smoothing of that function. By forcing the correspondence map to commute with the Bilateral operator, we show that we can maximally exploit the information from a given set of pointwise descriptors in a functional map framework.
△ Less
Submitted 30 July, 2019;
originally announced July 2019.
-
Data Augmentation for Leaf Segmentation and Counting Tasks in Rosette Plants
Authors:
Dmitry Kuznichov,
Alon Zvirin,
Yaron Honen,
Ron Kimmel
Abstract:
Deep learning techniques involving image processing and data analysis are constantly evolving. Many domains adapt these techniques for object segmentation, instantiation and classification. Recently, agricultural industries adopted those techniques in order to bring automation to farmers around the globe. One analysis procedure required for automatic visual inspection in this domain is leaf count…
▽ More
Deep learning techniques involving image processing and data analysis are constantly evolving. Many domains adapt these techniques for object segmentation, instantiation and classification. Recently, agricultural industries adopted those techniques in order to bring automation to farmers around the globe. One analysis procedure required for automatic visual inspection in this domain is leaf count and segmentation. Collecting labeled data from field crops and greenhouses is a complicated task due to the large variety of crops, growth seasons, climate changes, phenotype diversity, and more, especially when specific learning tasks require a large amount of labeled data for training. Data augmentation for training deep neural networks is well established, examples include data synthesis, using generative semi-synthetic models, and applying various kinds of transformations. In this paper we propose a method that preserves the geometric structure of the data objects, thus kee** the physical appearance of the data-set as close as possible to imaged plants in real agricultural scenes. The proposed method provides state of the art results when applied to the standard benchmark in the field, namely, the ongoing Leaf Segmentation Challenge hosted by Computer Vision Problems in Plant Phenoty**.
△ Less
Submitted 20 March, 2019;
originally announced March 2019.
-
Deep Eikonal Solvers
Authors:
Moshe Lichtenstein,
Gautam Pai,
Ron Kimmel
Abstract:
A deep learning approach to numerically approximate the solution to the Eikonal equation is introduced. The proposed method is built on the fast marching scheme which comprises of two components: a local numerical solver and an update scheme. We replace the formulaic local numerical solver with a trained neural network to provide highly accurate estimates of local distances for a variety of differ…
▽ More
A deep learning approach to numerically approximate the solution to the Eikonal equation is introduced. The proposed method is built on the fast marching scheme which comprises of two components: a local numerical solver and an update scheme. We replace the formulaic local numerical solver with a trained neural network to provide highly accurate estimates of local distances for a variety of different geometries and sampling conditions. Our learning approach generalizes not only to flat Euclidean domains but also to curved surfaces enabled by the incorporation of certain invariant features in the neural network architecture. We show a considerable gain in performance, validated by smaller errors and higher orders of accuracy for the numerical solutions of the Eikonal equation computed on different surfaces The proposed approach leverages the approximation power of neural networks to enhance the performance of numerical algorithms, thereby, connecting the somewhat disparate themes of numerical geometry and learning.
△ Less
Submitted 19 March, 2019;
originally announced March 2019.
-
Learning to Optimize Multigrid PDE Solvers
Authors:
Daniel Greenfeld,
Meirav Galun,
Ron Kimmel,
Irad Yavneh,
Ronen Basri
Abstract:
Constructing fast numerical solvers for partial differential equations (PDEs) is crucial for many scientific disciplines. A leading technique for solving large-scale PDEs is using multigrid methods. At the core of a multigrid solver is the prolongation matrix, which relates between different scales of the problem. This matrix is strongly problem-dependent, and its optimal construction is critical…
▽ More
Constructing fast numerical solvers for partial differential equations (PDEs) is crucial for many scientific disciplines. A leading technique for solving large-scale PDEs is using multigrid methods. At the core of a multigrid solver is the prolongation matrix, which relates between different scales of the problem. This matrix is strongly problem-dependent, and its optimal construction is critical to the efficiency of the solver. In practice, however, devising multigrid algorithms for new problems often poses formidable challenges. In this paper we propose a framework for learning multigrid solvers. Our method learns a (single) map** from a family of parameterized PDEs to prolongation operators. We train a neural network once for the entire class of PDEs, using an efficient and unsupervised loss function. Experiments on a broad class of 2D diffusion problems demonstrate improved convergence rates compared to the widely used Black-Box multigrid scheme, suggesting that our method successfully learned rules for constructing prolongation matrices.
△ Less
Submitted 5 August, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Synthesizing facial photometries and corresponding geometries using generative adversarial networks
Authors:
Gil Shamai,
Ron Slossberg,
Ron Kimmel
Abstract:
Artificial data synthesis is currently a well studied topic with useful applications in data science, computer vision, graphics and many other fields. Generating realistic data is especially challenging since human perception is highly sensitive to non realistic appearance. In recent times, new levels of realism have been achieved by advances in GAN training procedures and architectures. These suc…
▽ More
Artificial data synthesis is currently a well studied topic with useful applications in data science, computer vision, graphics and many other fields. Generating realistic data is especially challenging since human perception is highly sensitive to non realistic appearance. In recent times, new levels of realism have been achieved by advances in GAN training procedures and architectures. These successful models, however, are tuned mostly for use with regularly sampled data such as images, audio and video. Despite the successful application of the architecture on these types of media, applying the same tools to geometric data poses a far greater challenge. The study of geometric deep learning is still a debated issue within the academic community as the lack of intrinsic parametrization inherent to geometric objects prohibits the direct use of convolutional filters, a main building block of today's machine learning systems. In this paper we propose a new method for generating realistic human facial geometries coupled with overlayed textures. We circumvent the parametrization issue by imposing a global map** from our data to the unit rectangle. We further discuss how to design such a map** to control the map** distortion and conserve area within the mapped image. By representing geometric textures and geometries as images, we are able to use advanced GAN methodologies to generate new geometries. We address the often neglected topic of relation between texture and geometry and propose to use this correlation to match between generated textures and their corresponding geometries. We offer a new method for training GAN models on partially corrupted data. Finally, we provide empirical evidence demonstrating our generative model's ability to produce examples of new identities independent from the training data while maintaining a high level of realism, two traits that are often at odds.
△ Less
Submitted 19 January, 2019;
originally announced January 2019.
-
Momen(e)t: Flavor the Moments in Learning to Classify Shapes
Authors:
Mor Joseph-Rivlin,
Alon Zvirin,
Ron Kimmel
Abstract:
A fundamental question in learning to classify 3D shapes is how to treat the data in a way that would allow us to construct efficient and accurate geometric processing and analysis procedures. Here, we restrict ourselves to networks that operate on point clouds. There were several attempts to treat point clouds as non-structured data sets by which a neural network is trained to extract discriminat…
▽ More
A fundamental question in learning to classify 3D shapes is how to treat the data in a way that would allow us to construct efficient and accurate geometric processing and analysis procedures. Here, we restrict ourselves to networks that operate on point clouds. There were several attempts to treat point clouds as non-structured data sets by which a neural network is trained to extract discriminative properties. The idea of using 3D coordinates as class identifiers motivated us to extend this line of thought to that of shape classification by comparing attributes that could easily account for the shape moments. Here, we propose to add polynomial functions of the coordinates allowing the network to account for higher order moments of a given shape. Experiments on two benchmarks show that the suggested network is able to provide state of the art results and at the same token learn more efficiently in terms of memory and computational complexity.
△ Less
Submitted 3 October, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Self-supervised Learning of Dense Shape Correspondence
Authors:
Oshri Halimi,
Or Litany,
Emanuele Rodolà,
Alex Bronstein,
Ron Kimmel
Abstract:
We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for ann…
▽ More
We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
△ Less
Submitted 6 December, 2018;
originally announced December 2018.
-
High Quality Facial Surface and Texture Synthesis via Generative Adversarial Networks
Authors:
Ron Slossberg,
Gil Shamai,
Ron Kimmel
Abstract:
In the past several decades, many attempts have been made to model synthetic realistic geometric data. The goal of such models is to generate plausible 3D geometries and textures. Perhaps the best known of its kind is the linear 3D morphable model (3DMM) for faces. Such models can be found at the core of many computer vision applications such as face reconstruction, recognition and authentication…
▽ More
In the past several decades, many attempts have been made to model synthetic realistic geometric data. The goal of such models is to generate plausible 3D geometries and textures. Perhaps the best known of its kind is the linear 3D morphable model (3DMM) for faces. Such models can be found at the core of many computer vision applications such as face reconstruction, recognition and authentication to name just a few.
Generative adversarial networks (GANs) have shown great promise in imitating high dimensional data distributions. State of the art GANs are capable of performing tasks such as image to image translation as well as auditory and image signal synthesis, producing novel plausible samples from the data distribution at hand.
Geometric data is generally more difficult to process due to the inherent lack of an intrinsic parametrization. By bringing geometric data into an aligned space, we are able to map the data onto a 2D plane using a universal parametrization. This alignment process allows for efficient processing of digitally scanned geometric data via image processing tools.
Using this methodology, we propose a novel face synthesis model for generation of realistic facial textures together with their corresponding geometry. A GAN is employed in order to imitate the space of parametrized human textures, while corresponding facial geometries are generated by learning the best 3DMM coefficients for each texture. The generated textures are mapped back onto the corresponding geometries to obtain new generated high resolution 3D faces.
△ Less
Submitted 24 August, 2018;
originally announced August 2018.
-
Specular-to-Diffuse Translation for Multi-View Reconstruction
Authors:
Shihao Wu,
Hui Huang,
Tiziano Portenier,
Matan Sela,
Danny Cohen-Or,
Ron Kimmel,
Matthias Zwicker
Abstract:
Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. O…
▽ More
Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. Our network extends unsupervised image-to-image translation to multi-view "specular to diffuse" translation. To preserve object appearance across multiple views, we introduce a Multi-View Coherence loss (MVC) that evaluates the similarity and faithfulness of local patches after the view-transformation. Our MVC loss ensures that the similarity of local correspondences among multi-view images is preserved under the image-to-image translation. As a result, our network yields significantly better results than several single-view baseline techniques. In addition, we carefully design and generate a large synthetic training data set using physically-based rendering. During testing, our network takes only the raw glossy images as input, without extra information such as segmentation masks or lighting estimation. Results demonstrate that multi-view reconstruction can be significantly improved using the images filtered by our network. We also show promising performance on real world training and testing data.
△ Less
Submitted 30 July, 2018; v1 submitted 14 July, 2018;
originally announced July 2018.
-
Self Functional Maps
Authors:
Oshri Halimi,
Ron Kimmel
Abstract:
A classical approach for surface classification is to find a compact algebraic representation for each surface that would be similar for objects within the same class and preserve dissimilarities between classes. We introduce Self Functional Maps as a novel surface representation that satisfies these properties, translating the geometric problem of surface classification into an algebraic form of…
▽ More
A classical approach for surface classification is to find a compact algebraic representation for each surface that would be similar for objects within the same class and preserve dissimilarities between classes. We introduce Self Functional Maps as a novel surface representation that satisfies these properties, translating the geometric problem of surface classification into an algebraic form of classifying matrices. The proposed map transforms a given surface into a universal isometry invariant form defined by a unique matrix. The suggested representation is realized by applying the functional maps framework to map the surface into itself. The key idea is to use two different metric spaces of the same surface for which the functional map serves as a signature. Specifically, in this paper, we use the regular and the scale invariant surface laplacian operators to construct two families of eigenfunctions. The result is a matrix that encodes the interaction between the eigenfunctions resulted from two different Riemannian manifolds of the same surface. Using this representation, geometric shape similarity is converted into algebraic distances between matrices.
△ Less
Submitted 29 April, 2018; v1 submitted 22 April, 2018;
originally announced April 2018.
-
DIMAL: Deep Isometric Manifold Learning Using Sparse Geodesic Sampling
Authors:
Gautam Pai,
Ronen Talmon,
Alex Bronstein,
Ron Kimmel
Abstract:
This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances. By training with only a few land…
▽ More
This paper explores a fully unsupervised deep learning approach for computing distance-preserving maps that generate low-dimensional embeddings for a certain class of manifolds. We use the Siamese configuration to train a neural network to solve the problem of least squares multidimensional scaling for generating maps that approximately preserve geodesic distances. By training with only a few landmarks, we show a significantly improved local and nonlocal generalization of the isometric map** as compared to analogous non-parametric counterparts. Importantly, the combination of a deep-learning framework with a multidimensional scaling objective enables a numerical analysis of network architectures to aid in understanding their representation power. This provides a geometric perspective to the generalizability of deep learning.
△ Less
Submitted 13 November, 2018; v1 submitted 16 November, 2017;
originally announced November 2017.
-
CoBe -- Coded Beacons for Localization, Object Tracking, and SLAM Augmentation
Authors:
Roman Rabinovich,
Ibrahim Jubran,
Aaron Wetzler,
Ron Kimmel
Abstract:
This paper presents a novel beacon light coding protocol, which enables fast and accurate identification of the beacons in an image. The protocol is provably robust to a predefined set of detection and decoding errors, and does not require any synchronization between the beacons themselves and the optical sensor. A detailed guide is then given for develo** an optical tracking and localization sy…
▽ More
This paper presents a novel beacon light coding protocol, which enables fast and accurate identification of the beacons in an image. The protocol is provably robust to a predefined set of detection and decoding errors, and does not require any synchronization between the beacons themselves and the optical sensor. A detailed guide is then given for develo** an optical tracking and localization system, which is based on the suggested protocol and readily available hardware. Such a system operates either as a standalone system for recovering the six degrees of freedom of fast moving objects, or integrated with existing SLAM pipelines providing them with error-free and easily identifiable landmarks. Based on this guide, we implemented a low-cost positional tracking system which can run in real-time on an IoT board. We evaluate our system's accuracy and compare it to other popular methods which utilize the same optical hardware, in experiments where the ground truth is known. A companion video containing multiple real-world experiments demonstrates the accuracy, speed, and applicability of the proposed system in a wide range of environments and real-world tasks. Open source code is provided to encourage further development of low-cost localization systems integrating the suggested technology at its navigation core.
△ Less
Submitted 21 April, 2020; v1 submitted 18 August, 2017;
originally announced August 2017.
-
Efficient Deformable Shape Correspondence via Kernel Matching
Authors:
Zorah Lähner,
Matthias Vestner,
Amit Boyarski,
Or Litany,
Ron Slossberg,
Tal Remez,
Emanuele Rodolà,
Alex Bronstein,
Michael Bronstein,
Ron Kimmel,
Daniel Cremers
Abstract:
We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the map**, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of t…
▽ More
We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the map**, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of the highly non-convex nature of the resulting quadratic assignment problem, our method converges to a semantically meaningful and continuous map** in most of our experiments, and scales well. We provide preliminary theoretical analysis and several interpretations of the method.
△ Less
Submitted 15 September, 2017; v1 submitted 24 July, 2017;
originally announced July 2017.
-
Sparse Approximation of 3D Meshes using the Spectral Geometry of the Hamiltonian Operator
Authors:
Yoni Choukroun,
Gautam Pai,
Ron Kimmel
Abstract:
The discrete Laplace operator is ubiquitous in spectral shape analysis, since its eigenfunctions are provably optimal in representing smooth functions defined on the surface of the shape. Indeed, subspaces defined by its eigenfunctions have been utilized for shape compression, treating the coordinates as smooth functions defined on the given surface. However, surfaces of shapes in nature often con…
▽ More
The discrete Laplace operator is ubiquitous in spectral shape analysis, since its eigenfunctions are provably optimal in representing smooth functions defined on the surface of the shape. Indeed, subspaces defined by its eigenfunctions have been utilized for shape compression, treating the coordinates as smooth functions defined on the given surface. However, surfaces of shapes in nature often contain geometric structures for which the general smoothness assumption may fail to hold. At the other end, some explicit mesh compression algorithms utilize the order by which vertices that represent the surface are traversed, a property which has been ignored in spectral approaches. Here, we incorporate the order of vertices into an operator that defines a novel spectral domain. We propose a method for representing 3D meshes using the spectral geometry of the Hamiltonian operator, integrated within a sparse approximation framework. We adapt the concept of a potential function from quantum physics and incorporate vertex ordering information into the potential, yielding a novel data-dependent operator. The potential function modifies the spectral geometry of the Laplacian to focus on regions with finer details of the given surface. By sparsely encoding the geometry of the shape using the proposed data-dependent basis, we improve compression performance compared to previous results that use the standard Laplacian basis and spectral graph wavelets.
△ Less
Submitted 12 May, 2018; v1 submitted 7 July, 2017;
originally announced July 2017.
-
A Deep Learning Perspective on the Origin of Facial Expressions
Authors:
Ran Breuer,
Ron Kimmel
Abstract:
Facial expressions play a significant role in human communication and behavior. Psychologists have long studied the relationship between facial expressions and emotions. Paul Ekman et al., devised the Facial Action Coding System (FACS) to taxonomize human facial expressions and model their behavior. The ability to recognize facial expressions automatically, enables novel applications in fields lik…
▽ More
Facial expressions play a significant role in human communication and behavior. Psychologists have long studied the relationship between facial expressions and emotions. Paul Ekman et al., devised the Facial Action Coding System (FACS) to taxonomize human facial expressions and model their behavior. The ability to recognize facial expressions automatically, enables novel applications in fields like human-computer interaction, social gaming, and psychological research. There has been a tremendously active research in this field, with several recent papers utilizing convolutional neural networks (CNN) for feature extraction and inference. In this paper, we employ CNN understanding methods to study the relation between the features these computational networks are using, the FACS and Action Units (AU). We verify our findings on the Extended Cohn-Kanade (CK+), NovaEmotions and FER2013 datasets. We apply these models to various tasks and tests using transfer learning, including cross-dataset validation and cross-task performance. Finally, we exploit the nature of the FER based CNN models for the detection of micro-expressions and achieve state-of-the-art accuracy using a simple long-short-term-memory (LSTM) recurrent neural network (RNN).
△ Less
Submitted 10 May, 2017; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation
Authors:
Matan Sela,
Elad Richardson,
Ron Kimmel
Abstract:
It has been recently shown that neural networks can recover the geometric structure of a face from a single given image. A common denominator of most existing face geometry reconstruction methods is the restriction of the solution space to some low-dimensional subspace. While such a model significantly simplifies the reconstruction problem, it is inherently limited in its expressiveness. As an alt…
▽ More
It has been recently shown that neural networks can recover the geometric structure of a face from a single given image. A common denominator of most existing face geometry reconstruction methods is the restriction of the solution space to some low-dimensional subspace. While such a model significantly simplifies the reconstruction problem, it is inherently limited in its expressiveness. As an alternative, we propose an Image-to-Image translation network that jointly maps the input image to a depth image and a facial correspondence map. This explicit pixel-based map** can then be utilized to provide high quality reconstructions of diverse faces under extreme expressions, using a purely geometric refinement process. In the spirit of recent approaches, the network is trained only with synthetic data, and is then evaluated on in-the-wild facial images. Both qualitative and quantitative analyses demonstrate the accuracy and the robustness of our approach.
△ Less
Submitted 15 September, 2017; v1 submitted 29 March, 2017;
originally announced March 2017.
-
Deep Stereo Matching with Dense CRF Priors
Authors:
Ron Slossberg,
Aaron Wetzler,
Ron Kimmel
Abstract:
Stereo reconstruction from rectified images has recently been revisited within the context of deep learning. Using a deep Convolutional Neural Network to obtain patch-wise matching cost volumes has resulted in state of the art stereo reconstruction on classic datasets like Middlebury and Kitti. By introducing this cost into a classical stereo pipeline, the final results are improved dramatically o…
▽ More
Stereo reconstruction from rectified images has recently been revisited within the context of deep learning. Using a deep Convolutional Neural Network to obtain patch-wise matching cost volumes has resulted in state of the art stereo reconstruction on classic datasets like Middlebury and Kitti. By introducing this cost into a classical stereo pipeline, the final results are improved dramatically over non-learning based cost models. However these pipelines typically include hand engineered post processing steps to effectively regularize and clean the result. Here, we show that it is possible to take a more holistic approach by training a fully end-to-end network which directly includes regularization in the form of a densely connected Conditional Random Field (CRF) that acts as a prior on inter-pixel interactions. We demonstrate that our approach on both synthetic and real world datasets outperforms an alternative end-to-end network and compares favorably to more hand engineered approaches.
△ Less
Submitted 24 January, 2017; v1 submitted 6 December, 2016;
originally announced December 2016.
-
Learning Invariant Representations Of Planar Curves
Authors:
Gautam Pai,
Aaron Wetzler,
Ron Kimmel
Abstract:
We propose a metric learning framework for the construction of invariant geometric functions of planar curves for the Eucledian and Similarity group of transformations. We leverage on the representational power of convolutional neural networks to compute these geometric quantities. In comparison with axiomatic constructions, we show that the invariants approximated by the learning architectures ha…
▽ More
We propose a metric learning framework for the construction of invariant geometric functions of planar curves for the Eucledian and Similarity group of transformations. We leverage on the representational power of convolutional neural networks to compute these geometric quantities. In comparison with axiomatic constructions, we show that the invariants approximated by the learning architectures have better numerical qualities such as robustness to noise, resiliency to sampling, as well as the ability to adapt to occlusion and partiality. Finally, we develop a novel multi-scale representation in a similarity metric learning paradigm.
△ Less
Submitted 16 February, 2017; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Geodesic Distance Descriptors
Authors:
Gil Shamai,
Ron Kimmel
Abstract:
The Gromov-Hausdorff (GH) distance is traditionally used for measuring distances between metric spaces. It is defined as the minimal distortion of embedding one surface into the other, while the optimal correspondence can be described as the map that minimizes this distortion. Solving such a minimization is a hard combinatorial problem that requires pre-computation and storing of all pairwise geod…
▽ More
The Gromov-Hausdorff (GH) distance is traditionally used for measuring distances between metric spaces. It is defined as the minimal distortion of embedding one surface into the other, while the optimal correspondence can be described as the map that minimizes this distortion. Solving such a minimization is a hard combinatorial problem that requires pre-computation and storing of all pairwise geodesic distances for the matched surfaces. A popular way for compact representation of functions on surfaces is by projecting them into the leading eigenfunctions of the Laplace-Beltrami Operator (LBO). When truncated, The basis of the LBO is known to be the optimal for representing functions with bounded gradient in a min-max sense. Methods such as Spectral-GMDS exploit this idea to simplify and efficiently approximate a minimization related to the GH distance by operating in the truncated spectral domain, and obtain state of the art results for matching of nearly isometric shapes. However, when considering only a specific set of functions on the surface, such as geodesic distances, an optimized basis could be considered as an even better alternative. Here, we define the geodesic distance basis, which is optimal for compact approximation of geodesic distances, in terms of Frobenius norm. We use the suggested basis to extract the Geodesic Distance Descriptor (GDD), which encodes the geodesic distances information as a linear combination of the basis functions. We then show how these ideas can be used to efficiently and accurately approximate the metric spaces matching problem with almost no loss of information. These observations are used to construct a very simple and efficient procedure for shape correspondence. Experimental results show that the GDD improves both accuracy and efficiency of state of the art shape matching procedures.
△ Less
Submitted 20 November, 2016;
originally announced November 2016.
-
Efficient Inter-Geodesic Distance Computation and Fast Classical Scaling
Authors:
Gil Shamai,
Michael Zibulevsky,
Ron Kimmel
Abstract:
Multidimensional scaling (MDS) is a dimensionality reduction tool used for information analysis, data visualization and manifold learning. Most MDS procedures embed data points in low-dimensional Euclidean (flat) domains, such that distances between the points are as close as possible to given inter-point dissimilarities. We present an efficient solver for classical scaling, a specific MDS model,…
▽ More
Multidimensional scaling (MDS) is a dimensionality reduction tool used for information analysis, data visualization and manifold learning. Most MDS procedures embed data points in low-dimensional Euclidean (flat) domains, such that distances between the points are as close as possible to given inter-point dissimilarities. We present an efficient solver for classical scaling, a specific MDS model, by extrapolating the information provided by distances measured from a subset of the points to the remainder. The computational and space complexities of the new MDS methods are thereby reduced from quadratic to quasi-linear in the number of data points. Incorporating both local and global information about the data allows us to construct a low-rank approximation of the inter-geodesic distances between the data points. As a by-product, the proposed method allows for efficient computation of geodesic distances.
△ Less
Submitted 22 October, 2018; v1 submitted 20 November, 2016;
originally announced November 2016.
-
Learning Detailed Face Reconstruction from a Single Image
Authors:
Elad Richardson,
Matan Sela,
Roy Or-El,
Ron Kimmel
Abstract:
Reconstructing the detailed geometric structure of a face from a given image is a key to many computer vision and graphics applications, such as motion capture and reenactment. The reconstruction task is challenging as human faces vary extensively when considering expressions, poses, textures, and intrinsic geometries. While many approaches tackle this complexity by using additional data to recons…
▽ More
Reconstructing the detailed geometric structure of a face from a given image is a key to many computer vision and graphics applications, such as motion capture and reenactment. The reconstruction task is challenging as human faces vary extensively when considering expressions, poses, textures, and intrinsic geometries. While many approaches tackle this complexity by using additional data to reconstruct the face of a single subject, extracting facial surface from a single image remains a difficult problem. As a result, single-image based methods can usually provide only a rough estimate of the facial geometry. In contrast, we propose to leverage the power of convolutional neural networks to produce a highly detailed face reconstruction from a single image. For this purpose, we introduce an end-to-end CNN framework which derives the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network that recovers the coarse facial geometry (CoarseNet), followed by a CNN that refines the facial features of that geometry (FineNet). The proposed networks are connected by a novel layer which renders a depth image given a mesh in 3D. Unlike object recognition and detection problems, there are no suitable datasets for training CNNs to perform face geometry reconstruction. Therefore, our training regime begins with a supervised phase, based on synthetic images, followed by an unsupervised phase that uses only unconstrained facial images. The accuracy and robustness of the proposed model is demonstrated by both qualitative and quantitative evaluation tests.
△ Less
Submitted 6 April, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Hamiltonian operator for spectral shape analysis
Authors:
Yoni Choukroun,
Alon Shtern,
Alex Bronstein,
Ron Kimmel
Abstract:
Many shape analysis methods treat the geometry of an object as a metric space that can be captured by the Laplace-Beltrami operator. In this paper, we propose to adapt the classical Hamiltonian operator from quantum mechanics to the field of shape analysis. To this end we study the addition of a potential function to the Laplacian as a generator for dual spaces in which shape processing is perform…
▽ More
Many shape analysis methods treat the geometry of an object as a metric space that can be captured by the Laplace-Beltrami operator. In this paper, we propose to adapt the classical Hamiltonian operator from quantum mechanics to the field of shape analysis. To this end we study the addition of a potential function to the Laplacian as a generator for dual spaces in which shape processing is performed. We present a general optimization approach for solving variational problems involving the basis defined by the Hamiltonian using perturbation theory for its eigenvectors. The suggested operator is shown to produce better functional spaces to operate with, as demonstrated on different shape analysis tasks.
△ Less
Submitted 25 June, 2017; v1 submitted 7 November, 2016;
originally announced November 2016.
-
Fast Blended Transformations for Partial Shape Registration
Authors:
Alon Shtern,
Matan Sela,
Ron Kimmel
Abstract:
Automatic estimation of skinning transformations is a popular way to deform a single reference shape into a new pose by providing a small number of control parameters. We generalize this approach by efficiently enabling the use of multiple exemplar shapes. Using a small set of representative natural poses, we propose to express an unseen appearance by a low-dimensional linear subspace, specified b…
▽ More
Automatic estimation of skinning transformations is a popular way to deform a single reference shape into a new pose by providing a small number of control parameters. We generalize this approach by efficiently enabling the use of multiple exemplar shapes. Using a small set of representative natural poses, we propose to express an unseen appearance by a low-dimensional linear subspace, specified by a redundant dictionary of weighted vertex positions. Minimizing a nonlinear functional that regulates the example manifold, the suggested approach supports local-rigid deformations of articulated objects, as well as nearly isometric embeddings of smooth shapes. A real-time non-rigid deformation system is demonstrated, and a shape completion and partial registration framework is introduced. These applications can recover a target pose and implicit inverse kinematics from a small number of examples and just a few vertex positions. The result reconstruction is more accurate compared to state-of-the-art reduced deformable models.
△ Less
Submitted 25 September, 2016;
originally announced September 2016.
-
Customized Facial Constant Positive Air Pressure (CPAP) Masks
Authors:
Matan Sela,
Nadav Toledo,
Yaron Honen,
Ron Kimmel
Abstract:
Sleep apnea is a syndrome that is characterized by sudden breathing halts while slee**. One of the common treatments involves wearing a mask that delivers continuous air flow into the nostrils so as to maintain a steady air pressure. These masks are designed for an average facial model and are often difficult to adjust due to poor fit to the actual patient. The incompatibility is characterized b…
▽ More
Sleep apnea is a syndrome that is characterized by sudden breathing halts while slee**. One of the common treatments involves wearing a mask that delivers continuous air flow into the nostrils so as to maintain a steady air pressure. These masks are designed for an average facial model and are often difficult to adjust due to poor fit to the actual patient. The incompatibility is characterized by gaps between the mask and the face, which deteriorates the impermeability of the mask and leads to air leakage. We suggest a fully automatic approach for designing a personalized nasal mask interface using a facial depth scan. The interfaces generated by the proposed method accurately fit the geometry of the scanned face, and are easy to manufacture. The proposed method utilizes cheap commodity depth sensors and 3D printing technologies to efficiently design and manufacture customized masks for patients suffering from sleep apnea.
△ Less
Submitted 22 September, 2016;
originally announced September 2016.
-
Randomized Independent Component Analysis
Authors:
Matan Sela,
Ron Kimmel
Abstract:
Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such ap…
▽ More
Independent component analysis (ICA) is a method for recovering statistically independent signals from observations of unknown linear combinations of the sources. Some of the most accurate ICA decomposition methods require searching for the inverse transformation which minimizes different approximations of the Mutual Information, a measure of statistical independence of random vectors. Two such approximations are the Kernel Generalized Variance or the Kernel Canonical Correlation which has been shown to reach the highest performance of ICA methods. However, the computational effort necessary just for computing these measures is cubic in the sample size. Hence, optimizing them becomes even more computationally demanding, in terms of both space and time. Here, we propose a couple of alternative novel measures based on randomized features of the samples - the Randomized Generalized Variance and the Randomized Canonical Correlation. The computational complexity of calculating the proposed alternatives is linear in the sample size and provide a controllable approximation of their Kernel-based non-random versions. We also show that optimization of the proposed statistical properties yields a comparable separation error at an order of magnitude faster compared to Kernel-based measures.
△ Less
Submitted 22 September, 2016;
originally announced September 2016.
-
Consistent Discretization and Minimization of the L1 Norm on Manifolds
Authors:
Alex Bronstein,
Yoni Choukroun,
Ron Kimmel,
Matan Sela
Abstract:
The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal…
▽ More
The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed as compressed manifold modes. The continuous L1 norm on the manifold is often replaced by the vector l1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively-reweighed l2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems not requiring non-convex optimization on Stiefel manifolds and producing more stable and accurate results.
△ Less
Submitted 18 September, 2016;
originally announced September 2016.
-
3D Face Reconstruction by Learning from Synthetic Data
Authors:
Elad Richardson,
Matan Sela,
Ron Kimmel
Abstract:
Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Con…
▽ More
Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional-Neural-Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, currently, there are no large volume data sets, while acquiring such big-data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.
△ Less
Submitted 26 September, 2016; v1 submitted 14 September, 2016;
originally announced September 2016.
-
Real-Time Depth Refinement for Specular Objects
Authors:
Roy Or - El,
Rom Hershkovitz,
Aaron Wetzler,
Guy Rosman,
Alfred M. Bruckstein,
Ron Kimmel
Abstract:
The introduction of consumer RGB-D scanners set off a major boost in 3D computer vision research. Yet, the precision of existing depth scanners is not accurate enough to recover fine details of a scanned object. While modern shading based depth refinement methods have been proven to work well with Lambertian objects, they break down in the presence of specularities. We present a novel shape from s…
▽ More
The introduction of consumer RGB-D scanners set off a major boost in 3D computer vision research. Yet, the precision of existing depth scanners is not accurate enough to recover fine details of a scanned object. While modern shading based depth refinement methods have been proven to work well with Lambertian objects, they break down in the presence of specularities. We present a novel shape from shading framework that addresses this issue and enhances both diffuse and specular objects' depth profiles. We take advantage of the built-in monochromatic IR projector and IR images of the RGB-D scanners and present a lighting model that accounts for the specular regions in the input image. Using this model, we reconstruct the depth map in real-time. Both quantitative tests and visual evaluations prove that the proposed method produces state of the art depth reconstruction results.
△ Less
Submitted 30 March, 2016; v1 submitted 28 November, 2015;
originally announced November 2015.
-
Rule Of Thumb: Deep derotation for improved fingertip detection
Authors:
Aaron Wetzler,
Ron Slossberg,
Ron Kimmel
Abstract:
We investigate a novel global orientation regression approach for articulated objects using a deep convolutional neural network. This is integrated with an in-plane image derotation scheme, DeROT, to tackle the problem of per-frame fingertip detection in depth images. The method reduces the complexity of learning in the space of articulated poses which is demonstrated by using two distinct state-o…
▽ More
We investigate a novel global orientation regression approach for articulated objects using a deep convolutional neural network. This is integrated with an in-plane image derotation scheme, DeROT, to tackle the problem of per-frame fingertip detection in depth images. The method reduces the complexity of learning in the space of articulated poses which is demonstrated by using two distinct state-of-the-art learning based hand pose estimation methods applied to fingertip detection. Significant classification improvements are shown over the baseline implementation. Our framework involves no tracking, kinematic constraints or explicit prior model of the articulated object in hand. To support our approach we also describe a new pipeline for high accuracy magnetic annotation and labeling of objects imaged by a depth camera.
△ Less
Submitted 21 July, 2015;
originally announced July 2015.
-
On the optimality of shape and data representation in the spectral domain
Authors:
Yonathan Aflalo,
Haim Brezis,
Ron Kimmel
Abstract:
A proof of the optimality of the eigenfunctions of the Laplace-Beltrami operator (LBO) in representing smooth functions on surfaces is provided and adapted to the field of applied shape and data analysis. It is based on the Courant-Fischer min-max principle adapted to our case. % The theorem we present supports the new trend in geometry processing of treating geometric structures by using th…
▽ More
A proof of the optimality of the eigenfunctions of the Laplace-Beltrami operator (LBO) in representing smooth functions on surfaces is provided and adapted to the field of applied shape and data analysis. It is based on the Courant-Fischer min-max principle adapted to our case. % The theorem we present supports the new trend in geometry processing of treating geometric structures by using their projection onto the leading eigenfunctions of the decomposition of the LBO. Utilisation of this result can be used for constructing numerically efficient algorithms to process shapes in their spectrum. We review a couple of applications as possible practical usage cases of the proposed optimality criteria. % We refer to a scale invariant metric, which is also invariant to bending of the manifold. This novel pseudo-metric allows constructing an LBO by which a scale invariant eigenspace on the surface is defined. We demonstrate the efficiency of an intermediate metric, defined as an interpolation between the scale invariant and the regular one, in representing geometric structures while capturing both coarse and fine details. Next, we review a numerical acceleration technique for classical scaling, a member of a family of flattening methods known as multidimensional scaling (MDS). There, the optimality is exploited to efficiently approximate all geodesic distances between pairs of points on a given surface, and thereby match and compare between almost isometric surfaces. Finally, we revisit the classical principal component analysis (PCA) definition by coupling its variational form with a Dirichlet energy on the data manifold. By pairing the PCA with the LBO we can handle cases that go beyond the scope defined by the observation set that is handled by regular PCA.
△ Less
Submitted 15 September, 2014;
originally announced September 2014.
-
Graph matching: relax or not?
Authors:
Yonathan Aflalo,
Alex Bronstein,
Ron Kimmel
Abstract:
We consider the problem of exact and inexact matching of weighted undirected graphs, in which a bijective correspondence is sought to minimize a quadratic weight disagreement. This computationally challenging problem is often relaxed as a convex quadratic program, in which the space of permutations is replaced by the space of doubly-stochastic matrices. However, the applicability of such a relaxat…
▽ More
We consider the problem of exact and inexact matching of weighted undirected graphs, in which a bijective correspondence is sought to minimize a quadratic weight disagreement. This computationally challenging problem is often relaxed as a convex quadratic program, in which the space of permutations is replaced by the space of doubly-stochastic matrices. However, the applicability of such a relaxation is poorly understood. We define a broad class of friendly graphs characterized by an easily verifiable spectral property. We prove that for friendly graphs, the convex relaxation is guaranteed to find the exact isomorphism or certify its inexistence. This result is further extended to approximately isomorphic graphs, for which we develop an explicit bound on the amount of weight disagreement under which the relaxation is guaranteed to find the globally optimal approximate isomorphism. We also show that in many cases, the graph matching problem can be further harmlessly relaxed to a convex quadratic program with only n separable linear equality constraints, which is substantially more efficient than the standard relaxation involving 2n equality and n^2 inequality constraints. Finally, we show that our results are still valid for unfriendly graphs if additional information in the form of seeds or attributes is allowed, with the latter satisfying an easy to verify spectral characteristic.
△ Less
Submitted 12 October, 2014; v1 submitted 29 January, 2014;
originally announced January 2014.