Search | arXiv e-print repository

Consistent Two-Flow Network for Tele-Registration of Point Clouds

Authors: Zihao Yan, Zimu Yi, Ruizhen Hu, Niloy J. Mitra, Daniel Cohen-Or, Hui Huang

Abstract: Rigid registration of partial observations is a fundamental problem in various applied fields. In computer graphics, special attention has been given to the registration between two partial point clouds generated by scanning devices. State-of-the-art registration techniques still struggle when the overlap region between the two point clouds is small, and completely fail if there is no overlap betw… ▽ More Rigid registration of partial observations is a fundamental problem in various applied fields. In computer graphics, special attention has been given to the registration between two partial point clouds generated by scanning devices. State-of-the-art registration techniques still struggle when the overlap region between the two point clouds is small, and completely fail if there is no overlap between the scan pairs. In this paper, we present a learning-based technique that alleviates this problem, and allows registration between point clouds, presented in arbitrary poses, and having little or even no overlap, a setting that has been referred to as tele-registration. Our technique is based on a novel neural network design that learns a prior of a class of shapes and can complete a partial shape. The key idea is combining the registration and completion tasks in a way that reinforces each other. In particular, we simultaneously train the registration network and completion network using two coupled flows, one that register-and-complete, and one that complete-and-register, and encourage the two flows to produce a consistent result. We show that, compared with each separate flow, this two-flow training leads to robust and reliable tele-registration, and hence to a better point cloud prediction that completes the registered scans. It is also worth mentioning that each of the components in our neural network outperforms state-of-the-art methods in both completion and registration. We further analyze our network with several ablation studies and demonstrate its performance on a large number of partial point clouds, both synthetic and real-world, that have only small or no overlap. △ Less

Submitted 10 October, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: Accepted to IEEE TVCG 2021, project page at https://vcc.tech/research/2021/CTFNet

arXiv:2105.14548 [pdf, other]

Z2P: Instant Visualization of Point Clouds

Authors: Gal Metzer, Rana Hanocka, Raja Giryes, Niloy J. Mitra, Daniel Cohen-Or

Abstract: We present a technique for visualizing point clouds using a neural network. Our technique allows for an instant preview of any point cloud, and bypasses the notoriously difficult surface reconstruction problem or the need to estimate oriented normals for splat-based rendering. We cast the preview problem as a conditional image-to-image translation task, and design a neural network that translates… ▽ More We present a technique for visualizing point clouds using a neural network. Our technique allows for an instant preview of any point cloud, and bypasses the notoriously difficult surface reconstruction problem or the need to estimate oriented normals for splat-based rendering. We cast the preview problem as a conditional image-to-image translation task, and design a neural network that translates point depth-map directly into an image, where the point cloud is visualized as though a surface was reconstructed from it. Furthermore, the resulting appearance of the visualized point cloud can be, optionally, conditioned on simple control variables (e.g., color and light). We demonstrate that our technique instantly produces plausible images, and can, on-the-fly effectively handle noise, non-uniform sampling, and thin surfaces sheets. △ Less

Submitted 21 February, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: Eurographics 2022

arXiv:2104.06392 [pdf, other]

doi 10.1145/3450626.3459821

ShapeMOD: Macro Operation Discovery for 3D Shape Programs

Authors: R. Kenny Jones, David Charatan, Paul Guerrero, Niloy J. Mitra, Daniel Ritchie

Abstract: A popular way to create detailed yet easily controllable 3D shapes is via procedural modeling, i.e. generating geometry using programs. Such programs consist of a series of instructions along with their associated parameter values. To fully realize the benefits of this representation, a shape program should be compact and only expose degrees of freedom that allow for meaningful manipulation of out… ▽ More A popular way to create detailed yet easily controllable 3D shapes is via procedural modeling, i.e. generating geometry using programs. Such programs consist of a series of instructions along with their associated parameter values. To fully realize the benefits of this representation, a shape program should be compact and only expose degrees of freedom that allow for meaningful manipulation of output geometry. One way to achieve this goal is to design higher-level macro operators that, when executed, expand into a series of commands from the base shape modeling language. However, manually authoring such macros, much like shape programs themselves, is difficult and largely restricted to domain experts. In this paper, we present ShapeMOD, an algorithm for automatically discovering macros that are useful across large datasets of 3D shape programs. ShapeMOD operates on shape programs expressed in an imperative, statement-based language. It is designed to discover macros that make programs more compact by minimizing the number of function calls and free parameters required to represent an input shape collection. We run ShapeMOD on multiple collections of programs expressed in a domain-specific language for 3D shape structures. We show that it automatically discovers a concise set of macros that abstract out common structural and parametric patterns that generalize over large shape collections. We also demonstrate that the macros found by ShapeMOD improve performance on downstream tasks including shape generative modeling and inferring programs from point clouds. Finally, we conduct a user study that indicates that ShapeMOD's discovered macros make interactive shape editing more efficient. △ Less

Submitted 22 March, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: SIGGRAPH 2021. Project Page: https://rkjones4.github.io/shapeMOD.html

arXiv:2103.16942 [pdf, other]

Neural Surface Maps

Authors: Luca Morreale, Noam Aigerman, Vladimir Kim, Niloy J. Mitra

Abstract: Maps are arguably one of the most fundamental concepts used to define and operate on manifold surfaces in differentiable geometry. Accordingly, in geometry processing, maps are ubiquitous and are used in many core applications, such as paramterization, shape analysis, remeshing, and deformation. Unfortunately, most computational representations of surface maps do not lend themselves to manipulatio… ▽ More Maps are arguably one of the most fundamental concepts used to define and operate on manifold surfaces in differentiable geometry. Accordingly, in geometry processing, maps are ubiquitous and are used in many core applications, such as paramterization, shape analysis, remeshing, and deformation. Unfortunately, most computational representations of surface maps do not lend themselves to manipulation and optimization, usually entailing hard, discrete problems. While algorithms exist to solve these problems, they are problem-specific, and a general framework for surface maps is still in need. In this paper, we advocate considering neural networks as encoding surface maps. Since neural networks can be composed on one another and are differentiable, we show it is easy to use them to define surfaces via atlases, compose them for surface-to-surface map**s, and optimize differentiable objectives relating to them, such as any notion of distortion, in a trivial manner. In our experiments, we represent surfaces by generating a neural map that approximates a UV parameterization of a 3D model. Then, we compose this map with other neural maps which we optimize with respect to distortion measures. We show that our formulation enables trivial optimization of rather elusive map** tasks, such as maps between a collection of surfaces. △ Less

Submitted 31 March, 2021; originally announced March 2021.

Comments: project page: http://geometry.cs.ucl.ac.uk/projects/2021/neuralmaps/

arXiv:2103.14968 [pdf, other]

Labels4Free: Unsupervised Segmentation using StyleGAN

Authors: Rameen Abdal, Peihao Zhu, Niloy Mitra, Peter Wonka

Abstract: We propose an unsupervised segmentation framework for StyleGAN generated objects. We build on two main observations. First, the features generated by StyleGAN hold valuable information that can be utilized towards training segmentation networks. Second, the foreground and background can often be treated to be largely independent and be composited in different ways. For our solution, we propose to… ▽ More We propose an unsupervised segmentation framework for StyleGAN generated objects. We build on two main observations. First, the features generated by StyleGAN hold valuable information that can be utilized towards training segmentation networks. Second, the foreground and background can often be treated to be largely independent and be composited in different ways. For our solution, we propose to augment the StyleGAN2 generator architecture with a segmentation branch and to split the generator into a foreground and background network. This enables us to generate soft segmentation masks for the foreground object in an unsupervised fashion. On multiple object classes, we report comparable results against state-of-the-art supervised segmentation networks, while against the best unsupervised segmentation approach we demonstrate a clear improvement, both in qualitative and quantitative metrics. △ Less

Submitted 27 March, 2021; originally announced March 2021.

Comments: "Project Page: https://rameenabdal.github.io/Labels4Free/"

Journal ref: ICCV 2021

arXiv:2103.00262 [pdf, other]

Walk2Map: Extracting Floor Plans from Indoor Walk Trajectories

Authors: Claudio Mura, Renato Pajarola, Konrad Schindler, Niloy Mitra

Abstract: Recent years have seen a proliferation of new digital products for the efficient management of indoor spaces, with important applications like emergency management, virtual property showcasing and interior design. These products rely on accurate 3D models of the environments considered, including information on both architectural and non-permanent elements. These models must be created from measur… ▽ More Recent years have seen a proliferation of new digital products for the efficient management of indoor spaces, with important applications like emergency management, virtual property showcasing and interior design. These products rely on accurate 3D models of the environments considered, including information on both architectural and non-permanent elements. These models must be created from measured data such as RGB-D images or 3D point clouds, whose capture and consolidation involves lengthy data workflows. This strongly limits the rate at which 3D models can be produced, preventing the adoption of many digital services for indoor space management. We provide an alternative to such data-intensive procedures by presenting Walk2Map, a data-driven approach to generate floor plans only from trajectories of a person walking inside the rooms. Thanks to recent advances in data-driven inertial odometry, such minimalistic input data can be acquired from the IMU readings of consumer-level smartphones, which allows for an effortless and scalable map** of real-world indoor spaces. Our work is based on learning the latent relation between an indoor walk trajectory and the information represented in a floor plan: interior space footprint, portals, and furniture. We distinguish between recovering area-related (interior footprint, furniture) and wall-related (doors) information and use two different neural architectures for the two tasks: an image-based Encoder-Decoder and a Graph Convolutional Network, respectively. We train our networks using scanned 3D indoor models and apply them in a cascaded fashion on an indoor walk trajectory at inference time. We perform a qualitative and quantitative evaluation using both simulated and measured, real-world trajectories, and compare against a baseline method for image-to-image translation. The experiments confirm the feasibility of our approach. △ Less

Submitted 27 February, 2021; originally announced March 2021.

Comments: To be published in Computer Graphics Forum (Proc. Eurographics 2021)

arXiv:2102.11861 [pdf, other]

Generative Modelling of BRDF Textures from Flash Images

Authors: Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra, Tobias Ritschel

Abstract: We learn a latent space for easy capture, consistent interpolation, and efficient reproduction of visual material appearance. When users provide a photo of a stationary natural material captured under flashlight illumination, first it is converted into a latent material code. Then, in the second step, conditioned on the material code, our method produces an infinite and diverse spatial field of BR… ▽ More We learn a latent space for easy capture, consistent interpolation, and efficient reproduction of visual material appearance. When users provide a photo of a stationary natural material captured under flashlight illumination, first it is converted into a latent material code. Then, in the second step, conditioned on the material code, our method produces an infinite and diverse spatial field of BRDF model parameters (diffuse albedo, normals, roughness, specular albedo) that subsequently allows rendering in complex scenes and illuminations, matching the appearance of the input photograph. Technically, we jointly embed all flash images into a latent space using a convolutional encoder, and -- conditioned on these latent codes -- convert random spatial fields into fields of BRDF parameters using a convolutional neural network (CNN). We condition these BRDF parameters to match the visual characteristics (statistics and spectra of visual features) of the input under matching light. A user study compares our approach favorably to previous work, even those with access to BRDF supervision. △ Less

Submitted 10 September, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

arXiv:2102.11811 [pdf, other]

Dynamic Neural Garments

Authors: Meng Zhang, Duygu Ceylan, Tuanfeng Wang, Niloy J. Mitra

Abstract: A vital task of the wider digital human effort is the creation of realistic garments on digital avatars, both in the form of characteristic fold patterns and wrinkles in static frames as well as richness of garment dynamics under avatars' motion. Existing workflow of modeling, simulation, and rendering closely replicates the physics behind real garments, but is tedious and requires repeating most… ▽ More A vital task of the wider digital human effort is the creation of realistic garments on digital avatars, both in the form of characteristic fold patterns and wrinkles in static frames as well as richness of garment dynamics under avatars' motion. Existing workflow of modeling, simulation, and rendering closely replicates the physics behind real garments, but is tedious and requires repeating most of the workflow under changes to characters' motion, camera angle, or garment resizing. Although data-driven solutions exist, they either focus on static scenarios or only handle dynamics of tight garments. We present a solution that, at test time, takes in body joint motion to directly produce realistic dynamic garment image sequences. Specifically, given the target joint motion sequence of an avatar, we propose dynamic neural garments to jointly simulate and render plausible dynamic garment appearance from an unseen viewpoint. Technically, our solution generates a coarse garment proxy sequence, learns deep dynamic features attached to this template, and neurally renders the features to produce appearance changes such as folds, wrinkles, and silhouettes. We demonstrate generalization behavior to both unseen motion and unseen camera views. Further, our network can be fine-tuned to adopt to new body shape and/or background images. We also provide comparisons against existing neural rendering and image sequence translation approaches, and report clear quantitative improvements. △ Less

Submitted 23 February, 2021; originally announced February 2021.

Comments: 13 pages

arXiv:2102.02798 [pdf, other]

Im2Vec: Synthesizing Vector Graphics without Vector Supervision

Authors: Pradyumna Reddy, Michael Gharbi, Michal Lukac, Niloy J. Mitra

Abstract: Vector graphics are widely used to represent fonts, logos, digital artworks, and graphic designs. But, while a vast body of work has focused on generative algorithms for raster images, only a handful of options exists for vector graphics. One can always rasterize the input graphic and resort to image-based generative approaches, but this negates the advantages of the vector representation. The cur… ▽ More Vector graphics are widely used to represent fonts, logos, digital artworks, and graphic designs. But, while a vast body of work has focused on generative algorithms for raster images, only a handful of options exists for vector graphics. One can always rasterize the input graphic and resort to image-based generative approaches, but this negates the advantages of the vector representation. The current alternative is to use specialized models that require explicit supervision on the vector graphics representation at training time. This is not ideal because large-scale high quality vector-graphics datasets are difficult to obtain. Furthermore, the vector representation for a given design is not unique, so models that supervise on the vector representation are unnecessarily constrained. Instead, we propose a new neural network that can generate complex vector graphics with varying topologies, and only requires indirect supervision from readily-available raster training images (i.e., with no vector counterparts). To enable this, we use a differentiable rasterization pipeline that renders the generated vector shapes and composites them together onto a raster canvas. We demonstrate our method on a range of datasets, and provide comparison with state-of-the-art SVG-VAE and DeepSVG, both of which require explicit vector graphics supervision. Finally, we also demonstrate our approach on the MNIST dataset, for which no groundtruth vector representation is available. Source code, datasets, and more results are available at geometry.cs.ucl.ac.uk/projects/2021/Im2Vec/ △ Less

Submitted 1 April, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

arXiv:2101.10466 [pdf, other]

A regression framework for a probabilistic measure of cost-effectiveness

Authors: Nicholas Illenberger, Nandita Mitra, Andrew J. Spieker

Abstract: To make informed health policy decisions regarding a treatment, we must consider both its cost and its clinical effectiveness. In past work, we introduced the net benefit separation (NBS) as a novel measure of cost-effectiveness. The NBS is a probabilistic measure that characterizes the extent to which a treated patient will be more likely to experience benefit as compared to an untreated patient.… ▽ More To make informed health policy decisions regarding a treatment, we must consider both its cost and its clinical effectiveness. In past work, we introduced the net benefit separation (NBS) as a novel measure of cost-effectiveness. The NBS is a probabilistic measure that characterizes the extent to which a treated patient will be more likely to experience benefit as compared to an untreated patient. Due to variation in treatment response across patients, uncovering factors that influence cost-effectiveness can assist policy makers in population-level decisions regarding resource allocation. In this paper, we introduce a regression framework for NBS in order to estimate covariate-specific NBS and find determinants of variation in NBS. Our approach is able to accommodate informative cost censoring through inverse probability weighting techniques, and addresses confounding through a semiparametric standardization procedure. Through simulations, we show that NBS regression performs well in a variety of common scenarios. We apply our proposed regression procedure to a realistic simulated data set as an illustration of how our approach could be used to investigate the association between cancer stage, comorbidities and cost-effectiveness when comparing adjuvant radiation therapy and chemotherapy in post-hysterectomy endometrial cancer patients. △ Less

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2012.08197 [pdf, other]

Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences

Authors: Norman Müller, Yu-Shiang Wong, Niloy J. Mitra, Angela Dai, Matthias Nießner

Abstract: Multi-object tracking from RGB-D video sequences is a challenging problem due to the combination of changing viewpoints, motion, and occlusions over time. We observe that having the complete geometry of objects aids in their tracking, and thus propose to jointly infer the complete geometry of objects as well as track them, for rigidly moving objects over time. Our key insight is that inferring the… ▽ More Multi-object tracking from RGB-D video sequences is a challenging problem due to the combination of changing viewpoints, motion, and occlusions over time. We observe that having the complete geometry of objects aids in their tracking, and thus propose to jointly infer the complete geometry of objects as well as track them, for rigidly moving objects over time. Our key insight is that inferring the complete geometry of the objects significantly helps in tracking. By hallucinating unseen regions of objects, we can obtain additional correspondences between the same instance, thus providing robust tracking even under strong change of appearance. From a sequence of RGB-D frames, we detect objects in each frame and learn to predict their complete object geometry as well as a dense correspondence map** into a canonical space. This allows us to derive 6DoF poses for the objects in each frame, along with their correspondence between frames, providing robust object tracking across the RGB-D sequence. Experiments on both synthetic and real-world RGB-D data demonstrate that we achieve state-of-the-art performance on dynamic object tracking. Furthermore, we show that our object completion significantly helps tracking, providing an improvement of $6.5\%$ in mean MOTA. △ Less

Submitted 16 December, 2020; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: Video: https://youtu.be/F2zs9AMRxeg

arXiv:2012.01203 [pdf, other]

Learning Delaunay Surface Elements for Mesh Reconstruction

Authors: Marie-Julie Rakotosaona, Paul Guerrero, Noam Aigerman, Niloy Mitra, Maks Ovsjanikov

Abstract: We present a method for reconstructing triangle meshes from point clouds. Existing learning-based methods for mesh reconstruction mostly generate triangles individually, making it hard to create manifold meshes. We leverage the properties of 2D Delaunay triangulations to construct a mesh from manifold surface elements. Our method first estimates local geodesic neighborhoods around each point. We t… ▽ More We present a method for reconstructing triangle meshes from point clouds. Existing learning-based methods for mesh reconstruction mostly generate triangles individually, making it hard to create manifold meshes. We leverage the properties of 2D Delaunay triangulations to construct a mesh from manifold surface elements. Our method first estimates local geodesic neighborhoods around each point. We then perform a 2D projection of these neighborhoods using a learned logarithmic map. A Delaunay triangulation in this 2D domain is guaranteed to produce a manifold patch, which we call a Delaunay surface element. We synchronize the local 2D projections of neighboring elements to maximize the manifoldness of the reconstructed mesh. Our results show that we achieve better overall manifoldness of our reconstructed meshes than current methods to reconstruct meshes with arbitrary topology. Our code, data and pretrained models can be found online: https://github.com/mrakotosaon/dse-meshing △ Less

Submitted 6 May, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

Journal ref: CVPR 2021

arXiv:2009.10839 [pdf, other]

doi 10.1515/ijb-2022-0051

Hierarchical Bayesian Bootstrap for Heterogeneous Treatment Effect Estimation

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE) - average treatment effects within strata of another variable of interest such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum - which itself must be estimated. St… ▽ More A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE) - average treatment effects within strata of another variable of interest such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum - which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g. via the empirical distribution or Rubin's Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types. △ Less

Submitted 4 January, 2023; v1 submitted 22 September, 2020; originally announced September 2020.

Journal ref: The International Journal of Biostatistics, 2022

arXiv:2009.08026 [pdf, other]

doi 10.1145/3414685.3417812

ShapeAssembly: Learning to Generate Programs for 3D Shape Structure Synthesis

Authors: R. Kenny Jones, Theresa Barton, Xianghao Xu, Kai Wang, Ellen Jiang, Paul Guerrero, Niloy J. Mitra, Daniel Ritchie

Abstract: Manually authoring 3D shapes is difficult and time consuming; generative models of 3D shapes offer compelling alternatives. Procedural representations are one such possibility: they offer high-quality and editable results but are difficult to author and often produce outputs with limited diversity. On the other extreme are deep generative models: given enough data, they can learn to generate any c… ▽ More Manually authoring 3D shapes is difficult and time consuming; generative models of 3D shapes offer compelling alternatives. Procedural representations are one such possibility: they offer high-quality and editable results but are difficult to author and often produce outputs with limited diversity. On the other extreme are deep generative models: given enough data, they can learn to generate any class of shape but their outputs have artifacts and the representation is not editable. In this paper, we take a step towards achieving the best of both worlds for novel 3D shape synthesis. We propose ShapeAssembly, a domain-specific "assembly-language" for 3D shape structures. ShapeAssembly programs construct shapes by declaring cuboid part proxies and attaching them to one another, in a hierarchical and symmetrical fashion. Its functions are parameterized with free variables, so that one program structure is able to capture a family of related shapes. We show how to extract ShapeAssembly programs from existing shape structures in the PartNet dataset. Then we train a deep generative model, a hierarchical sequence VAE, that learns to write novel ShapeAssembly programs. The program captures the subset of variability that is interpretable and editable. The deep model captures correlations across shape collections that are hard to express procedurally. We evaluate our approach by comparing shapes output by our generated programs to those from other recent shape structure synthesis models. We find that our generated shapes are more plausible and physically-valid than those of other methods. Additionally, we assess the latent spaces of these models, and find that ours is better structured and produces smoother interpolations. As an application, we use our generative model and differentiable program interpreter to infer and fit shape programs to unstructured geometry, such as point clouds. △ Less

Submitted 16 September, 2020; originally announced September 2020.

Comments: Accepted to Siggraph Asia 2020; project page: https://rkjones4.github.io/shapeAssembly.html

arXiv:2009.04927 [pdf, other]

Sketch2CAD: Sequential CAD Modeling by Sketching in Context

Authors: Changjian Li, Hao Pan, Adrien Bousseau, Niloy J. Mitra

Abstract: We present a sketch-based CAD modeling system, where users create objects incrementally by sketching the desired shape edits, which our system automatically translates to CAD operations. Our approach is motivated by the close similarities between the steps industrial designers follow to draw 3D shapes, and the operations CAD modeling systems offer to create similar shapes. To overcome the strong a… ▽ More We present a sketch-based CAD modeling system, where users create objects incrementally by sketching the desired shape edits, which our system automatically translates to CAD operations. Our approach is motivated by the close similarities between the steps industrial designers follow to draw 3D shapes, and the operations CAD modeling systems offer to create similar shapes. To overcome the strong ambiguity with parsing 2D sketches, we observe that in a sketching sequence, each step makes sense and can be interpreted in the \emph{context} of what has been drawn before. In our system, this context corresponds to a partial CAD model, inferred in the previous steps, which we feed along with the input sketch to a deep neural network in charge of interpreting how the model should be modified by that sketch. Our deep network architecture then recognizes the intended CAD operation and segments the sketch accordingly, such that a subsequent optimization estimates the parameters of the operation that best fit the segmented sketch strokes. Since there exists no datasets of paired sketching and CAD modeling sequences, we train our system by generating synthetic sequences of CAD operations that we render as line drawings. We present a proof of concept realization of our algorithm supporting four frequently used CAD operations. Using our system, participants are able to quickly model a large and diverse set of objects, demonstrating Sketch2CAD to be an alternate way of interacting with current CAD modeling systems. △ Less

Submitted 10 September, 2020; originally announced September 2020.

arXiv:2009.02473 [pdf, other]

Examining Machine Learning for 5G and Beyond through an Adversarial Lens

Authors: Muhammad Usama, Rupendra Nath Mitra, Inaam Ilahi, Junaid Qadir, Mahesh K. Marina

Abstract: Spurred by the recent advances in deep learning to harness rich information hidden in large volumes of data and to tackle problems that are hard to model/solve (e.g., resource allocation problems), there is currently tremendous excitement in the mobile networks domain around the transformative potential of data-driven AI/ML based network automation, control and analytics for 5G and beyond. In this… ▽ More Spurred by the recent advances in deep learning to harness rich information hidden in large volumes of data and to tackle problems that are hard to model/solve (e.g., resource allocation problems), there is currently tremendous excitement in the mobile networks domain around the transformative potential of data-driven AI/ML based network automation, control and analytics for 5G and beyond. In this article, we present a cautionary perspective on the use of AI/ML in the 5G context by highlighting the adversarial dimension spanning multiple types of ML (supervised/unsupervised/RL) and support this through three case studies. We also discuss approaches to mitigate this adversarial ML risk, offer guidelines for evaluating the robustness of ML models, and call attention to issues surrounding ML oriented research in 5G more generally. △ Less

Submitted 5 September, 2020; originally announced September 2020.

arXiv:2009.01456 [pdf, other]

DeformSyncNet: Deformation Transfer via Synchronized Shape Deformation Spaces

Authors: Minhyuk Sung, Zhenyu Jiang, Panos Achlioptas, Niloy J. Mitra, Leonidas J. Guibas

Abstract: Shape deformation is an important component in any geometry processing toolbox. The goal is to enable intuitive deformations of single or multiple shapes or to transfer example deformations to new shapes while preserving the plausibility of the deformed shape(s). Existing approaches assume access to point-level or part-level correspondence or establish them in a preprocessing phase, thus limiting… ▽ More Shape deformation is an important component in any geometry processing toolbox. The goal is to enable intuitive deformations of single or multiple shapes or to transfer example deformations to new shapes while preserving the plausibility of the deformed shape(s). Existing approaches assume access to point-level or part-level correspondence or establish them in a preprocessing phase, thus limiting the scope and generality of such approaches. We propose DeformSyncNet, a new approach that allows consistent and synchronized shape deformations without requiring explicit correspondence information. Technically, we achieve this by encoding deformations into a class-specific idealized latent space while decoding them into an individual, model-specific linear deformation action space, operating directly in 3D. The underlying encoding and decoding are performed by specialized (jointly trained) neural networks. By design, the inductive bias of our networks results in a deformation space with several desirable properties, such as path invariance across different deformation pathways, which are then also approximately preserved in real space. We qualitatively and quantitatively evaluate our framework against multiple alternative approaches and demonstrate improved performance. △ Less

Submitted 3 September, 2020; originally announced September 2020.

Comments: SIGGRAPH Asia 2020

arXiv:2009.00785 [pdf, ps, other]

Analysis of survival data with non-proportional hazards: A comparison of propensity score weighted methods

Authors: Elizabeth A. Handorf, Marc Smaldone, Sujana Movva, Nandita Mitra

Abstract: One of the most common ways researchers compare survival outcomes across treatments when confounding is present is using Cox regression. This model is limited by its underlying assumption of proportional hazards; in some cases, substantial violations may occur. Here we present and compare approaches which attempt to address this issue, including Cox models with time-varying hazard ratios; parametr… ▽ More One of the most common ways researchers compare survival outcomes across treatments when confounding is present is using Cox regression. This model is limited by its underlying assumption of proportional hazards; in some cases, substantial violations may occur. Here we present and compare approaches which attempt to address this issue, including Cox models with time-varying hazard ratios; parametric accelerated failure time models; Kaplan-Meier curves; and pseudo-observations. To adjust for differences between treatment groups, we use Inverse Probability of Treatment Weighting based on the propensity score. We examine clinically meaningful outcome measures that can be computed and directly compared across each method, namely, survival probability at time T, median survival, and restricted mean survival. We conduct simulation studies under a range of scenarios, and determine the biases, coverages, and standard errors of the Average Treatment Effects for each method. We then apply these approaches to two published observational studies of survival after cancer treatment. The first examines chemotherapy in sarcoma, where survival is very similar initially, but after two years the chemotherapy group shows a benefit. The other study is a comparison of surgical techniques for kidney cancer, where survival differences are attenuated over time. △ Less

Submitted 29 January, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

arXiv:2008.08999 [pdf, other]

Object Properties Inferring from and Transfer for Human Interaction Motions

Authors: Qian Zheng, Weikai Wu, Hanting Pan, Niloy Mitra, Daniel Cohen-Or, Hui Huang

Abstract: Humans regularly interact with their surrounding objects. Such interactions often result in strongly correlated motion between humans and the interacting objects. We thus ask: "Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?" In this paper, we present a fine-grained action recognition method that learns to infer such latent o… ▽ More Humans regularly interact with their surrounding objects. Such interactions often result in strongly correlated motion between humans and the interacting objects. We thus ask: "Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?" In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of the performing actors using an inertial motion capture device. We analyze similar actions and learn subtle differences among them to reveal latent properties of the interacting objects. In particular, we learn to identify the interacting object, by estimating its weight, or its fragility or delicacy. Our results clearly demonstrate that the interaction motions and interacting objects are highly correlated and indeed relative object latent properties can be inferred from the 3D skeleton sequences alone, leading to new synthesis possibilities for human interaction motions. Dataset will be available at http://vcc.szu.edu.cn/research/2020/IT. △ Less

Submitted 20 August, 2020; originally announced August 2020.

arXiv:2008.07760 [pdf, other]

Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images

Authors: Jiahui Lei, Srinath Sridhar, Paul Guerrero, Minhyuk Sung, Niloy Mitra, Leonidas J. Guibas

Abstract: We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views. Previous work on learning shape reconstruction from multiple views uses discrete representations such as point clouds or voxels, while continuous surface generation approaches lack multi-view consistency. We address these issues by designing neural ne… ▽ More We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views. Previous work on learning shape reconstruction from multiple views uses discrete representations such as point clouds or voxels, while continuous surface generation approaches lack multi-view consistency. We address these issues by designing neural networks capable of generating high-quality parametric 3D surfaces which are also consistent between views. Furthermore, the generated 3D surfaces preserve accurate image pixel to 3D surface point correspondences, allowing us to lift texture information to reconstruct shapes with rich geometry and appearance. Our method is supervised and trained on a public dataset of shapes from common object categories. Quantitative results indicate that our method significantly outperforms previous work, while qualitative results demonstrate the high quality of our reconstructions. △ Less

Submitted 18 August, 2020; originally announced August 2020.

Comments: ECCV 2020

arXiv:2008.04367 [pdf, other]

Deep Detail Enhancement for Any Garment

Authors: Meng Zhang, Tuanfeng Wang, Duygu Ceylan, Niloy J. Mitra

Abstract: Creating fine garment details requires significant efforts and huge computational resources. In contrast, a coarse shape may be easy to acquire in many scenarios (e.g., via low-resolution physically-based simulation, linear blend skinning driven by skeletal motion, portable scanners). In this paper, we show how to enhance, in a data-driven manner, rich yet plausible details starting from a coarse… ▽ More Creating fine garment details requires significant efforts and huge computational resources. In contrast, a coarse shape may be easy to acquire in many scenarios (e.g., via low-resolution physically-based simulation, linear blend skinning driven by skeletal motion, portable scanners). In this paper, we show how to enhance, in a data-driven manner, rich yet plausible details starting from a coarse garment geometry. Once the parameterization of the garment is given, we formulate the task as a style transfer problem over the space of associated normal maps. In order to facilitate generalization across garment types and character motions, we introduce a patch-based formulation, that produces high-resolution details by matching a Gram matrix based style loss, to hallucinate geometric details (i.e., wrinkle density and shape). We extensively evaluate our method on a variety of production scenarios and show that our method is simple, light-weight, efficient, and generalizes across underlying garment types, sewing patterns, and body motion. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Comments: 12 pages

arXiv:2008.02401 [pdf, other]

doi 10.1145/3447648

StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows

Authors: Rameen Abdal, Peihao Zhu, Niloy Mitra, Peter Wonka

Abstract: High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes, while still preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along oth… ▽ More High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes, while still preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along other attributes. In this paper, in the context of conditional exploration of entangled latent spaces, we investigate the two sub-problems of attribute-conditioned sampling and attribute-controlled editing. We present StyleFlow as a simple, effective, and robust solution to both the sub-problems by formulating conditional exploration as an instance of conditional continuous normalizing flows in the GAN latent space conditioned by attribute features. We evaluate our method using the face and the car latent space of StyleGAN, and demonstrate fine-grained disentangled edits along various attributes on both real photographs and StyleGAN generated images. For example, for faces, we vary camera pose, illumination variation, expression, facial hair, gender, and age. Finally, via extensive qualitative and quantitative comparisons, we demonstrate the superiority of StyleFlow to other concurrent works. △ Less

Submitted 20 September, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: "Project Page https://rameenabdal.github.io/StyleFlow Video: https://youtu.be/LRAUJUn3EqQ "

Journal ref: ACM Transactions on Graphics (TOG), 2021

arXiv:2007.12973 [pdf, ps, other]

Doubly Robust Nonparametric Instrumental Variable Estimators for Survival Outcomes

Authors: You** Lee, Edward H. Kennedy, Nandita Mitra

Abstract: Instrumental variable (IV) methods allow us the opportunity to address unmeasured confounding in causal inference. However, most IV methods are only applicable to discrete or continuous outcomes with very few IV methods for censored survival outcomes. In this work we propose nonparametric estimators for the local average treatment effect on survival probabilities under both nonignorable and ignora… ▽ More Instrumental variable (IV) methods allow us the opportunity to address unmeasured confounding in causal inference. However, most IV methods are only applicable to discrete or continuous outcomes with very few IV methods for censored survival outcomes. In this work we propose nonparametric estimators for the local average treatment effect on survival probabilities under both nonignorable and ignorable censoring. We provide an efficient influence function-based estimator and a simple estimation procedure when the IV is either binary or continuous. The proposed estimators possess double-robustness properties and can easily incorporate nonparametric estimation using machine learning tools. In simulation studies, we demonstrate the flexibility and efficiency of our proposed estimators under various plausible scenarios. We apply our method to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial for estimating the causal effect of screening on survival probabilities and investigate the causal contrasts between the two interventions under different censoring assumptions. △ Less

Submitted 28 September, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

arXiv:2007.10453 [pdf, other]

doi 10.1007/978-3-030-58558-7_7

Points2Surf: Learning Implicit Surfaces from Point Cloud Patches

Authors: Philipp Erler, Paul Guerrero, Stefan Ohrhallinger, Michael Wimmer, Niloy J. Mitra

Abstract: A key step in any scanning-based asset creation workflow is to convert unordered point clouds to a surface. Classical methods (e.g., Poisson reconstruction) start to degrade in the presence of noisy and partial scans. Hence, deep learning based methods have recently been proposed to produce complete surfaces, even from partial scans. However, such data-driven methods struggle to generalize to new… ▽ More A key step in any scanning-based asset creation workflow is to convert unordered point clouds to a surface. Classical methods (e.g., Poisson reconstruction) start to degrade in the presence of noisy and partial scans. Hence, deep learning based methods have recently been proposed to produce complete surfaces, even from partial scans. However, such data-driven methods struggle to generalize to new shapes with large geometric and topological variations. We present Points2Surf, a novel patch-based learning framework that produces accurate surfaces directly from raw scans without normals. Learning a prior over a combination of detailed local patches and coarse global information improves generalization performance and reconstruction accuracy. Our extensive comparison on both synthetic and real data demonstrates a clear advantage of our method over state-of-the-art alternatives on previously unseen classes (on average, Points2Surf brings down reconstruction error by 30% over SPR and by 270%+ over deep learning based SotA methods) at the cost of longer computation times and a slight increase in small-scale topological noise in some cases. Our source code, pre-trained model, and dataset are available on: https://github.com/ErlerPhilipp/points2surf △ Less

Submitted 13 February, 2024; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: To be published at ECCV 2020 Repository: https://github.com/ErlerPhilipp/points2surf

ACM Class: I.4.5

Journal ref: Computer Vision -- ECCV 2020, 108--124

arXiv:2007.01272 [pdf, other]

RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces

Authors: Sebastien Ehrhardt, Oliver Groth, Aron Monszpart, Martin Engelcke, Ingmar Posner, Niloy Mitra, Andrea Vedaldi

Abstract: We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects. Similar to other generative approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE combines an object-centric GAN formulation with a model that explicitly accounts for correlations between individual objects. This allows the model to generate realistic scenes… ▽ More We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects. Similar to other generative approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE combines an object-centric GAN formulation with a model that explicitly accounts for correlations between individual objects. This allows the model to generate realistic scenes and videos from a physically-interpretable parameterization. Furthermore, we show that modeling the object correlation is necessary to learn to disentangle object positions and identity. We find that RELATE is also amenable to physically realistic scene editing and that it significantly outperforms prior art in object-centric scene generation in both synthetic (CLEVR, ShapeStacks) and real-world data (cars). In addition, in contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity. Source code, datasets and more results are available at http://geometry.cs.ucl.ac.uk/projects/2020/relate/. △ Less

Submitted 9 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

arXiv:2006.10569 [pdf, other]

Towards a Neural Graphics Pipeline for Controllable Image Generation

Authors: Xuelin Chen, Daniel Cohen-Or, Baoquan Chen, Niloy J. Mitra

Abstract: In this paper, we leverage advances in neural networks towards forming a neural rendering for controllable image generation, and thereby bypassing the need for detailed modeling in conventional graphics pipeline. To this end, we present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models. NGP decomposes the image into a set o… ▽ More In this paper, we leverage advances in neural networks towards forming a neural rendering for controllable image generation, and thereby bypassing the need for detailed modeling in conventional graphics pipeline. To this end, we present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models. NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation. To form an image, NGP generates coarse 3D models that are fed into neural rendering modules to produce view-specific interpretable 2D maps, which are then composited into the final output image using a traditional image formation model. Our approach offers control over image generation by providing direct handles controlling illumination and camera parameters, in addition to control over shape and appearance variations. The key challenge is to learn these controls through unsupervised training that links generated coarse 3D models with unpaired real images via neural and traditional (e.g., Blinn- Phong) rendering functions, without establishing an explicit correspondence between them. We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes. We evaluate our hybrid modeling framework, compare with neural-only generation methods (namely, DCGAN, LSGAN, WGAN-GP, VON, and SRNs), report improvement in FID scores against real images, and demonstrate that NGP supports direct controls common in traditional forward rendering. Code is available at http://geometry.cs.ucl.ac.uk/projects/2021/ngp. △ Less

Submitted 22 February, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: Eurographics 2021

arXiv:2002.08988 [pdf, other]

BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

Authors: Thu Nguyen-Phuoc, Christian Richardt, Long Mai, Yong-Liang Yang, Niloy Mitra

Abstract: We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the compu… ▽ More We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the computer graphics pipeline, we design BlockGAN to learn to first generate 3D features of background and foreground objects, then combine them into 3D features for the wholes cene, and finally render them into realistic images. This allows BlockGAN to reason over occlusion and interaction between objects' appearance, such as shadow and lighting, and provides control over each object's 3D pose and identity, while maintaining image realism. BlockGAN is trained end-to-end, using only unlabelled single images, without the need for 3D geometry, pose labels, object masks, or multiple views of the same scene. Our experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity). △ Less

Submitted 2 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: For project page, see https://www.monkeyoverflow.com/#/blockgan/ Accepted to Conference on Neural Information Processing Systemsm, NeurIPS 2020

arXiv:2002.04706 [pdf, other]

Bayesian Nonparametric Cost-Effectiveness Analyses: Causal Estimation and Adaptive Subgroup Discovery

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: Cost-effectiveness analyses (CEAs) are at the center of health economic decision making. While these analyses help policy analysts and economists determine coverage, inform policy, and guide resource allocation, they are statistically challenging for several reasons. Cost and effectiveness are correlated and follow complex joint distributions which are difficult to capture parametrically. Effectiv… ▽ More Cost-effectiveness analyses (CEAs) are at the center of health economic decision making. While these analyses help policy analysts and economists determine coverage, inform policy, and guide resource allocation, they are statistically challenging for several reasons. Cost and effectiveness are correlated and follow complex joint distributions which are difficult to capture parametrically. Effectiveness (often measured as increased survival time) and accumulated cost tends to be right-censored in many applications. Moreover, CEAs are often conducted using observational data with non-random treatment assignment. Policy-relevant causal estimation therefore requires robust confounding control. Finally, current CEA methods do not address cost-effectiveness heterogeneity in a principled way - often presenting population-averaged estimates even though significant effect heterogeneity may exist. Motivated by these challenges, we develop a nonparametric Bayesian model for joint cost-survival distributions in the presence of censoring. Our approach utilizes a joint Enriched Dirichlet Process prior on the covariate effects of cost and survival time, while using a Gamma Process prior on the baseline survival time hazard. Causal CEA estimands, with policy-relevant interpretations, are identified and estimated via a Bayesian nonparametric g-computation procedure. Finally, we outline how the induced clustering of the Enriched Dirichlet Process can be used to adaptively detect presence of subgroups with different cost-effectiveness profiles. We outline an MCMC procedure for full posterior inference and evaluate frequentist properties via simulations. We use our model to assess the cost-efficacy of chemotherapy versus radiation adjuvant therapy for treating endometrial cancer in the SEER-Medicare database. △ Less

Submitted 8 September, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

arXiv:1912.04158 [pdf, other]

Learning a Neural 3D Texture Space from 2D Exemplars

Authors: Philipp Henzler, Niloy J. Mitra, Tobias Ritschel

Abstract: We propose a generative model of 2D and 3D natural textures with diversity, visual fidelity and at high computational efficiency. This is enabled by a family of methods that extend ideas from classic stochastic procedural texturing (Perlin noise) to learned, deep, non-linearities. The key idea is a hard-coded, tunable and differentiable step that feeds multiple transformed random 2D or 3D fields i… ▽ More We propose a generative model of 2D and 3D natural textures with diversity, visual fidelity and at high computational efficiency. This is enabled by a family of methods that extend ideas from classic stochastic procedural texturing (Perlin noise) to learned, deep, non-linearities. The key idea is a hard-coded, tunable and differentiable step that feeds multiple transformed random 2D or 3D fields into an MLP that can be sampled over infinite domains. Our model encodes all exemplars from a diverse set of textures without a need to be re-trained for each exemplar. Applications include texture interpolation, and learning 3D textures from 2D exemplars. △ Less

Submitted 2 April, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.00039 [pdf, other]

Net benefit separation and the determination curve: a probabilistic framework for cost-effectiveness estimation

Authors: Andrew J. Spieker, Nicholas Illenberger, Jason A. Roy, Nandita Mitra

Abstract: Considerations regarding clinical effectiveness and cost are essential in comparing the overall value of two treatments. There has been growing interest in methodology to integrate cost and effectiveness measures in order to inform policy and promote adequate resource allocation. The net monetary benefit aggregates information on differences in mean cost and clinical outcomes; the cost-effectivene… ▽ More Considerations regarding clinical effectiveness and cost are essential in comparing the overall value of two treatments. There has been growing interest in methodology to integrate cost and effectiveness measures in order to inform policy and promote adequate resource allocation. The net monetary benefit aggregates information on differences in mean cost and clinical outcomes; the cost-effectiveness acceptability curve was then developed to characterize the extent to which the strength of evidence regarding net monetary benefit changes with fluctuations in the willingness-to-pay threshold. Methods to derive insights from characteristics of the cost/clinical outcomes besides mean differences remain undeveloped but may also be informative. We propose a novel probabilistic measure of cost-effectiveness based on the stochastic ordering of the individual net benefit distribution under each treatment. Our approach is able to accommodate features frequently encountered in observational data including confounding and censoring, and complements the net monetary benefit in the insights it provides. We conduct a range of simulations to evaluate finite-sample performance and illustrate our proposed approach using simulated data based on a study of endometrial cancer patients. △ Less

Submitted 2 December, 2019; v1 submitted 29 November, 2019; originally announced December 2019.

Comments: 10 pages; 5 figures; 3 tables

arXiv:1911.11098 [pdf, other]

StructEdit: Learning Structural Shape Variations

Authors: Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

Abstract: Learning to encode differences in the geometry and (topological) structure of the shapes of ordinary objects is key to generating semantically plausible variations of a given shape, transferring edits from one shape to another, and many other applications in 3D content creation. The common approach of encoding shapes as points in a high-dimensional latent feature space suggests treating shape diff… ▽ More Learning to encode differences in the geometry and (topological) structure of the shapes of ordinary objects is key to generating semantically plausible variations of a given shape, transferring edits from one shape to another, and many other applications in 3D content creation. The common approach of encoding shapes as points in a high-dimensional latent feature space suggests treating shape differences as vectors in that space. Instead, we treat shape differences as primary objects in their own right and propose to encode them in their own latent space. In a setting where the shapes themselves are encoded in terms of fine-grained part hierarchies, we demonstrate that a separate encoding of shape deltas or differences provides a principled way to deal with inhomogeneities in the shape space due to different combinatorial part structures, while also allowing for compactness in the representation, as well as edit abstraction and transfer. Our approach is based on a conditional variational autoencoder for encoding and decoding shape deltas, conditioned on a source shape. We demonstrate the effectiveness and robustness of our approach in multiple shape modification and generation tasks, and provide comparison and ablation studies on the PartNet dataset, one of the largest publicly available 3D datasets. △ Less

Submitted 25 November, 2019; originally announced November 2019.

arXiv:1908.06217 [pdf, other]

Neural Re-Simulation for Generating Bounces in Single Images

Authors: Carlo Innamorati, Bryan Russell, Danny M. Kaufman, and Niloy J. Mitra

Abstract: We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image's environment. Given a starting trajectory, physically simulated with the estimated geometry of a single, static input image, we learn to 'correct' this trajectory to a visually plausible one via a neural network. The neural network can then be seen as learning to 'correct' t… ▽ More We introduce a method to generate videos of dynamic virtual objects plausibly interacting via collisions with a still image's environment. Given a starting trajectory, physically simulated with the estimated geometry of a single, static input image, we learn to 'correct' this trajectory to a visually plausible one via a neural network. The neural network can then be seen as learning to 'correct' traditional simulation output, generated with incomplete and imprecise world information, to obtain context-specific, visually plausible re-simulated output, a process we call neural re-simulation. We train our system on a set of 50k synthetic scenes where a virtual moving object (ball) has been physically simulated. We demonstrate our approach on both our synthetic dataset and a collection of real-life images depicting everyday scenes, obtaining consistent improvement over baseline alternatives throughout. △ Less

Submitted 24 August, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: Accepted to ICCV 2019

MSC Class: 68T45 ACM Class: I.4.9

arXiv:1908.00575 [pdf, other]

StructureNet: Hierarchical Graph Networks for 3D Shape Generation

Authors: Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

Abstract: The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations… ▽ More The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations which add to, remove from, or modify the shape constituents and compositional structure. Such object structure can typically be organized into a hierarchy of constituent object parts and relationships, represented as a hierarchy of n-ary graphs. We introduce StructureNet, a hierarchical graph network which (i) can directly encode shapes represented as such n-ary graphs; (ii) can be robustly trained on large and complex shape families; and (iii) can be used to generate a great diversity of realistic structured shape geometries. Technically, we accomplish this by drawing inspiration from recent advances in graph neural networks to propose an order-invariant encoding of n-ary graphs, considering jointly both part geometry and inter-part relations during network training. We extensively evaluate the quality of the learned latent spaces for various shape families and show significant advantages over baseline and competing methods. The learned latent spaces enable several structure-aware geometry processing applications, including shape generation and interpolation, shape editing, or shape structure discovery directly from un-annotated images, point clouds, or partial scans. △ Less

Submitted 1 August, 2019; originally announced August 2019.

Comments: Conditionally Accepted to Siggraph Asia 2019

arXiv:1907.00960 [pdf, other]

Going Deeper with Lean Point Networks

Authors: Eric-Tuan Le, Iasonas Kokkinos, Niloy J. Mitra

Abstract: In this work we introduce Lean Point Networks (LPNs) to train deeper and more accurate point processing networks by relying on three novel point processing blocks that improve memory consumption, inference time, and accuracy: a convolution-type block for point sets that blends neighborhood information in a memory-efficient manner; a crosslink block that efficiently shares information across low- a… ▽ More In this work we introduce Lean Point Networks (LPNs) to train deeper and more accurate point processing networks by relying on three novel point processing blocks that improve memory consumption, inference time, and accuracy: a convolution-type block for point sets that blends neighborhood information in a memory-efficient manner; a crosslink block that efficiently shares information across low- and high-resolution processing branches; and a multiresolution point cloud processing block for faster diffusion of information. By combining these blocks, we design wider and deeper point-based architectures. We report systematic accuracy and memory consumption improvements on multiple publicly available segmentation tasks by using our generic modules as drop-in replacements for the blocks of multiple architectures (PointNet++, DGCNN, SpiderNet, PointCNN). △ Less

Submitted 16 June, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: 16 pages, 11 figures, 9 tables

MSC Class: 68T45 ACM Class: I.2.10; I.3.0; I.4.8

arXiv:1905.10793 [pdf, other]

Unsupervised Intuitive Physics from Past Experiences

Authors: Sébastien Ehrhardt, Aron Monszpart, Niloy J. Mitra, Andrea Vedaldi

Abstract: We are interested in learning models of intuitive physics similar to the ones that animals use for navigation, manipulation and planning. In addition to learning general physical principles, however, we are also interested in learning ``on the fly'', from a few experiences, physical properties specific to new environments. We do all this in an unsupervised manner, using a meta-learning formulation… ▽ More We are interested in learning models of intuitive physics similar to the ones that animals use for navigation, manipulation and planning. In addition to learning general physical principles, however, we are also interested in learning ``on the fly'', from a few experiences, physical properties specific to new environments. We do all this in an unsupervised manner, using a meta-learning formulation where the goal is to predict videos containing demonstrations of physical phenomena, such as objects moving and colliding with a complex background. We introduce the idea of summarizing past experiences in a very compact manner, in our case using dynamic images, and show that this can be used to solve the problem well and efficiently. Empirically, we show via extensive experiments and ablation studies, that our model learns to perform physical predictions that generalize well in time and space, as well as to a variable number of interacting physical objects. △ Less

Submitted 26 May, 2019; originally announced May 2019.

Comments: Under review

arXiv:1904.00069 [pdf, other]

Unpaired Point Cloud Completion on Real Scans using Adversarial Training

Authors: Xuelin Chen, Baoquan Chen, Niloy J. Mitra

Abstract: As 3D scanning solutions become increasingly popular, several deep learning setups have been developed geared towards that task of scan completion, i.e., plausibly filling in regions there were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have bee… ▽ More As 3D scanning solutions become increasingly popular, several deep learning setups have been developed geared towards that task of scan completion, i.e., plausibly filling in regions there were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have been successfully demonstrated on synthetic data, the approaches cannot be directly used on real scans in absence of suitable paired training data. We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can directly be applied to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport, KITTI), quantitatively on 3D-EPN shape completion benchmark dataset, and demonstrate realistic completions under varying levels of incompleteness. △ Less

Submitted 23 February, 2020; v1 submitted 29 March, 2019; originally announced April 2019.

Comments: ICLR 2020

arXiv:1901.01060 [pdf, other]

PointCleanNet: Learning to Denoise and Remove Outliers from Dense Point Clouds

Authors: Marie-Julie Rakotosaona, Vittorio La Barbera, Paul Guerrero, Niloy J. Mitra, Maks Ovsjanikov

Abstract: Point clouds obtained with 3D scanners or by image-based reconstruction techniques are often corrupted with significant amount of noise and outliers. Traditional methods for point cloud denoising largely rely on local surface fitting (e.g., jets or MLS surfaces), local or non-local averaging, or on statistical assumptions about the underlying noise model. In contrast, we develop a simple data-driv… ▽ More Point clouds obtained with 3D scanners or by image-based reconstruction techniques are often corrupted with significant amount of noise and outliers. Traditional methods for point cloud denoising largely rely on local surface fitting (e.g., jets or MLS surfaces), local or non-local averaging, or on statistical assumptions about the underlying noise model. In contrast, we develop a simple data-driven method for removing outliers and reducing noise in unordered point clouds. We base our approach on a deep learning architecture adapted from PCPNet, which was recently proposed for estimating local 3D shape properties in point clouds. Our method first classifies and discards outlier samples, and then estimates correction vectors that project noisy points onto the original clean surfaces. The approach is efficient and robust to varying amounts of noise and outliers, while being able to handle large densely-sampled point clouds. In our extensive evaluation, both on synthesic and real data, we show an increased robustness to strong noise levels compared to various state-of-the-art methods, enabling accurate surface reconstruction from extremely noisy real data obtained by range scans. Finally, the simplicity and universality of our approach makes it very easy to integrate in any existing geometry processing pipeline. △ Less

Submitted 28 June, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

Journal ref: Computer Graphics Forum, 2020

arXiv:1811.11606 [pdf, other]

Esca** Plato's Cave: 3D Shape From Adversarial Rendering

Authors: Philipp Henzler, Niloy Mitra, Tobias Ritschel

Abstract: We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images, are indistinguishable from ground truth images (for a discriminator) u… ▽ More We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images, are indistinguishable from ground truth images (for a discriminator) under various camera poses. Discriminating 2D images instead of 3D shapes allows tap** into unstructured 2D photo collections instead of relying on curated (e.g., aligned, annotated, etc.) 3D data sets. To establish constraints between 2D image observation and their 3D interpretation, we suggest a family of rendering layers that are effectively differentiable. This family includes visual hull, absorption-only (akin to x-ray), and emission-absorption. We can successfully reconstruct 3D shapes from unstructured 2D images and extensively evaluate PlatonicGAN on a range of synthetic and real data sets achieving consistent improvements over baseline methods. We further show that PlatonicGAN can be combined with 3D supervision to improve on and in some cases even surpass the quality of 3D-supervised methods. △ Less

Submitted 10 June, 2021; v1 submitted 28 November, 2018; originally announced November 2018.

arXiv:1811.09567 [pdf, other]

How does Lipschitz Regularization Influence GAN Training?

Authors: Yipeng Qin, Niloy Mitra, Peter Wonka

Abstract: Despite the success of Lipschitz regularization in stabilizing GAN training, the exact reason of its effectiveness remains poorly understood. The direct effect of $K$-Lipschitz regularization is to restrict the $L2$-norm of the neural network gradient to be smaller than a threshold $K$ (e.g., $K=1$) such that $\|\nabla f\| \leq K$. In this work, we uncover an even more important effect of Lipschit… ▽ More Despite the success of Lipschitz regularization in stabilizing GAN training, the exact reason of its effectiveness remains poorly understood. The direct effect of $K$-Lipschitz regularization is to restrict the $L2$-norm of the neural network gradient to be smaller than a threshold $K$ (e.g., $K=1$) such that $\|\nabla f\| \leq K$. In this work, we uncover an even more important effect of Lipschitz regularization by examining its impact on the loss function: It degenerates GAN loss functions to almost linear ones by restricting their domain and interval of attainable gradient values. Our analysis shows that loss functions are only successful if they are degenerated to almost linear ones. We also show that loss functions perform poorly if they are not degenerated and that a wide range of functions can be used as loss function as long as they are sufficiently degenerated by regularization. Basically, Lipschitz regularization ensures that all loss functions effectively work in the same way. Empirically, we verify our proposition on the MNIST, CIFAR10 and CelebA datasets. △ Less

Submitted 25 August, 2020; v1 submitted 23 November, 2018; originally announced November 2018.

Comments: Accepted at ECCV 2020

arXiv:1810.09494 [pdf, other]

doi 10.1111/biom.13244

A Bayesian Nonparametric Model for Zero-Inflated Outcomes: Prediction, Clustering, and Causal Estimation

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: Researchers are often interested in predicting outcomes, conducting clustering analysis to detect distinct subgroups of their data, or computing causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks - requiring highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for co… ▽ More Researchers are often interested in predicting outcomes, conducting clustering analysis to detect distinct subgroups of their data, or computing causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks - requiring highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flow through to the causal effect estimates of interest - allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. Lastly, we use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy and radiation therapy in the SEER medicare database. △ Less

Submitted 9 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

arXiv:1808.08860 [pdf, other]

doi 10.1016/j.commatsci.2019.109154

Body centered phase of Cu at high temperature and pressure

Authors: Urmimala Dey, Nilanjan Mitra, A. Taraphder

Abstract: The existence of a body centered tetragonal phase of Cu has been investigated in this manuscript when the single crystal FCC Cu is subjected to both high pressure and high temperature. The results have been demonstrated through DFT calculations (which are typically done at 0 K) followed by Helmholtz free energy calculations (for high temperature). The new metastable phase of Cu demonstrates higher… ▽ More The existence of a body centered tetragonal phase of Cu has been investigated in this manuscript when the single crystal FCC Cu is subjected to both high pressure and high temperature. The results have been demonstrated through DFT calculations (which are typically done at 0 K) followed by Helmholtz free energy calculations (for high temperature). The new metastable phase of Cu demonstrates higher thermal conductivity compared to that of the FCC phase and thereby may be beneficial for high temperature engineering applications. △ Less

Submitted 27 August, 2018; originally announced August 2018.

Journal ref: Computational Materials Science 170 (2019) 109154

arXiv:1807.00687 [pdf]

Simplifying Urban Data Fusion with BigSUR

Authors: Tom Kelly, Niloy J. Mitra

Abstract: Our ability to understand data has always lagged behind our ability to collect it. This is particularly true in urban environments, where mass data capture is particularly valuable, but the objects captured are more varied, denser, and complex. To understand the structure and content of the environment, we must process the unstructured data to a structured form. BigSUR is an urban reconstruction a… ▽ More Our ability to understand data has always lagged behind our ability to collect it. This is particularly true in urban environments, where mass data capture is particularly valuable, but the objects captured are more varied, denser, and complex. To understand the structure and content of the environment, we must process the unstructured data to a structured form. BigSUR is an urban reconstruction algorithm which fuses GIS data, photogrammetric meshes, and street level photography, to create clean representative, semantically labelled, geometry. However, we have identified three problems with the system i) the street level photography is often difficult to acquire; ii) novel façade styles often frustrate the detection of windows and doors; iii) the computational requirements of the system are large, processing a large city block can take up to 15 hours. In this paper we describe the process of simplifying and validating the BigSUR semantic reconstruction system. In particular, the requirement for street level images is removed, and greedy post-process profile assignment is introduced to accelerate the system. We accomplish this by modifying the binary integer programming (BIP) optimization, and re-evaluating the effects of various parameters. The new variant of the system is evaluated over a variety of urban areas. We objectively measure mean squared error (MSE) terms over the unstructured geometry, showing that BigSUR is able to accurately recover omissions from the input meshes. Further, we evaluate the ability of the system to label the walls and roofs of input meshes, concluding that our new BigSUR variant achieves highly accurate semantic labelling with shorter computational time and less input data. △ Less

Submitted 2 July, 2018; originally announced July 2018.

Comments: Architecture_MPS under review

arXiv:1806.11335 [pdf, other]

Learning a Shared Shape Space for Multimodal Garment Design

Authors: Tuanfeng Y. Wang, Duygu Ceylan, Jovan Popovic, Niloy J. Mitra

Abstract: Designing real and virtual garments is becoming extremely demanding with rapidly changing fashion trends and increasing need for synthesizing realistic dressed digital humans for various applications. This necessitates creating simple and effective workflows to facilitate authoring sewing patterns customized to garment and target body shapes to achieve desired looks. Traditional workflow involves… ▽ More Designing real and virtual garments is becoming extremely demanding with rapidly changing fashion trends and increasing need for synthesizing realistic dressed digital humans for various applications. This necessitates creating simple and effective workflows to facilitate authoring sewing patterns customized to garment and target body shapes to achieve desired looks. Traditional workflow involves a trial-and-error procedure wherein a mannequin is draped to judge the resultant folds and the sewing pattern iteratively adjusted until the desired look is achieved. This requires time and experience. Instead, we present a data-driven approach wherein the user directly indicates desired fold patterns simply by sketching while our system estimates corresponding garment and body shape parameters at interactive rates. The recovered parameters can then be further edited and the updated draped garment previewed. Technically, we achieve this via a novel shared shape space that allows the user to seamlessly specify desired characteristics across multimodal input {\em without} requiring to run garment simulation at design time. We evaluate our approach qualitatively via a user study and quantitatively against test datasets, and demonstrate how our system can generate a rich quality of on-body garments targeted for a range of body shapes while achieving desired fold characteristics. △ Less

Submitted 14 July, 2018; v1 submitted 29 June, 2018; originally announced June 2018.

arXiv:1806.07889 [pdf, other]

doi 10.1145/3306346.3322961

iMapper: Interaction-guided Joint Scene and Human Motion Map** from Monocular Videos

Authors: Aron Monszpart, Paul Guerrero, Duygu Ceylan, Ersin Yumer, Niloy J. Mitra

Abstract: A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made in the context of human pose reconstruction from monocular video, including image-space feature point detection and 3D pose recovery. These methods, however, sta… ▽ More A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made in the context of human pose reconstruction from monocular video, including image-space feature point detection and 3D pose recovery. These methods, however, start to fail under moderate to heavy occlusion as the problem becomes severely under-constrained. We approach the problems differently. We observe that people interact similarly in similar scenes. Hence, we exploit the correlation between scene object arrangement and motions performed in that scene in both directions: first, typical motions performed when interacting with objects inform us about possible object arrangements; and second, object arrangements, in turn, constrain the possible motions. We present iMapper, a data-driven method that focuses on identifying human-object interactions, and jointly reasons about objects and human movement over space-time to recover both a plausible scene arrangement and consistent human interactions. We first introduce the notion of characteristic interactions as regions in space-time when an informative human-object interaction happens. This is followed by a novel occlusion-aware matching procedure that searches and aligns such characteristic snapshots from an interaction database to best explain the input monocular video. Through extensive evaluations, both quantitative and qualitative, we demonstrate that iMapper significantly improves performance over both dedicated state-of-the-art scene analysis and 3D human pose recovery approaches, especially under medium to heavy occlusion. △ Less

Submitted 20 June, 2018; originally announced June 2018.

Journal ref: Siggraph 2019

arXiv:1806.07179 [pdf, other]

doi 10.1145/3272127.3275065

FrankenGAN: Guided Detail Synthesis for Building Mass-Models Using Style-Synchonized GANs

Authors: Tom Kelly, Paul Guerrero, Anthony Steed, Peter Wonka, Niloy J. Mitra

Abstract: Coarse building mass models are now routinely generated at scales ranging from individual buildings through to whole cities. For example, they can be abstracted from raw measurements, generated procedurally, or created manually. However, these models typically lack any meaningful semantic or texture details, making them unsuitable for direct display. We introduce the problem of automatically and r… ▽ More Coarse building mass models are now routinely generated at scales ranging from individual buildings through to whole cities. For example, they can be abstracted from raw measurements, generated procedurally, or created manually. However, these models typically lack any meaningful semantic or texture details, making them unsuitable for direct display. We introduce the problem of automatically and realistically decorating such models by adding semantically consistent geometric details and textures. Building on the recent success of generative adversarial networks (GANs), we propose FrankenGAN, a cascade of GANs to create plausible details across multiple scales over large neighborhoods. The various GANs are synchronized to produce consistent style distributions over buildings and neighborhoods. We provide the user with direct control over the variability of the output. We allow her to interactively specify style via images and manipulate style-adapted sliders to control style variability. We demonstrate our system on several large-scale examples. The generated outputs are qualitatively evaluated via a set of user studies and are found to be realistic, semantically-plausible, and style-consistent. △ Less

Submitted 6 March, 2019; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: project page: http://geometry.cs.ucl.ac.uk/projects/2018/frankengan/

Journal ref: ACM Trans. Graph. 2018, (37), 6

arXiv:1805.08634 [pdf, other]

Facade Segmentation in the Wild

Authors: John Femiani, Wamiq Reyaz Para, Niloy Mitra, Peter Wonka

Abstract: Urban facade segmentation from automatically acquired imagery, in contrast to traditional image segmentation, poses several unique challenges. 360-degree photospheres captured from vehicles are an effective way to capture a large number of images, but this data presents difficult-to-model war** and stitching artifacts. In addition, each pixel can belong to multiple facade elements, and different… ▽ More Urban facade segmentation from automatically acquired imagery, in contrast to traditional image segmentation, poses several unique challenges. 360-degree photospheres captured from vehicles are an effective way to capture a large number of images, but this data presents difficult-to-model war** and stitching artifacts. In addition, each pixel can belong to multiple facade elements, and different facade elements (e.g., window, balcony, sill, etc.) are correlated and vary wildly in their characteristics. In this paper, we propose three network architectures of varying complexity to achieve multilabel semantic segmentation of facade images while exploiting their unique characteristics. Specifically, we propose a MULTIFACSEGNET architecture to assign multiple labels to each pixel, a SEPARABLE architecture as a low-rank formulation that encourages extraction of rectangular elements, and a COMPATIBILITY network that simultaneously seeks segmentation across facade element types allowing the network to 'see' intermediate output probabilities of the various facade element classes. Our results on benchmark datasets show significant improvements over existing facade segmentation approaches for the typical facade elements. For example, on one commonly used dataset, the accuracy scores for window(the most important architectural element) increases from 0.91 to 0.97 percent compared to the best competing method, and comparable improvements on other element types. △ Less

Submitted 9 May, 2018; originally announced May 2018.

Comments: 16 pages, 7 figures

arXiv:1805.05086 [pdf, other]

Unsupervised Intuitive Physics from Visual Observations

Authors: Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi

Abstract: While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test times. Some authors have relaxed such requirements by supplementing the model with an handcrafted physical si… ▽ More While learning models of intuitive physics is an increasingly active area of research, current approaches still fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test times. Some authors have relaxed such requirements by supplementing the model with an handcrafted physical simulator. Still, the resulting methods are unable to automatically learn new complex environments and to understand physical interactions within them. In this work, we demonstrated for the first time learning such predictors directly from raw visual observations and without relying on simulators. We do so in two steps: first, we learn to track mechanically-salient objects in videos using causality and equivariance, two unsupervised learning principles that do not require auto-encoding. Second, we demonstrate that the extracted positions are sufficient to successfully train visual motion predictors that can take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, ROLL4REAL, consisting of real objects rolling on complex terrains (pool table, elliptical bowl, and random height-field). We show that in all such cases it is possible to learn reliable extrapolators of the object trajectories from raw videos alone, without any form of external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture. △ Less

Submitted 29 March, 2019; v1 submitted 14 May, 2018; originally announced May 2018.

arXiv:1805.03106 [pdf, other]

Learning on the Edge: Explicit Boundary Handling in CNNs

Authors: Carlo Innamorati, Tobias Ritschel, Tim Weyrich, Niloy J. Mitra

Abstract: Convolutional neural networks (CNNs) handle the case where filters extend beyond the image boundary using several heuristics, such as zero, repeat or mean padding. These schemes are applied in an ad-hoc fashion and, being weakly related to the image content and oblivious of the target task, result in low output quality at the boundary. In this paper, we propose a simple and effective improvement t… ▽ More Convolutional neural networks (CNNs) handle the case where filters extend beyond the image boundary using several heuristics, such as zero, repeat or mean padding. These schemes are applied in an ad-hoc fashion and, being weakly related to the image content and oblivious of the target task, result in low output quality at the boundary. In this paper, we propose a simple and effective improvement that learns the boundary handling itself. At training-time, the network is provided with a separate set of explicit boundary filters. At testing-time, we use these filters which have learned to extrapolate features at the boundary in an optimal way for the specific task. Our extensive evaluation, over a wide range of architectural changes (variations of layers, feature channels, or both), shows how the explicit filters result in improved boundary handling. Consequently, we demonstrate an improvement of 5% to 20% across the board of typical CNN applications (colorization, de-Bayering, optical flow, and disparity estimation). △ Less

Submitted 8 May, 2018; originally announced May 2018.

ACM Class: I.4.0

arXiv:1712.09448 [pdf, other]

Taking Visual Motion Prediction To New Heightfields

Authors: Sebastien Ehrhardt, Aron Monszpart, Niloy Mitra, Andrea Vedaldi

Abstract: While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and estimating the associated parameters. In order to be able to leverage the approximation capabilities of artificial intelligence techniques in such physics related contexts, researchers have handcrafted the relevant states, and then… ▽ More While the basic laws of Newtonian mechanics are well understood, explaining a physical scenario still requires manually modeling the problem with suitable equations and estimating the associated parameters. In order to be able to leverage the approximation capabilities of artificial intelligence techniques in such physics related contexts, researchers have handcrafted the relevant states, and then used neural networks to learn the state transitions using simulation runs as training data. Unfortunately, such approaches are unsuited for modeling complex real-world scenarios, where manually authoring relevant state spaces tend to be tedious and challenging. In this work, we investigate if neural networks can implicitly learn physical states of real-world mechanical processes only based on visual data while internally modeling non-homogeneous environment and in the process enable long-term physical extrapolation. We develop a recurrent neural network architecture for this task and also characterize resultant uncertainties in the form of evolving variance estimates. We evaluate our setup to extrapolate motion of rolling ball(s) on bowls of varying shape and orientation, and on arbitrary heightfields using only images as input. We report significant improvements over existing image-based methods both in terms of accuracy of predictions and complexity of scenarios; and report competitive performance with approaches that, unlike us, assume access to internal physical states. △ Less

Submitted 10 December, 2021; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: arXiv admin note: text overlap with arXiv:1706.02179

arXiv:1710.10473 [pdf, other]

SeeThrough: Finding Chairs in Heavily Occluded Indoor Scene Images

Authors: Moos Hueting, Pradyumna Reddy, Vladimir Kim, Ersin Yumer, Nathan Carr, Niloy Mitra

Abstract: Discovering 3D arrangements of objects from single indoor images is important given its many applications including interior design, content creation, etc. Although heavily researched in the recent years, existing approaches break down under medium or heavy occlusion as the core object detection module starts failing in absence of directly visible cues. Instead, we take into account holistic conte… ▽ More Discovering 3D arrangements of objects from single indoor images is important given its many applications including interior design, content creation, etc. Although heavily researched in the recent years, existing approaches break down under medium or heavy occlusion as the core object detection module starts failing in absence of directly visible cues. Instead, we take into account holistic contextual 3D information, exploiting the fact that objects in indoor scenes co-occur mostly in typical near-regular configurations. First, we use a neural network trained on real indoor annotated images to extract 2D keypoints, and feed them to a 3D candidate object generation stage. Then, we solve a global selection problem among these 3D candidates using pairwise co-occurrence statistics discovered from a large 3D scene database. We iterate the process allowing for candidates with low keypoint response to be incrementally detected based on the location of the already discovered nearby objects. Focusing on chairs, we demonstrate significant performance improvement over combinations of state-of-the-art methods, especially for scenes with moderately to severely occluded objects. △ Less

Submitted 4 December, 2017; v1 submitted 28 October, 2017; originally announced October 2017.

Showing 51–100 of 158 results for author: Mitra, N