-
FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning
Authors:
Sofiane Bouaziz,
Hadjer Benmeziane,
Youcef Imine,
Leila Hamdad,
Smail Niar,
Hamza Ouarnoughi
Abstract:
Federated Learning (FL) has emerged as a promising Machine Learning paradigm, enabling multiple users to collaboratively train a shared model while preserving their local data. To minimize computing and communication costs associated with parameter transfer, it is common practice in FL to select a subset of clients in each training round. This selection must consider both system and static heterogeneity. Therefore, we propose FLASH-RL, a framework that utilizes Double Deep Q-Learning (DDQL) to address both system and static heterogeneity in FL. FLASH-RL introduces a new reputation-based utility function to evaluate client contributions based on their current and past performances. Additionally, an adapted DDQL algorithm is proposed to expedite the learning process. Experimental results on MNIST and CIFAR-10 datasets have shown FLASH-RL's effectiveness in achieving a balanced trade-off between model performance and end-to-end latency against existing solutions. Indeed, FLASH-RL reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG and +76% compared to FAVOR. In fall detection using the MobiAct dataset, FLASH-RL outperforms FedAVG by up to 2.82% in model performance and reduces latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance faster, with up to a 45.32% reduction in training rounds compared to FedAVG.
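As a rough illustration of the Double Deep Q-Learning idea mentioned in the abstract, the sketch below computes the textbook Double DQN bootstrap target for a single transition. It is not the adapted DDQL algorithm the authors propose, and the function name, arguments, and numbers are hypothetical; in a client-selection setting each action index could stand for a candidate client.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Textbook Double DQN target for one transition (illustrative only).

    next_q_online / next_q_target are hypothetical lists of Q-values for the
    next state, one entry per action, from the online and target networks.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # The online network selects the action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...and the target network evaluates it, which reduces overestimation.
    return reward + gamma * next_q_target[best_action]


# Made-up numbers: the online network prefers action 1, so the target
# network's value for action 1 (0.4) is used in the bootstrap term.
target = double_dqn_target(1.0, [0.2, 0.9, 0.5], [0.3, 0.4, 0.8], gamma=0.5)
```

Decoupling action selection from action evaluation is what distinguishes this target from the single-network Q-learning target.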
Submitted 12 November, 2023;
originally announced November 2023.
-
SSDNeRF: Semantic Soft Decomposition of Neural Radiance Fields
Authors:
Siddhant Ranade,
Christoph Lassner,
Kai Li,
Christian Haene,
Shen-Chi Chen,
Jean-Charles Bazin,
Sofien Bouaziz
Abstract:
Neural Radiance Fields (NeRFs) encode the radiance in a scene parameterized by the scene's plenoptic function. This is achieved by using an MLP together with a mapping to a higher-dimensional space, and has been proven to capture scenes with a great level of detail. Naturally, the same parameterization can be used to encode additional properties of the scene, beyond just its radiance. A particularly interesting property in this regard is the semantic decomposition of the scene. We introduce a novel technique for semantic soft decomposition of neural radiance fields (named SSDNeRF) which jointly encodes semantic signals in combination with radiance signals of a scene. Our approach provides a soft decomposition of the scene into semantic parts, enabling us to correctly encode multiple semantic classes blending along the same direction -- an impossible feat for existing methods. Not only does this lead to a detailed, 3D semantic representation of the scene, but we also show that the regularizing effects of the MLP used for encoding help to improve the semantic representation. We show state-of-the-art segmentation and reconstruction results on a dataset of common objects and demonstrate how the proposed approach can be applied for high quality temporally consistent video editing and re-compositing on a dataset of casually captured selfie videos.
Submitted 6 December, 2022;
originally announced December 2022.
-
Multiresolution Deep Implicit Functions for 3D Shape Representation
Authors:
Zhang Chen,
Yinda Zhang,
Kyle Genova,
Sean Fanello,
Sofien Bouaziz,
Christian Haene,
Ruofei Du,
Cem Keskin,
Thomas Funkhouser,
Danhang Tang
Abstract:
We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail and also achieve better accuracy. For shape completion, we propose latent grid dropout to simulate partial data in the latent space and therefore defer the completing functionality to the decoder side. This along with our multires design significantly improves the shape completion quality under decoder-only latent optimization. To the best of our knowledge, MDIF is the first deep implicit function model that can at the same time (1) represent different levels of detail and allow progressive decoding; (2) support both encoder-decoder inference and decoder-only latent optimization, and fulfill multiple applications; (3) perform detailed decoder-only shape completion. Experiments demonstrate its superior performance against prior art in various 3D reconstruction tasks.
Submitted 16 September, 2021; v1 submitted 12 September, 2021;
originally announced September 2021.
-
HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
Authors:
Keunhong Park,
Utkarsh Sinha,
Peter Hedman,
Jonathan T. Barron,
Sofien Bouaziz,
Dan B Goldman,
Ricardo Martin-Brualla,
Steven M. Seitz
Abstract:
Neural Radiance Fields (NeRF) are able to reconstruct scenes with unprecedented fidelity, and various recent works have extended NeRF to handle dynamic scenes. A common approach to reconstruct such non-rigid scenes is through the use of a learned deformation field mapping from coordinates in each input image into a canonical template coordinate space. However, these deformation-based approaches struggle to model changes in topology, as topological changes require a discontinuity in the deformation field, but these deformation fields are necessarily continuous. We address this limitation by lifting NeRFs into a higher dimensional space, and by representing the 5D radiance field corresponding to each individual input image as a slice through this "hyper-space". Our method is inspired by level set methods, which model the evolution of surfaces as slices through a higher dimensional surface. We evaluate our method on two tasks: (i) interpolating smoothly between "moments", i.e., configurations of the scene, seen in the input images while maintaining visual plausibility, and (ii) novel-view synthesis at fixed moments. We show that our method, which we dub HyperNeRF, outperforms existing methods on both tasks. Compared to Nerfies, HyperNeRF reduces average error rates by 4.1% for interpolation and 8.6% for novel-view synthesis, as measured by LPIPS. Additional videos, results, and visualizations are available at https://hypernerf.github.io.
Submitted 10 September, 2021; v1 submitted 24 June, 2021;
originally announced June 2021.
-
HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences
Authors:
Feitong Tan,
Danhang Tang,
Mingsong Dou,
Kaiwen Guo,
Rohit Pandey,
Cem Keskin,
Ruofei Du,
Deqing Sun,
Sofien Bouaziz,
Sean Fanello,
Ping Tan,
Yinda Zhang
Abstract:
In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels as if they were projected onto the surface of a 3D human scan. To this end, we introduce novel loss functions to push features apart according to their geodesic distances on the surface. Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into a unified feature space. Extensive experiments show that the learned embeddings can produce accurate correspondences between images with remarkable generalization capabilities on both intra and inter subjects.
Submitted 29 March, 2021;
originally announced March 2021.
-
Improved Detection of Face Presentation Attacks Using Image Decomposition
Authors:
Shlok Kumar Mishra,
Kuntal Sengupta,
Max Horowitz-Gelb,
Wen-Sheng Chu,
Sofien Bouaziz,
David Jacobs
Abstract:
Presentation attack detection (PAD) is a critical component in secure face authentication. We present a PAD algorithm to distinguish face spoofs generated by a photograph of a subject from live images. Our method uses an image decomposition network to extract albedo and normal. The domain gap between the real and spoof face images leads to easily identifiable differences, especially between the recovered albedo maps. We enhance this domain gap by retraining existing methods using supervised contrastive loss. We present empirical and theoretical analysis that demonstrates that contrast and lighting effects can play a significant role in PAD; these show up, particularly in the recovered albedo. Finally, we demonstrate that by combining all of these methods we achieve state-of-the-art results on both intra-dataset testing for CelebA-Spoof, OULU, CASIA-SURF datasets and inter-dataset setting on SiW, CASIA-MFSD, Replay-Attack and MSU-MFSD datasets.
Submitted 1 December, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Nerfies: Deformable Neural Radiance Fields
Authors:
Keunhong Park,
Utkarsh Sinha,
Jonathan T. Barron,
Sofien Bouaziz,
Dan B Goldman,
Steven M. Seitz,
Ricardo Martin-Brualla
Abstract:
We present the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. Our approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that our method can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies." We evaluate our method by collecting time-synchronized data using a rig with two mobile phones, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.
Submitted 9 September, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
GeLaTO: Generative Latent Textured Objects
Authors:
Ricardo Martin-Brualla,
Rohit Pandey,
Sofien Bouaziz,
Matthew Brown,
Dan B Goldman
Abstract:
Accurate modeling of 3D objects exhibiting transparency, reflections and thin structures is an extremely challenging problem. Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance. To generate the proxies' textures, we learn a joint latent space allowing category-level appearance and geometry interpolation. The proxies are independently rasterized with their corresponding neural texture and composited using a U-Net, which generates an output photorealistic image including an alpha map. We demonstrate the effectiveness of our approach by reconstructing complex objects from a sparse set of views. We show results on a dataset of real images of eyeglasses frames, which are particularly challenging to reconstruct using classical methods. We also demonstrate that these coarse proxies can be handcrafted when the underlying object geometry is easy to model, like eyeglasses, or generated using a neural network for more complex categories, such as cars.
Submitted 11 August, 2020;
originally announced August 2020.
-
HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching
Authors:
Vladimir Tankovich,
Christian Häne,
Yinda Zhang,
Adarsh Kowdle,
Sean Fanello,
Sofien Bouaziz
Abstract:
This paper presents HITNet, a novel neural network architecture for real-time stereo matching. Contrary to many recent neural network approaches that operate on a full cost volume and rely on 3D convolutions, our approach does not explicitly build a volume and instead relies on a fast multi-resolution initialization step, differentiable 2D geometric propagation and warping mechanisms to infer disparity hypotheses. To achieve a high level of accuracy, our network not only geometrically reasons about disparities but also infers slanted plane hypotheses, which allow geometric warping and upsampling operations to be performed more accurately. Our architecture is inherently multi-resolution, allowing the propagation of information across different levels. Multiple experiments prove the effectiveness of the proposed approach at a fraction of the computation required by state-of-the-art methods. At the time of writing, HITNet ranks 1st-3rd on all the metrics published on the ETH3D website for two view stereo, ranks 1st on most of the metrics among all the end-to-end learning approaches on Middlebury-v3, and ranks 1st on the popular KITTI 2012 and 2015 benchmarks among the published methods faster than 100ms.
Submitted 19 January, 2023; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Deep Implicit Volume Compression
Authors:
Danhang Tang,
Saurabh Singh,
Philip A. Chou,
Christian Haene,
Mingsong Dou,
Sean Fanello,
Jonathan Taylor,
Philip Davidson,
Onur G. Guleryuz,
Yinda Zhang,
Shahram Izadi,
Andrea Tagliasacchi,
Sofien Bouaziz,
Cem Keskin
Abstract:
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper bounds the reconstruction error by the voxel size. To compress the corresponding texture, we designed a fast block-based UV parameterization, generating coherent texture maps that can be effectively compressed using existing video compression algorithms. We demonstrate the performance of our algorithms on two 4D performance capture datasets, reducing bitrate by 66% for the same distortion, or alternatively reducing the distortion by 50% for the same bitrate, compared to the state-of-the-art.
Submitted 18 May, 2020;
originally announced May 2020.
-
RePose: Learning Deep Kinematic Priors for Fast Human Pose Estimation
Authors:
Hossam Isack,
Christian Haene,
Cem Keskin,
Sofien Bouaziz,
Yuri Boykov,
Shahram Izadi,
Sameh Khamis
Abstract:
We propose a novel efficient and lightweight model for human pose estimation from a single image. Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods. To this end, we explicitly incorporate part-based structural and geometric priors in a hierarchical prediction framework. At the coarsest resolution, and in a manner similar to classical part-based approaches, we leverage the kinematic structure of the human body to propagate convolutional feature updates between the keypoints or body parts. Unlike classical approaches, we adopt end-to-end training to learn this geometric prior through feature updates from data. We then propagate the feature representation at the coarsest resolution up the hierarchy to refine the predicted pose in a coarse-to-fine fashion. The final network effectively models the geometric prior and intuition within a lightweight deep neural network, yielding state-of-the-art results for a model of this size on two standard datasets, Leeds Sports Pose and MPII Human Pose.
Submitted 10 February, 2020;
originally announced February 2020.
-
CvxNet: Learnable Convex Decomposition
Authors:
Boyang Deng,
Kyle Genova,
Soroosh Yazdani,
Sofien Bouaziz,
Geoffrey Hinton,
Andrea Tagliasacchi
Abstract:
Any solid object can be decomposed into a collection of convex polytopes (in short, convexes). When a small number of convexes are used, such a decomposition can be thought of as a piece-wise approximation of the geometry. This decomposition is fundamental in computer graphics, where it provides one of the most common ways to approximate geometry, for example, in real-time physics simulation. A convex object also has the property of being simultaneously an explicit and implicit representation: one can interpret it explicitly as a mesh derived by computing the vertices of a convex hull, or implicitly as the collection of half-space constraints or support functions. Their implicit representation makes them particularly well suited for neural network training, as they abstract away from the topology of the geometry they need to represent. However, at testing time, convexes can also generate explicit representations -- polygonal meshes -- which can then be used in any downstream application. We introduce a network architecture to represent a low dimensional family of convexes. This family is automatically derived via an auto-encoding process. We investigate the applications of this architecture including automatic convex decomposition, image to 3D reconstruction, and part-based shape retrieval.
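The implicit view of a convex described above, as a collection of half-space constraints, can be sketched in a few lines. This is only a hard inside/outside test under an assumed sign convention (a point x is inside when n·x + d <= 0 for every half-space); it is not the CvxNet architecture, and the function and variable names are hypothetical.

```python
def inside_convex(point, halfspaces):
    """Return True if `point` satisfies every half-space constraint.

    `halfspaces` is a list of (normal, offset) pairs; a point x is treated as
    inside a half-space when dot(normal, x) + offset <= 0 (assumed convention).
    """
    return all(
        sum(n_i * x_i for n_i, x_i in zip(normal, point)) + offset <= 0.0
        for normal, offset in halfspaces
    )


# The unit square [0, 1]^2 written as four half-spaces.
unit_square = [
    ((-1.0, 0.0), 0.0),   # -x     <= 0  ->  x >= 0
    ((1.0, 0.0), -1.0),   #  x - 1 <= 0  ->  x <= 1
    ((0.0, -1.0), 0.0),   # -y     <= 0  ->  y >= 0
    ((0.0, 1.0), -1.0),   #  y - 1 <= 0  ->  y <= 1
]

center_is_inside = inside_convex((0.5, 0.5), unit_square)
outside_is_inside = inside_convex((1.5, 0.5), unit_square)
```

The explicit counterpart of the same shape would be the mesh obtained from the square's four corner vertices.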
Submitted 12 April, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Multiview Aggregation for Learning Category-Specific Shape Reconstruction
Authors:
Srinath Sridhar,
Davis Rempe,
Julien Valentin,
Sofien Bouaziz,
Leonidas J. Guibas
Abstract:
We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations, or assume a fixed number of views. We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind line of sight occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation. We train a 2D CNN to learn to predict this representation from a variable number of views (1 or more). We further aggregate multiview information by using permutation equivariant layers that promote order-agnostic view information exchange at the feature level. Experiments show that our approach is able to produce dense 3D reconstructions of objects that improve in quality as more views are added.
Submitted 8 December, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.
-
VIPER: Volume Invariant Position-based Elastic Rods
Authors:
Baptiste Angles,
Daniel Rebain,
Miles Macklin,
Brian Wyvill,
Loic Barthe,
JP Lewis,
Javier von der Pahlen,
Shahram Izadi,
Julien Valentin,
Sofien Bouaziz,
Andrea Tagliasacchi
Abstract:
We extend the formulation of position-based rods to include elastic volumetric deformations. We achieve this by introducing an additional degree of freedom per vertex -- isotropic scale (and its velocity). Including scale enriches the space of possible deformations, allowing the simulation of volumetric effects, such as a reduction in cross-sectional area when a rod is stretched. We rigorously derive the continuous formulation of its elastic energy potentials, and hence its associated position-based dynamics (PBD) updates to realize this model, enabling the simulation of up to 26000 DOFs at 140 Hz in our GPU implementation. We further show how rods can provide a compact alternative to tetrahedral meshes for the representation of complex muscle deformations, as well as providing a convenient representation for collision detection. This is achieved by modeling a muscle as a bundle of rods, for which we also introduce a technique to automatically convert a muscle surface mesh into a rods-bundle. Finally, we show how rods and/or bundles can be skinned to a surface mesh to drive its deformation, resulting in an alternative to cages for real-time volumetric deformation.
Submitted 12 June, 2019;
originally announced June 2019.
-
Towards Real-time Simulation of Hyperelastic Materials
Authors:
Tiantian Liu,
Sofien Bouaziz,
Ladislav Kavan
Abstract:
We present a new method for real-time physics-based simulation supporting many different types of hyperelastic materials. Previous methods such as Position Based or Projective Dynamics are fast, but support only a limited selection of materials; even classical materials such as Neo-Hookean elasticity are not supported. Recently, Xu et al. [2015] introduced new "spline-based materials" which can be easily controlled by artists to achieve desired animation effects. Simulation of these types of materials currently relies on Newton's method, which is slow, even with only one iteration per timestep. In this paper, we show that Projective Dynamics can be interpreted as a quasi-Newton method. This insight enables very efficient simulation of a large class of hyperelastic materials, including the Neo-Hookean, spline-based materials, and others. The quasi-Newton interpretation also allows us to leverage ideas from numerical optimization. In particular, we show that our solver can be further accelerated using L-BFGS updates (Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm). Our final method is typically more than 10 times faster than one iteration of Newton's method without compromising quality. In fact, our result is often more accurate than the result obtained with one iteration of Newton's method. Our method is also easier to implement, implying reduced software development costs.
Submitted 25 April, 2016;
originally announced April 2016.