-
Evaluating Alternatives to SFM Point Cloud Initialization for Gaussian Splatting
Authors:
Yalda Foroutan,
Daniel Rebain,
Kwang Moo Yi,
Andrea Tagliasacchi
Abstract:
3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome.…
▽ More
3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome. To this end, we investigate various initialization strategies for Gaussian Splatting and delve into how volumetric reconstructions from Neural Radiance Fields (NeRF) can be utilized to bypass the dependency on SFM data. Our findings demonstrate that random initialization can perform much better if carefully designed and that by employing a combination of improved initialization strategies and structure distillation from low-cost NeRF models, it is possible to achieve equivalent results, or at times even superior, to those obtained from SFM initialization. Source code is available at https://theialab.github.io/nerf-3dgs .
△ Less
Submitted 23 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Base Layer Efficiency in Scalable Human-Machine Coding
Authors:
Yalda Foroutan,
Alon Harell,
Anderson de Andrade,
Ivan V. Bajić
Abstract:
A basic premise in scalable human-machine coding is that the base layer is intended for automated machine analysis and is therefore more compressible than the same content would be for human viewing. Use cases for such coding include video surveillance and traffic monitoring, where the majority of the content will never be seen by humans. Therefore, base layer efficiency is of paramount importance…
▽ More
A basic premise in scalable human-machine coding is that the base layer is intended for automated machine analysis and is therefore more compressible than the same content would be for human viewing. Use cases for such coding include video surveillance and traffic monitoring, where the majority of the content will never be seen by humans. Therefore, base layer efficiency is of paramount importance because the system would most frequently operate at the base-layer rate. In this paper, we analyze the coding efficiency of the base layer in a state-of-the-art scalable human-machine image codec, and show that it can be improved. In particular, we demonstrate that gains of 20-40% in BD-Rate compared to the currently best results on object detection and instance segmentation are possible.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Rate-Distortion Theory in Coding for Machines and its Application
Authors:
Alon Harell,
Yalda Foroutan,
Nilesh Ahuja,
Parual Datta,
Bhavya Kanzariya,
V. Srinivasa Somayaulu,
Omesh Tickoo,
Anderson de Andrade,
Ivan V. Bajic
Abstract:
Recent years have seen a tremendous growth in both the capability and popularity of automatic machine analysis of images and video. As a result, a growing need for efficient compression methods optimized for machine vision, rather than human vision, has emerged. To meet this growing demand, several methods have been developed for image and video coding for machines. Unfortunately, while there is a…
▽ More
Recent years have seen a tremendous growth in both the capability and popularity of automatic machine analysis of images and video. As a result, a growing need for efficient compression methods optimized for machine vision, rather than human vision, has emerged. To meet this growing demand, several methods have been developed for image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we extend the current rate-distortion theory for machines, providing insight into important design considerations of machine-vision codecs. We then utilize this newfound understanding to improve several methods for learnable image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks such as classification, instance segmentation, and object detection.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
VVC+M: Plug and Play Scalable Image Coding for Humans and Machines
Authors:
Alon Harell,
Yalda Foroutan,
Ivan V. Bajic
Abstract:
Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both machine task and human perception, this results…
▽ More
Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both machine task and human perception, this results in sub-optimal rate-distortion (RD) performance for the machine side. We focus on the case of images, proposing to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach we improve an existing scalable codec to achieve superior RD performance on the machine task, while remaining competitive for human perception. Moreover, our approach can be trained post-hoc for any given ICM scheme, and without creating a coupling between the quality of the machine analysis and human vision.
△ Less
Submitted 16 May, 2023;
originally announced May 2023.
-
Conditional and Residual Methods in Scalable Coding for Humans and Machines
Authors:
Anderson de Andrade,
Alon Harell,
Yalda Foroutan,
Ivan V. Bajić
Abstract:
We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional codin…
▽ More
We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.
△ Less
Submitted 4 July, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Control of Computer Pointer Using Hand Gesture Recognition in Motion Pictures
Authors:
Yalda Foroutan,
Ahmad Kalhor,
Saeid Mohammadi Nejati,
Samad Sheikhaei
Abstract:
This paper presents a user interface designed to enable computer cursor control through hand detection and gesture classification. A comprehensive hand dataset comprising 6720 image samples was collected, encompassing four distinct classes: fist, palm, pointing to the left, and pointing to the right. The images were captured from 15 individuals in various settings, including simple backgrounds wit…
▽ More
This paper presents a user interface designed to enable computer cursor control through hand detection and gesture classification. A comprehensive hand dataset comprising 6720 image samples was collected, encompassing four distinct classes: fist, palm, pointing to the left, and pointing to the right. The images were captured from 15 individuals in various settings, including simple backgrounds with different perspectives and lighting conditions. A convolutional neural network (CNN) was trained on this dataset to accurately predict labels for each captured image and measure their similarity. The system incorporates defined commands for cursor movement, left-click, and right-click actions. Experimental results indicate that the proposed algorithm achieves a remarkable accuracy of 91.88% and demonstrates its potential applicability across diverse backgrounds.
△ Less
Submitted 9 June, 2023; v1 submitted 24 December, 2020;
originally announced December 2020.