-
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
Authors:
Fred Hohman,
Chaoqun Wang,
**mook Lee,
Jochen Görtler,
Dominik Moritz,
Jeffrey P Bigham,
Zhile Ren,
Cecile Foret,
Qi Shan,
Xiaoyi Zhang
Abstract:
On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.
Submitted 3 April, 2024;
originally announced April 2024.
-
StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
Authors:
Pengsheng Guo,
Hans Hao,
Adam Caccavale,
Zhongzheng Ren,
Edward Zhang,
Qi Shan,
Aditya Sankar,
Alexander G. Schwing,
Alex Colburn,
Fangchang Ma
Abstract:
In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the diffusion network, and the 3D model representation. To overcome these limitations, we present StableDreamer, a methodology incorporating three advances. First, inspired by InstructNeRF2NeRF, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss. This finding provides a novel tool to debug SDS, which we use to show the impact of time-annealing noise levels on reducing multi-faced geometries. Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition. Based on this observation, StableDreamer introduces a two-stage training strategy that effectively combines these aspects, resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D Gaussians representation, replacing Neural Radiance Fields (NeRFs), to enhance the overall quality, reduce memory usage during training, accelerate rendering speeds, and better capture semi-transparent objects. StableDreamer reduces multi-face geometries, generates fine details, and converges stably.
Submitted 1 December, 2023;
originally announced December 2023.
-
UPSCALE: Unconstrained Channel Pruning
Authors:
Alvin Wan,
Hanxiang Hao,
Kaushik Patnaik,
Yueyang Xu,
Omer Hadad,
David Güera,
Zhile Ren,
Qi Shan
Abstract:
As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower than the unpruned model. As a workaround, pruners conventionally constrain certain channels to be pruned together. This fully eliminates memory copies but, as we show, significantly impairs accuracy. We now have a dilemma: Remove constraints but increase latency, or add constraints and impair accuracy. In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. Using this insight, we design a generic algorithm UPSCALE to prune models with any pruning pattern. By removing constraints from existing pruners, we improve ImageNet accuracy for post-training pruned models by 2.1 points on average -- benefiting DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, by reordering channels, UPSCALE improves inference speeds by up to 2x over a baseline export.
Submitted 17 July, 2023;
originally announced July 2023.
-
HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion
Authors:
Ziya Erkoç,
Fangchang Ma,
Qi Shan,
Matthias Nießner,
Angela Dai
Abstract:
Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize new data. To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. HyperDiffusion operates directly on MLP weights and generates new neural implicit fields encoded by synthesized MLP parameters. Specifically, a collection of MLPs is first optimized to faithfully represent individual data samples. Subsequently, a diffusion process is trained in this MLP weight space to model the underlying distribution of neural implicit fields. HyperDiffusion enables diffusion modeling over an implicit, compact, and yet high-fidelity representation of complex signals across 3D shapes and 4D mesh animations within one single unified framework.
Submitted 29 March, 2023;
originally announced March 2023.
-
Highly dynamic locomotion control of biped robot enhanced by swing arms
Authors:
Weijie Wang,
Song Liu,
Qinfeng Shan,
Lihao Jia
Abstract:
Swing arms have an irreplaceable role in promoting highly dynamic locomotion on bipedal robots: from the viewpoint of biomechanics, they provide a larger angular momentum control space. Few bipedal robots utilize swing arms and their multi-degree-of-freedom redundancy, due to the lack of appropriate locomotion control strategies that integrate modeling and control. This paper presents a control strategy that models the bipedal robot as a flywheel-spring loaded inverted pendulum (F-SLIP) to extract characteristics of swing arms and uses a whole-body controller (WBC) to achieve these characteristics. It also proposes an evaluation system for the highly dynamic locomotion of bipedal robots covering three aspects: agility (as defined by us), stability, and energy consumption. We design several sets of simulation experiments and analyze the effects of swing arms according to the evaluation system during the jumping motion of Purple (Purple energy rises in the east) V1.0, a bipedal robot designed to test highly explosive locomotion. Results show that Purple's agility is increased by more than 10 percent, stabilization time is reduced by a factor of two, and energy consumption is reduced by more than 20 percent after introducing swing arms.
Submitted 17 August, 2022;
originally announced August 2022.
-
PointConvFormer: Revenge of the Point-based Convolution
Authors:
Wenxuan Wu,
Li Fuxin,
Qi Shan
Abstract:
We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention. In PointConvFormer, attention computed from feature difference between points in the neighborhood is used to modify the convolutional weights at each point. Hence, we preserved the invariances from point convolution, whereas attention helps to select relevant points in the neighborhood for convolution. PointConvFormer is suitable for multiple tasks that require details at the point level, such as segmentation and scene flow estimation tasks. We experiment on both tasks with multiple datasets including ScanNet, SemanticKitti, FlyingThings3D and KITTI. Our results show that PointConvFormer offers a better accuracy-speed tradeoff than classic convolutions, regular transformers, and voxelized sparse convolution approaches. Visualizations show that PointConvFormer performs similarly to convolution on flat areas, whereas the neighborhood selection effect is stronger on object boundaries, showing that it has got the best of both worlds.
Submitted 10 May, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Network resampling for estimating uncertainty
Authors:
Qianhua Shan,
Elizaveta Levina
Abstract:
With network data becoming ubiquitous in many applications, many models and algorithms for network analysis have been proposed. Yet methods for providing uncertainty estimates in addition to point estimates of network parameters are much less common. While bootstrap and other resampling procedures have been an effective general tool for estimating uncertainty from i.i.d. samples, adapting them to networks is highly nontrivial. In this work, we study three different network resampling procedures for uncertainty estimation, and propose a general algorithm to construct confidence intervals for network parameters through network resampling. We also propose an algorithm for selecting the sampling fraction, which has a substantial effect on performance. We find that, unsurprisingly, no one procedure is empirically best for all tasks, but that selecting an appropriate sampling fraction substantially improves performance in many cases. We illustrate this on simulated networks and on Facebook data.
Submitted 27 June, 2022;
originally announced June 2022.
-
FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
Authors:
Zhenpei Yang,
Zhile Ren,
Miguel Angel Bautista,
Zaiwei Zhang,
Qi Shan,
Qixing Huang
Abstract:
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which could be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses. The core of our approach is a fast and robust multi-view reconstruction algorithm to jointly refine 3D geometry and camera pose estimation using learnable neural network modules. We provide a thorough benchmark of state-of-the-art approaches for this problem on ShapeNet. Our approach achieves best-in-class results. It is also two orders of magnitude faster than the recent optimization-based approach IDR. Our code is released at https://github.com/zhenpeiyang/FvOR/
Submitted 16 May, 2022;
originally announced May 2022.
-
Texturify: Generating Textures on 3D Shape Surfaces
Authors:
Yawar Siddiqui,
Justus Thies,
Fangchang Ma,
Qi Shan,
Matthias Nießner,
Angela Dai
Abstract:
Texture cues on 3D objects are key to compelling visual representations, with the possibility to create high visual fidelity with inherent spatial consistency across different views. Since the availability of textured 3D shapes remains very limited, learning a 3D-supervised data-driven method that predicts a texture based on the 3D input is very challenging. We thus propose Texturify, a GAN-based method that leverages a 3D shape dataset of an object class and learns to reproduce the distribution of appearances observed in real images by generating high-quality textures. In particular, our method does not require any 3D color supervision or correspondence between shape geometry and images to learn the texturing of 3D objects. Texturify operates directly on the surface of the 3D objects by introducing face convolutional operators on a hierarchical 4-RoSy parametrization to generate plausible object-specific textures. Employing differentiable rendering and adversarial losses that critique individual views and consistency across views, we effectively learn the high-quality surface texturing distribution from real-world images. Experiments on car and chair shape collections show that our approach outperforms state of the art by an average of 22% in FID score.
Submitted 5 April, 2022;
originally announced April 2022.
-
Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration
Authors:
Yichi Qian,
Qiyi Shan,
Hanjia Lyu,
Jiebo Luo
Abstract:
The emergence of social media has largely eased the way people receive information and participate in public discussions. However, in countries with strict regulations on discussions in the public space, social media is no exception. To limit the degree of dissent or inhibit the spread of "harmful" information, a common approach is to impose information operations such as censorship/suspension on social media. In this paper, we focus on a study of censorship on Weibo, the counterpart of Twitter in China. Specifically, we 1) create a web-scraping pipeline and collect a large dataset focused solely on reposts from Weibo; 2) discover the characteristics of users whose reposts contain censored information, in terms of gender, device, and account type; and 3) conduct a thematic analysis by extracting and analyzing topic information. Note that although the original posts are no longer visible, we can use comments users wrote when reposting the original post to infer the topic of the original content. We find that such efforts can recover the discussions around social events that triggered massive discussions but were later muted. Further, we show the variations of inferred topics across different user groups and time frames.
Submitted 23 July, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Fast and Explicit Neural View Synthesis
Authors:
Pengsheng Guo,
Miguel Angel Bautista,
Alex Colburn,
Liang Yang,
Daniel Ulbricht,
Joshua M. Susskind,
Qi Shan
Abstract:
We study the problem of novel view synthesis from sparse source observations of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. Our approach explicitly encodes observations into a volumetric representation that enables amortized rendering. We demonstrate that although continuous radiance field representations have gained a lot of attention due to their expressive power, our simple approach obtains comparable or even better novel view reconstruction quality compared with state-of-the-art baselines while increasing rendering speed by over 400x. Our model is trained in a category-agnostic manner and does not require scene-specific optimization. Therefore, it is able to generalize novel view synthesis to object categories not seen during training. In addition, we show that with our simple formulation, we can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision.
Submitted 8 December, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Solutions of the Dirac equation in one fixed and one moving wall well
Authors:
Qiuyu Shan
Abstract:
Quantum systems whose Hamiltonian changes with time are very important, especially the potential well whose width can change. The Schrödinger equation and the Klein-Gordon equation have been solved for this kind of circumstance in some studies, but the Dirac equation has not been solved, so this article discusses the solution of the Dirac equation in this kind of circumstance.
Submitted 31 March, 2024; v1 submitted 26 June, 2021;
originally announced July 2021.
-
MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions
Authors:
Zhenpei Yang,
Zhile Ren,
Qi Shan,
Qixing Huang
Abstract:
Deep learning has made significant impacts on multi-view stereo systems. State-of-the-art approaches typically involve building a cost volume, followed by multiple 3D convolution operations to recover the input image's pixel-wise depth. While such end-to-end learning of plane-sweeping stereo advances accuracy on public benchmarks, it is typically very slow to compute. We present MVS2D, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism. Since MVS2D only builds on 2D convolutions, it is at least $2\times$ faster than all the notable counterparts. Moreover, our algorithm produces precise depth estimations and 3D reconstructions, achieving state-of-the-art results on challenging benchmarks ScanNet, SUN3D, RGBD, and the classical DTU dataset. Our algorithm also outperforms all other algorithms in the setting of inexact camera poses. Our code is released at https://github.com/zhenpeiyang/MVS2D
Submitted 11 December, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
RetrievalFuse: Neural 3D Scene Reconstruction with a Database
Authors:
Yawar Siddiqui,
Justus Thies,
Fangchang Ma,
Qi Shan,
Matthias Nießner,
Angela Dai
Abstract:
3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scene geometry from the training database. First, we learn to synthesize an initial estimate for a 3D scene, constructed by retrieving a top-k set of volumetric chunks from the scene database. These candidates are then refined to a final scene generation with an attention-based refinement that can effectively select the most consistent set of geometry from the candidates and combine them together to create an output scene, facilitating transfer of coherent structures and local detail from train scene geometry. We demonstrate our neural scene reconstruction with a database for the tasks of 3D super resolution and surface reconstruction from sparse point clouds, showing that our approach enables generation of more coherent, accurate 3D scenes, improving on average by over 8% in IoU over state-of-the-art scene reconstruction.
Submitted 10 August, 2021; v1 submitted 31 March, 2021;
originally announced April 2021.
-
Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels
Authors:
Xiaoyi Zhang,
Lilian de Greef,
Amanda Swearngin,
Samuel White,
Kyle Murray,
Lisa Yu,
Qi Shan,
Jeffrey Nichols,
Jason Wu,
Chris Fleizach,
Aaron Everitt,
Jeffrey P. Bigham
Abstract:
Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces often best reflect an app's full functionality. We trained a robust, fast, memory-efficient, on-device model to detect UI elements using a dataset of 77,637 screens (from 4,068 iPhone apps) that we collected and annotated. To further improve UI detections and add semantic information, we introduced heuristics (e.g., UI grouping and ordering) and additional models (e.g., recognize UI content, state, interactivity). We built Screen Recognition to generate accessibility metadata to augment iOS VoiceOver. In a study with 9 screen reader users, we validated that our approach improves the accessibility of existing mobile apps, enabling even previously inaccessible apps to be used.
Submitted 13 January, 2021;
originally announced January 2021.
-
Equivariant Neural Rendering
Authors:
Emilien Dupont,
Miguel Angel Bautista,
Alex Colburn,
Aditya Sankar,
Carlos Guestrin,
Josh Susskind,
Qi Shan
Abstract:
We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks.
Submitted 21 December, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
Direct observation of six-fold exotic fermions in topological semimetal PdSb$_2$
Authors:
Sun Zhipeng,
Hua Chenqiang,
Liu Xiaolei,
Liu Zhengtai,
Ye Mao,
Qiao Shan,
Liu Zhonghao,
Liu Jishan,
Guo Yanfeng,
Lu Yunhao,
Shen Dawei
Abstract:
Pyrite-type PdSb$_2$ with a nonsymmorphic cubic structure has been predicted to host six-fold-degenerate exotic fermions beyond the Dirac and Weyl fermions. Though magnetotransport measurements on PdSb$_2$ suggest its topologically nontrivial character, direct spectroscopic study of its band structure remains absent. Here, by utilizing high-resolution angle-resolved photoemission spectroscopy, we present a systematic study on its bulk and surface electronic structure. Through careful comparison with first-principles calculations, we verify the existence of six-fold fermions in PdSb$_2$, which are formed by three doubly degenerate bands centered at the $R$ point in the Brillouin zone. These bands exhibit parabolic dispersion close to six-fold fermion nodes, in sharp contrast to previously reported ones in chiral fermion materials. Furthermore, our data reveal no protected Fermi arcs in PdSb$_2$, which is compatible with its achiral structure. Our findings provide a remarkable platform for study of new topological fermions and indicate their potential applications.
Submitted 9 December, 2019; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Manhattan Room Layout Reconstruction from a Single 360 image: A Comparative Study of State-of-the-art Methods
Authors:
Chuhang Zou,
Jheng-Wei Su,
Chi-Han Peng,
Alex Colburn,
Qi Shan,
Peter Wonka,
Hung-Kuo Chu,
Derek Hoiem
Abstract:
Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g. SegNet or ResNet), type of elements predicted (e.g. corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset [3], and introduce two depth-based evaluation metrics.
Submitted 25 December, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image
Authors:
Chuhang Zou,
Alex Colburn,
Qi Shan,
Derek Hoiem
Abstract:
We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts.
Submitted 23 March, 2018;
originally announced March 2018.
-
RIDI: Robust IMU Double Integration
Authors:
Hang Yan,
Qi Shan,
Yasutaka Furukawa
Abstract:
This paper proposes a novel data-driven approach for inertial navigation, which learns to estimate trajectories of natural human motions using only the inertial measurement unit (IMU) found in every smartphone. The key observation is that human motions are repetitive and consist of a few major modes (e.g., standing, walking, or turning). Our algorithm regresses a velocity vector from the history of linear accelerations and angular velocities, then corrects low-frequency bias in the linear accelerations, which are integrated twice to estimate positions. We have acquired training data with ground-truth motions across multiple human subjects and multiple phone placements (e.g., in a bag or a hand). Qualitative and quantitative evaluations demonstrate that our algorithm achieves results surprisingly comparable to full visual-inertial navigation. To our knowledge, this paper is the first to integrate sophisticated machine learning techniques with inertial navigation, potentially opening up a new line of research in the domain of data-driven inertial navigation. We will publicly share our code and data to facilitate further research.
Submitted 30 December, 2017; v1 submitted 24 December, 2017;
originally announced December 2017.
-
Panoramic Structure from Motion via Geometric Relationship Detection
Authors:
Satoshi Ikehata,
Ivaylo Boyadzhiev,
Qi Shan,
Yasutaka Furukawa
Abstract:
This paper addresses the problem of Structure from Motion (SfM) for indoor panoramic image streams, extremely challenging even for the state-of-the-art due to the lack of textures and minimal parallax. The key idea is the fusion of single-view and multi-view reconstruction techniques via geometric relationship detection (e.g., detecting 2D lines as coplanar in 3D). Rough geometry suffices to perform such detection, and our approach utilizes rough surface normal estimates from an image-to-normal deep network to discover geometric relationships among lines. The detected relationships provide exact geometric constraints in our line-based linear SfM formulation. A constrained linear least squares is used to reconstruct a 3D model and camera motions, followed by the bundle adjustment. We have validated our algorithm on challenging datasets, outperforming various state-of-the-art reconstruction techniques.
Submitted 5 December, 2016;
originally announced December 2016.
-
IM2CAD
Authors:
Hamid Izadinia,
Qi Shan,
Steven M. Seitz
Abstract:
Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method in standard scene understanding benchmarks where we obtain significant improvement.
Submitted 23 April, 2017; v1 submitted 17 August, 2016;
originally announced August 2016.
-
Proton radiography of magnetic field produced by ultra-intense laser irradiating capacity-coil target
Authors:
W. W. Wang,
J. Teng,
J. Chen,
H. B. Cai,
S. K. He,
W. M. Zhou,
L. Q. Shan,
F. Lu,
Y. C. Wu,
W. Hong,
D. X. Liu,
B. Bi,
F. Zhang,
F. B. Xue,
B. Y. Li,
B. Zhang,
Y. L. He,
W. He,
J. L. Jiao,
K. G. Dong,
F. Q. Zhang,
Z. G. Deng,
Z. M. Zhang,
B. Cui,
D. Han
, et al. (7 additional authors not shown)
Abstract:
An ultra-intense, ultra-short laser is used for the first time to irradiate a capacity-coil target to generate a magnetic field. The spatial structure and temporal evolution of the huge magnetic fields were studied with the time-gated proton radiography method. A magnetic flux density of 40 T was measured by comparing the proton deflection with particle track simulations. Although the laser pulse duration is only 30 fs, the generated magnetic field can last for over 100 picoseconds. The energy conversion efficiency from laser to magnetic field can reach as high as ~20%. The results indicate that magnetic fields of tens of tesla (T) could be produced at many ultra-intense laser facilities around the world, and that higher magnetic fields could be produced by picosecond lasers.
Submitted 17 November, 2014;
originally announced November 2014.
-
Aging research of the LAB-based liquid scintillator in stainless steel container
Authors:
Hai-tao Chen,
Bo-xiang Yu,
Qing Shan,
Ya-yun Ding,
Bing Du,
Shu-tong Liu,
Xuan Zhang,
Li Zhou,
Wen-bao Jia,
Jian Fang,
Xing-chen Ye,
Wei Hu,
Shun-li Niu,
Jia-qing Yan,
Hang Zhao,
Dao-** Zhao
Abstract:
Stainless steel is the material used for the storage vessels and piping systems of the LAB-based liquid scintillator in the JUNO experiment. Aging is recognized as one of the main degradation mechanisms affecting the properties of liquid scintillator. Aging experiments on LAB-based liquid scintillator were carried out in containers of different materials (type 316 and 304 stainless steel, and glass) at two temperatures (40 and 25 degrees Celsius). In the continuous tests of liquid scintillator properties, the light yield and the absorption spectrum are nearly the same as those of the unaged sample. The attenuation length of the aged samples is 6%~12% shorter than that of the unaged one. However, the concentration of Fe in the LAB-based liquid scintillator does not show a clear change. Self-aging therefore has only a small effect on the liquid scintillator, as does quenching by stainless steel impurities. Type 316 and 304 stainless steel can be used as the material for LAB-based liquid scintillator vessels and transportation pipelines.
Submitted 3 September, 2014;
originally announced September 2014.
-
Discovery of higher order reentrant modes by constructing a cylindrical symmetric ring and post cavity resonator
Authors:
Y. Fan,
Z. Zhang,
N. C. Carvalho,
J-M. Le Floch,
Q. Shan,
M. E. Tobar
Abstract:
Analysis of the properties of resonant modes in a reentrant cavity structure comprising a post and a ring was undertaken and verified experimentally. In particular, we show the existence of higher order reentrant cavity modes in such a structure. Results show that the new cavity has two re-entrant modes, one of which has a better displacement sensitivity than the single post resonator and the other a reduced sensitivity. The more sensitive mode is better than the single post resonator by a factor of 2 to 1.5 when the gap spacing is below 100 μm. This type of cavity has the potential to operate as a highly sensitive transducer for a variety of precision measurement applications, in particular applications which require coupling to more than one sensitive transducer mode.
Submitted 10 March, 2014; v1 submitted 27 September, 2013;
originally announced September 2013.
-
Rigorous analysis of highly tunable cylindrical Transverse Magnetic mode re-entrant cavities
Authors:
J-M. Le Floch,
Y. Fan,
M. Aubourg,
D. Cros,
N. C. Carvalho,
Q. Shan,
J. Bourhill,
E. N. Ivanov,
G. Humbert,
V. Madrangeas,
M. E. Tobar
Abstract:
Cylindrical re-entrant cavities are unique three-dimensional structures that resonate with their electric and magnetic fields in separate parts of the cavity. To further understand these devices, we undertake rigorous analysis of the properties of the resonance using in-house developed Finite Element Method (FEM) software capable of dealing with small gap structures of extreme aspect ratio. Comparisons between the FEM method and experiments are consistent and we illustrate where predictions using established lumped element models work well and where they are limited. With the aid of the modeling we design a highly tunable cavity that can be tuned from 2 GHz to 22 GHz just by inserting a post into a fixed dimensioned cylindrical cavity. We show this is possible as the mode structure transforms from a re-entrant mode during the tuning process to a standard cylindrical Transverse Magnetic (TM) mode.
Submitted 8 January, 2014; v1 submitted 13 August, 2013;
originally announced August 2013.