-
Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
Authors:
Hongkai Chen,
Zixin Luo,
Yurun Tian,
Xuyang Bai,
Ziyu Wang,
Lei Zhou,
Mingmin Zhen,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan
Abstract:
Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformer. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cr…
▽ More
Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformer. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from cross attention. Apart from network structure, we also identify the importance of enforcing spatial smoothness in loss design, which has been omitted by previous works. Based on these augmentations, our network demonstrate strong matching capacity under different settings. The full version of our network achieves state-of-the-art performance among semi-dense matching methods at a similar cost to LoFTR, while the slim version reaches LoFTR baseline's performance with only 15% computation cost and 18% parameters.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Authors:
Yuanxun Lu,
**gyang Zhang,
Shiwei Li,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan,
Xun Cao,
Yao Yao
Abstract:
Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or a direct 3D diffusion model trained on limited 3D data losing generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion fin…
▽ More
Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or a direct 3D diffusion model trained on limited 3D data losing generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data, while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated using the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the almost consistent multi-view normal maps into a consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that, our direct 2.5D generation with the specially-designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.
△ Less
Submitted 21 March, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling
Authors:
**gyang Zhang,
Shiwei Li,
Yuanxun Lu,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan,
Yao Yao
Abstract:
We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet is extended from a pre-trained text-to-image diffusion model, where a copy of the original network is created for the new dense modality branch and is densely connected with the RGB branch. The RGB branch is locked during network fin…
▽ More
We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet is extended from a pre-trained text-to-image diffusion model, where a copy of the original network is created for the new dense modality branch and is densely connected with the RGB branch. The RGB branch is locked during network fine-tuning, which enables efficient learning of the new modality distribution while maintaining the strong generalization ability of the large-scale pre-trained diffusion model. We demonstrate the effectiveness of JointNet by using RGBD diffusion as an example and through extensive experiments, showcasing its applicability in a variety of applications, including joint RGBD generation, dense depth prediction, depth-conditioned image generation, and coherent tile-based 3D panorama generation.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
A simple linear-time algorithm for generating auxiliary 3-edge-connected subgraphs
Authors:
Yung H. Tsin
Abstract:
A linear-time algorithm for generating auxiliary subgraphs for the 3-edge-connected components of a connected multigraph is presented. The algorithm uses an innovative graph contraction operation and makes only one pass over the graph. By contrast, the previously best-known algorithms make multiple passes over the graph to decompose it into its 2-edge-connected components or 2-vertex-connected com…
▽ More
A linear-time algorithm for generating auxiliary subgraphs for the 3-edge-connected components of a connected multigraph is presented. The algorithm uses an innovative graph contraction operation and makes only one pass over the graph. By contrast, the previously best-known algorithms make multiple passes over the graph to decompose it into its 2-edge-connected components or 2-vertex-connected components, then its 3-edge-connected components or 3-vertex-connected components, and then construct a cactus representation for the 2-cuts to generate the auxiliary subgraphs for the 3-edge-connected components.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
On finding 2-cuts and 3-edge-connected components in parallel
Authors:
Yung H. Tsin
Abstract:
Given a connected undirected multigraph G (a graph that may contain parallel edges), the algorithm of [19] finds the 3-edge-connected components of $G$ in linear time using an innovative graph contraction technique based on a depth-first search. In [21], it was shown that the algorithm can be extended to produce a Mader construction sequence for each 3-edge-connected component, a cactus representa…
▽ More
Given a connected undirected multigraph G (a graph that may contain parallel edges), the algorithm of [19] finds the 3-edge-connected components of $G$ in linear time using an innovative graph contraction technique based on a depth-first search. In [21], it was shown that the algorithm can be extended to produce a Mader construction sequence for each 3-edge-connected component, a cactus representation of the 2-cuts (cut-pairs) of each 2-edge-connected component of $G$, and the 1-cuts (bridges) at the same time.
In this paper, we further extend the algorithm of [19] to generate the 2-cuts and the 3-edge-connected components of $G$ simultaneously in linear time by performing only one depth-first search over the input graph. Previously known algorithms solve the two problems separately in multiple phases.
△ Less
Submitted 24 June, 2023;
originally announced June 2023.
-
NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation
Authors:
**gyang Zhang,
Yao Yao,
Shiwei Li,
**gbo Liu,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan
Abstract:
We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images. In contrast to previous methods which assume a simplified environment map or co-located flashlights, in this work, we formulate the lighting of a static scene as one neural incident light field (NeILF) and one outgoing neural radiance field (NeRF). The key insight of…
▽ More
We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images. In contrast to previous methods which assume a simplified environment map or co-located flashlights, in this work, we formulate the lighting of a static scene as one neural incident light field (NeILF) and one outgoing neural radiance field (NeRF). The key insight of the proposed method is the union of the incident and outgoing light fields through physically-based rendering and inter-reflections between surfaces, making it possible to disentangle the scene geometry, material, and lighting from image observations in a physically-based manner. The proposed incident light and inter-reflection framework can be easily applied to other NeRF systems. We show that our method can not only decompose the outgoing radiance into incident lights and surface materials, but also serve as a surface refinement module that further improves the reconstruction detail of the neural surface. We demonstrate on several datasets that the proposed method is able to achieve state-of-the-art results in terms of geometry reconstruction quality, material estimation accuracy, and the fidelity of novel view rendering.
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer
Authors:
Hongkai Chen,
Zixin Luo,
Lei Zhou,
Yurun Tian,
Mingmin Zhen,
Tian Fang,
David Mckinnon,
Yanghai Tsin,
Long Quan
Abstract:
Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. T…
▽ More
Generating robust and reliable correspondences across images is a fundamental task for a diversity of applications. To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. To achieve this goal, first, flow maps are regressed in each cross attention phase to locate the center of search region. Next, a sampling grid is generated around the center, whose size, instead of being empirically configured as fixed, is adaptively computed from a pixel uncertainty estimated along with the flow map. Finally, attention is computed across two images within derived regions, referred to as attention span. By these means, we are able to not only maintain long-range dependencies, but also enable fine-grained attention among pixels of high relevance that compensates essential locality and piece-wise smoothness in matching tasks. State-of-the-art accuracy on a wide range of evaluation benchmarks validates the strong matching capability of our method.
△ Less
Submitted 30 August, 2022;
originally announced August 2022.
-
Critical Regularizations for Neural Surface Reconstruction in the Wild
Authors:
**gyang Zhang,
Yao Yao,
Shiwei Li,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan
Abstract:
Neural implicit functions have recently shown promising results on surface reconstructions from multiple views. However, current methods still suffer from excessive time complexity and poor robustness when reconstructing unbounded or complex scenes. In this paper, we present RegSDF, which shows that proper point cloud supervisions and geometry regularizations are sufficient to produce high-quality…
▽ More
Neural implicit functions have recently shown promising results on surface reconstructions from multiple views. However, current methods still suffer from excessive time complexity and poor robustness when reconstructing unbounded or complex scenes. In this paper, we present RegSDF, which shows that proper point cloud supervisions and geometry regularizations are sufficient to produce high-quality and robust reconstruction results. Specifically, RegSDF takes an additional oriented point cloud as input, and optimizes a signed distance field and a surface light field within a differentiable rendering framework. We also introduce the two critical regularizations for this optimization. The first one is the Hessian regularization that smoothly diffuses the signed distance values to the entire distance field given noisy and incomplete input. And the second one is the minimal surface regularization that compactly interpolates and extrapolates the missing geometry. Extensive experiments are conducted on DTU, BlendedMVS, and Tanks and Temples datasets. Compared with recent neural surface reconstruction approaches, RegSDF is able to reconstruct surfaces with fine details even for open scenes with complex topologies and unstructured camera trajectories.
△ Less
Submitted 7 June, 2022;
originally announced June 2022.
-
NeILF: Neural Incident Light Field for Physically-based Material Estimation
Authors:
Yao Yao,
**gyang Zhang,
**gbo Liu,
Yihang Qu,
Tian Fang,
David McKinnon,
Yanghai Tsin,
Long Quan
Abstract:
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. In the framework, we represent scene lightings as the Neural Incident Light Field (NeILF) and material properties as the surface BRDF modelled by multi-layer perceptrons. Compared with recent approaches that approximate scene lightings as the 2D environment map,…
▽ More
We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. In the framework, we represent scene lightings as the Neural Incident Light Field (NeILF) and material properties as the surface BRDF modelled by multi-layer perceptrons. Compared with recent approaches that approximate scene lightings as the 2D environment map, NeILF is a fully 5D light field that is capable of modelling illuminations of any static scenes. In addition, occlusions and indirect lights can be handled naturally by the NeILF representation without requiring multiple bounces of ray tracing, making it possible to estimate material properties even for scenes with complex lightings and geometries. We also propose a smoothness regularization and a Lambertian assumption to reduce the material-lighting ambiguity during the optimization. Our method strictly follows the physically-based rendering equation, and jointly optimizes material and lighting through the differentiable rendering process. We have intensively evaluated the proposed method on our in-house synthetic dataset, the DTU MVS dataset, and real-world BlendedMVS scenes. Our method is able to outperform previous methods by a significant margin in terms of novel view rendering quality, setting a new state-of-the-art for image-based material and lighting estimation.
△ Less
Submitted 18 March, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
-
A simple certifying algorithm for 3-edge-connectivity
Authors:
Yung H. Tsin
Abstract:
A linear-time certifying algorithm for 3-edge-connectivity is presented. Given an undirected graph G, if G is 3-edge-connected, the algorithm generates a construction sequence as a positive certificate for G. Otherwise, the algorithm decomposes G into its 3-edge-connected components and at the same time generates a construction sequence for each connected component as well as the bridges and a cac…
▽ More
A linear-time certifying algorithm for 3-edge-connectivity is presented. Given an undirected graph G, if G is 3-edge-connected, the algorithm generates a construction sequence as a positive certificate for G. Otherwise, the algorithm decomposes G into its 3-edge-connected components and at the same time generates a construction sequence for each connected component as well as the bridges and a cactus representation of the cut-pairs in G. All of these are done by making only one pass over G using an innovative graph contraction technique. Moreover, the graph need not be 2-edge-connected.
△ Less
Submitted 29 January, 2022; v1 submitted 11 February, 2020;
originally announced February 2020.