-
Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression
Authors:
Huy-Hoang Bui,
Bach-Thuan Bui,
Dinh-Tuan Tran,
Joo-Ho Lee
Abstract:
Classical structure-based visual localization methods offer high accuracy but face trade-offs in terms of storage, speed, and privacy. A recent innovation, the keypoint scene coordinate regression (KSCR) method named D2S, addresses these issues by leveraging graph attention networks to enhance keypoint relationships and predict their 3D coordinates using a simple multilayer perceptron (MLP). Camera pose is then determined via PnP+RANSAC, using established 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods like HLoc across multiple benchmarks, its performance suffers when training samples are limited, because the deep learning model relies on extensive data. This paper proposes a solution to this challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF). By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's generalization capabilities in data-scarce environments. The proposed system can improve localization accuracy by up to 50% while requiring only a fraction of the time for data synthesis. Furthermore, its modular design allows for the integration of multiple NeRFs, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: https://github.com/ais-lab/DescriptorSynthesis4Feat2Map.
Submitted 19 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Representing 3D sparse map points and lines for camera relocalization
Authors:
Bach-Thuan Bui,
Huy-Hoang Bui,
Dinh-Tuan Tran,
Joo-Ho Lee
Abstract:
Recent advancements in visual localization and mapping have demonstrated considerable success in integrating point and line features. However, expanding the localization framework to include additional mapping components frequently results in increased demand for memory and computational resources dedicated to matching tasks. In this study, we show how a lightweight neural network can learn to represent both 3D point and line features, and exhibit leading pose accuracy by harnessing the power of multiple learned mappings. Specifically, we utilize a single transformer block to encode line features, effectively transforming them into distinctive point-like descriptors. Subsequently, we treat these point and line descriptor sets as distinct yet interconnected feature sets. Through the integration of self- and cross-attention within several graph layers, our method effectively refines each feature before regressing 3D maps using two simple MLPs. In comprehensive experiments, our indoor localization findings surpass those of Hloc and Limap across both point-based and line-assisted configurations. Moreover, in outdoor scenarios, our method secures a significant lead, marking the most considerable enhancement over state-of-the-art learning-based methodologies. The source code and demo videos of this work are publicly available at: https://thpjp.github.io/pl2map/
Submitted 27 February, 2024;
originally announced February 2024.
-
D2S: Representing sparse descriptors and 3D coordinates for camera relocalization
Authors:
Bach-Thuan Bui,
Huy-Hoang Bui,
Dinh-Tuan Tran,
Joo-Ho Lee
Abstract:
State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It leverages only a single RGB image for localization during the testing phase and requires only a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the state-of-the-art CNN-based methods in scene coordinate regression in indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts, even in the absence of labeled data sources. The source code, trained models, dataset, and demo videos are available at the following link: https://thpjp.github.io/d2s.
Submitted 12 July, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Geomechanics in unconventional resource development
Authors:
Binh T. Bui
Abstract:
To economically produce from very low permeability shale formations, hydraulic fracturing stimulation is typically used to improve their conductivity. This process deforms and breaks the rock, and hence requires geomechanical data and calculations. The development of unconventional reservoirs requires large amounts of geomechanical data, and geomechanics is involved in all calculations of unconventional reservoir projects. Geomechanics contributes in numerous ways to the development of unconventional reservoirs, from reservoir characterization and well construction to hydraulic fracturing and reservoir modeling, as well as environmental aspects. This paper reviews and highlights some important aspects of geomechanics in the successful development of unconventional reservoirs and outlines recent developments in unconventional reservoir geomechanics. The main objective is to emphasize the importance of geomechanical data and geomechanics and how they are being used in all aspects of unconventional reservoir projects.
Submitted 28 May, 2023;
originally announced May 2023.
-
Fast and Lightweight Scene Regressor for Camera Relocalization
Authors:
Thuan B. Bui,
Dinh-Tuan Tran,
Joo-Ho Lee
Abstract:
Camera relocalization involving a prior 3D reconstruction plays a crucial role in many mixed reality and robotics applications. Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications with limited storage and/or communication bandwidth. Although recent scene and absolute pose regression methods have become popular for efficient camera localization, most of them are computationally intensive and struggle to achieve real-time inference under high accuracy constraints. This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates to achieve accurate camera pose estimations. The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image. The use of sparse features provides several advantages. First, the proposed regressor network is substantially smaller than those reported in previous studies. This makes our system highly efficient and scalable. Second, the pre-built 3D models provide the most reliable and robust 2D-3D matches. Therefore, learning from them can lead to an awareness of equivalent features and substantially improve the generalization performance. A detailed analysis of our approach and extensive evaluations using existing datasets are provided to support the proposed method. Implementation details are available at https://github.com/aislab/feat2map
Submitted 4 December, 2022;
originally announced December 2022.
-
On joint training with interfaces for spoken language understanding
Authors:
Anirudh Raju,
Milind Rao,
Gautam Tiwari,
Pranav Dheram,
Bryan Anderson,
Zhe Zhang,
Chul Lee,
Bach Bui,
Ariya Rastrow
Abstract:
Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry information on text transcriptions or richer information like neural embeddings from ASR to NLU. In this paper, we study how interfaces affect joint training for spoken language understanding. Most notably, we obtain state-of-the-art results on the publicly available 50-hr SLURP dataset. We first leverage large-size pretrained ASR and NLU models that are connected by a text interface, and then jointly train both models via a sequence loss function. For scenarios where pretrained models are not utilized, the best results are obtained through a joint sequence loss training using richer neural interfaces. Finally, we show the overall diminishing impact of leveraging pretrained models with increased training data size.
Submitted 25 July, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Authors:
Milind Rao,
Anirudh Raju,
Pranav Dheram,
Bach Bui,
Ariya Rastrow
Abstract:
We consider the problem of spoken language understanding (SLU) of extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) as well as natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification opening up the opportunity to deploy on hardware constrained scenarios like devices enabling voice assistants to work offline, in a privacy preserving manner, whilst also reducing server costs.
We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using the Listen Attend Spell ASR system and then extracts interpretation using a neural NLU model. Finally, we contrast these methods to a jointly trained end-to-end joint SLU model, consisting of ASR and NLU subsystems which are connected by a neural network based interface instead of text, that produces transcripts as well as NLU interpretation. We show that the jointly trained model shows improvements to ASR incorporating semantic information from NLU and also improves NLU by exposing it to ASR confusion encoded in the hidden layer.
Submitted 13 August, 2020;
originally announced August 2020.
-
Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology
Authors:
Stefan Studer,
Thanh Binh Bui,
Christian Drescher,
Alexander Hanuschkin,
Ludwig Winkler,
Steven Peters,
Klaus-Robert Mueller
Abstract:
Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve the success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners need guidance throughout the life cycle of a machine learning application to meet business expectations. We therefore propose a process model for the development of machine learning applications that covers six phases, from defining the scope to maintaining the deployed machine learning application. The first phase combines business and data understanding, as data availability oftentimes affects the feasibility of the project. The sixth phase covers state-of-the-art approaches for monitoring and maintenance of machine learning applications, as the risk of model degradation in a changing environment is imminent. With each task of the process, we propose a quality assurance methodology that is suitable to address challenges in machine learning development that we identify in the form of risks. The methodology is drawn from practical experience and scientific literature and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine-learning-specific tasks. Our work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.
Submitted 24 February, 2021; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Stability of the optimal filter in continuous time: Beyond the Beneš filter
Authors:
van Bien Bui,
Sylvain Rubenthaler
Abstract:
We are interested in the optimal filter in a continuous time setting. We want to show that the optimal filter is stable with respect to its initial condition. We reduce the problem to a discrete time setting and apply truncation techniques coming from [OR05]. Due to the continuous time setting, we need a new technique to solve the problem. In the end, we show that the forgetting rate is at least a power of the time t. The results can be re-used to prove the stability in time of a numerical approximation of the optimal filter.
Submitted 28 January, 2020; v1 submitted 12 April, 2016;
originally announced April 2016.
-
Intractability of the Minimum-Flip Supertree problem and its variants
Authors:
Sebastian Böcker,
Quang Bao Anh Bui,
Francois Nicolas,
Anke Truss
Abstract:
Computing supertrees is a central problem in phylogenetics. The supertree method that is by far the most widely used today was introduced in 1992 and is called Matrix Representation with Parsimony analysis (MRP). Matrix Representation using Flipping (MRF), which was introduced in 2002, is an interesting variant of MRP: MRF is arguably more relevant than MRP and various efficient implementations of MRF have been presented. From a theoretical point of view, implementing MRF or MRP is solving NP-hard optimization problems. The aim of this paper is to study the approximability and the fixed-parameter tractability of the optimization problem corresponding to MRF, namely Minimum-Flip Supertree. We prove strongly negative results.
Submitted 19 December, 2011;
originally announced December 2011.