Search | arXiv e-print repository

Zero-shot detection of buildings in mobile LiDAR using Language Vision Model

Authors: June Moh Goo, Zichao Zeng, Jan Boehm

Abstract: Recent advances have demonstrated that Language Vision Models (LVMs) surpass the existing State-of-the-Art (SOTA) in two-dimensional (2D) computer vision tasks, motivating attempts to apply LVMs to three-dimensional (3D) data. While LVMs are efficient and effective in addressing various downstream 2D vision tasks without training, they face significant challenges when it comes to point clouds, a r… ▽ More Recent advances have demonstrated that Language Vision Models (LVMs) surpass the existing State-of-the-Art (SOTA) in two-dimensional (2D) computer vision tasks, motivating attempts to apply LVMs to three-dimensional (3D) data. While LVMs are efficient and effective in addressing various downstream 2D vision tasks without training, they face significant challenges when it comes to point clouds, a representative format for representing 3D data. It is more difficult to extract features from 3D data and there are challenges due to large data sizes and the cost of the collection and labelling, resulting in a notably limited availability of datasets. Moreover, constructing LVMs for point clouds is even more challenging due to the requirements for large amounts of data and training time. To address these issues, our research aims to 1) apply the Grounded SAM through Spherical Projection to transfer 3D to 2D, and 2) experiment with synthetic data to evaluate its effectiveness in bridging the gap between synthetic and real-world data domains. Our approach exhibited high performance with an accuracy of 0.96, an IoU of 0.85, precision of 0.92, recall of 0.91, and an F1 score of 0.92, confirming its potential. However, challenges such as occlusion problems and pixel-level overlaps of multi-label points during spherical image generation remain to be addressed in future studies. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 7 pages, 6 figures, conference

arXiv:2404.09921 [pdf, other]

Zero-shot Building Age Classification from Facade Image Using GPT-4

Authors: Zichao Zeng, June Moh Goo, Xinglei Wang, Bin Chi, Meihui Wang, Jan Boehm

Abstract: A building's age of construction is crucial for supporting many geospatial applications. Much current research focuses on estimating building age from facade images using deep learning. However, building an accurate deep learning model requires a considerable amount of labelled training data, and the trained models often have geographical constraints. Recently, large pre-trained vision language mo… ▽ More A building's age of construction is crucial for supporting many geospatial applications. Much current research focuses on estimating building age from facade images using deep learning. However, building an accurate deep learning model requires a considerable amount of labelled training data, and the trained models often have geographical constraints. Recently, large pre-trained vision language models (VLMs) such as GPT-4 Vision, which demonstrate significant generalisation capabilities, have emerged as potential training-free tools for dealing with specific vision tasks, but their applicability and reliability for building information remain unexplored. In this study, a zero-shot building age classifier for facade images is developed using prompts that include logical instructions. Taking London as a test case, we introduce a new dataset, FI-London, comprising facade images and building age epochs. Although the training-free classifier achieved a modest accuracy of 39.69%, the mean absolute error of 0.85 decades indicates that the model can predict building age epochs successfully albeit with a small bias. The ensuing discussion reveals that the classifier struggles to predict the age of very old buildings and is challenged by fine-grained predictions within 2 decades. Overall, the classifier utilising GPT-4 Vision is capable of predicting the rough age epoch of a building from a single facade image without any training. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2312.14810 [pdf, other]

Accurate, scalable, and efficient Bayesian Optimal Experimental Design with derivative-informed neural operators

Authors: **woo Go, Peng Chen

Abstract: We consider optimal experimental design (OED) problems in selecting the most informative observation sensors to estimate model parameters in a Bayesian framework. Such problems are computationally prohibitive when the parameter-to-observable (PtO) map is expensive to evaluate, the parameters are high-dimensional, and the optimization for sensor selection is combinatorial and high-dimensional. To a… ▽ More We consider optimal experimental design (OED) problems in selecting the most informative observation sensors to estimate model parameters in a Bayesian framework. Such problems are computationally prohibitive when the parameter-to-observable (PtO) map is expensive to evaluate, the parameters are high-dimensional, and the optimization for sensor selection is combinatorial and high-dimensional. To address these challenges, we develop an accurate, scalable, and efficient computational framework based on derivative-informed neural operators (DINOs). The derivative of the PtO map is essential for accurate evaluation of the optimality criteria of OED in our consideration. We take the key advantage of DINOs, a class of neural operators trained with derivative information, to achieve high approximate accuracy of not only the PtO map but also, more importantly, its derivative. Moreover, we develop scalable and efficient computation of the optimality criteria based on DINOs and propose a modified swap** greedy algorithm for its optimization. We demonstrate that the proposed method is scalable to preserve the accuracy for increasing parameter dimensions and achieves high computational efficiency, with an over 1000x speedup accounting for both offline construction and online evaluation costs, compared to high-fidelity Bayesian OED solutions for a three-dimensional nonlinear convection-diffusion-reaction example with tens of thousands of parameters. △ Less

Submitted 27 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

MSC Class: 62K05; 35Q62; 62F15; 35R30; 35Q93; 65C60; 90C27 ACM Class: G.1.8; I.5.2; I.6.4

arXiv:2312.09830 [pdf, other]

Socio-Economic Deprivation Analysis: Diffusion Maps

Authors: June Moh Goo

Abstract: This report proposes a model to predict the location of the most deprived areas in a city using data from the census. A census data is very high dimensional and needs to be simplified. We use a novel algorithm to reduce dimensionality and find patterns: The diffusion map. Features are defined by eigenvectors of the Laplacian matrix that defines the diffusion map. Eigenvectors corresponding to the… ▽ More This report proposes a model to predict the location of the most deprived areas in a city using data from the census. A census data is very high dimensional and needs to be simplified. We use a novel algorithm to reduce dimensionality and find patterns: The diffusion map. Features are defined by eigenvectors of the Laplacian matrix that defines the diffusion map. Eigenvectors corresponding to the smallest eigenvalues indicate specific population features. Previous work has found qualitatively that the second most important dimension for describing the census data in Bristol is linked to deprivation. In this report, we analyse how good this dimension is as a model for predicting deprivation by comparing with the recognised measures. The Pearson correlation coefficient was found to be over 0.7. The top 10 per cent of deprived areas in the UK which also locate in Bristol are extracted to test the accuracy of the model. There are 52 most deprived areas, and 38 areas are correctly identified by comparing to the model. The influence of scores of IMD domains that do not correlate with the models, Eigenvector 2 entries of non-deprived OAs and orthogonality of Eigenvectors cause the model to fail the prediction of 14 deprived areas. However, overall, the model shows a high performance to predict the future deprivation of overall areas where the project considers. This project is expected to support the government to allocate resources and funding. △ Less

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2302.12505 [pdf, other]

Spatial Bias for Attention-free Non-local Neural Networks

Authors: Junhyung Go, Jongbin Ryu

Abstract: In this paper, we introduce the spatial bias to learn global knowledge without self-attention in convolutional neural networks. Owing to the limited receptive field, conventional convolutional neural networks suffer from learning long-range dependencies. Non-local neural networks have struggled to learn global knowledge, but unavoidably have too heavy a network design due to the self-attention ope… ▽ More In this paper, we introduce the spatial bias to learn global knowledge without self-attention in convolutional neural networks. Owing to the limited receptive field, conventional convolutional neural networks suffer from learning long-range dependencies. Non-local neural networks have struggled to learn global knowledge, but unavoidably have too heavy a network design due to the self-attention operation. Therefore, we propose a fast and lightweight spatial bias that efficiently encodes global knowledge without self-attention on convolutional neural networks. Spatial bias is stacked on the feature map and convolved together to adjust the spatial structure of the convolutional features. Therefore, we learn the global knowledge on the convolution layer directly with very few additional resources. Our method is very fast and lightweight due to the attention-free non-local method while improving the performance of neural networks considerably. Compared to non-local neural networks, the spatial bias use about 10 times fewer parameters while achieving comparable performance with 1.6 ~ 3.3 times more throughput on a very little budget. Furthermore, the spatial bias can be used with conventional non-local neural networks to further improve the performance of the backbone model. We show that the spatial bias achieves competitive performance that improves the classification accuracy by +0.79% and +1.5% on ImageNet-1K and cifar100 datasets. Additionally, we validate our method on the MS-COCO and ADE20K datasets for downstream tasks involving object detection and semantic segmentation. △ Less

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2207.00555 [pdf, other]

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Authors: Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Abstract: Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing, however, the problem of computational cost arising from its vast size makes a high entry barrier to academia. In addition, existing distillation techniques of speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such… ▽ More Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing, however, the problem of computational cost arising from its vast size makes a high entry barrier to academia. In addition, existing distillation techniques of speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which makes thinner in dimension throughout almost all model components and deeper in layer compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference time and propose a method of hint-based distillation for less performance degradation. Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT. Also, we achieve 12.1% word error rate and 13.3% phoneme error rate on the SUPERB benchmark which is superior than prior work. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted to Interspeech 2022

arXiv:2205.09914 [pdf, other]

Robust Expected Information Gain for Optimal Bayesian Experimental Design Using Ambiguity Sets

Authors: **woo Go, Tobin Isaac

Abstract: The ranking of experiments by expected information gain (EIG) in Bayesian experimental design is sensitive to changes in the model's prior distribution, and the approximation of EIG yielded by sampling will have errors similar to the use of a perturbed prior. We define and analyze \emph{robust expected information gain} (REIG), a modification of the objective in EIG maximization by minimizing an a… ▽ More The ranking of experiments by expected information gain (EIG) in Bayesian experimental design is sensitive to changes in the model's prior distribution, and the approximation of EIG yielded by sampling will have errors similar to the use of a perturbed prior. We define and analyze \emph{robust expected information gain} (REIG), a modification of the objective in EIG maximization by minimizing an affine relaxation of EIG over an ambiguity set of distributions that are close to the original prior in KL-divergence. We show that, when combined with a sampling-based approach to estimating EIG, REIG corresponds to a `log-sum-exp' stabilization of the samples used to estimate EIG, meaning that it can be efficiently implemented in practice. Numerical tests combining REIG with variational nested Monte Carlo (VNMC), adaptive contrastive estimation (ACE) and mutual information neural estimation (MINE) suggest that in practice REIG also compensates for the variability of under-sampled estimators. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: The 38th Conference on Uncertainty in Artificial Intelligence, 2022

arXiv:2205.07039 [pdf, other]

Fake News Quick Detection on Dynamic Heterogeneous Information Networks

Authors: ** Ho Go, Alina Sari, Jiaojiao Jiang, Shuiqiao Yang, Sanjay Jha

Abstract: The spread of fake news has caused great harm to society in recent years. So the quick detection of fake news has become an important task. Some current detection methods often model news articles and other related components as a static heterogeneous information network (HIN) and use expensive message-passing algorithms. However, in the real-world, quickly identifying fake news is of great signif… ▽ More The spread of fake news has caused great harm to society in recent years. So the quick detection of fake news has become an important task. Some current detection methods often model news articles and other related components as a static heterogeneous information network (HIN) and use expensive message-passing algorithms. However, in the real-world, quickly identifying fake news is of great significance and the network may vary over time in terms of dynamic nodes and edges. Therefore, in this paper, we propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick detection. More specifically, we first implement BERT and fine-tuned BERT to get a semantic representation of the news article contents and author profiles and convert it into graph data. Then, we construct the heterogeneous news-author graph to reflect contextual information and relationships. Additionally, we adapt ideas from personalized PageRank propagation and dynamic propagation to heterogeneous networks in order to reduce the time complexity of back-propagating through many nodes during training. Experiments on three real-world fake news datasets show that DHGNN can outperform other GNN-based models in terms of both effectiveness and efficiency. △ Less

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2107.02314 [pdf, other]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

Authors: Ujjwal Baid, Satyam Ghodasara, Suyash Mohan, Michel Bilello, Evan Calabrese, Errol Colak, Keyvan Farahani, Jayashree Kalpathy-Cramer, Felipe C. Kitamura, Sarthak Pati, Luciano M. Prevedello, Jeffrey D. Rudie, Chiharu Sako, Russell T. Shinohara, Timothy Bergquist, Rong Chai, James Eddy, Julia Elliott, Walter Reade, Thomas Schaffter, Thomas Yu, Jiaxin Zheng, Ahmed W. Moawad, Luiz Otavio Coelho, Olivia McDonnell , et al. (78 additional authors not shown)

Abstract: The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with wel… ▽ More The BraTS 2021 challenge celebrates its 10th anniversary and is jointly organized by the Radiological Society of North America (RSNA), the American Society of Neuroradiology (ASNR), and the Medical Image Computing and Computer Assisted Interventions (MICCAI) society. Since its inception, BraTS has been focusing on being a common benchmarking venue for brain glioma segmentation algorithms, with well-curated multi-institutional multi-parametric magnetic resonance imaging (mpMRI) data. Gliomas are the most common primary malignancies of the central nervous system, with varying degrees of aggressiveness and prognosis. The RSNA-ASNR-MICCAI BraTS 2021 challenge targets the evaluation of computational algorithms assessing the same tumor compartmentalization, as well as the underlying tumor's molecular characterization, in pre-operative baseline mpMRI data from 2,040 patients. Specifically, the two tasks that BraTS 2021 focuses on are: a) the segmentation of the histologically distinct brain tumor sub-regions, and b) the classification of the tumor's O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The performance evaluation of all participating algorithms in BraTS 2021 will be conducted through the Sage Bionetworks Synapse platform (Task 1) and Kaggle (Task 2), concluding in distributing to the top ranked participants monetary awards of $60,000 collectively. △ Less

Submitted 12 September, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 19 pages, 2 figures, 1 table

arXiv:2005.03867 [pdf, other]

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

Authors: Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim

Abstract: Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines sub-networks aiming at performance improvement in challengin… ▽ More Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines sub-networks aiming at performance improvement in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing novel techniques of connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated with phonetically originated weights so that forms a word-level global representation. Then it is used for the aggregation of feature vectors to generate discriminative embeddings. Our proposed approach shows 4.06% and 26.71% relative improvements in equal error rate (EER) compared to the baselines for both tasks. We also present a visualization example and results of ablation experiments. △ Less

Submitted 7 August, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: Accepted to Interspeech 2020

arXiv:1910.00341 [pdf, other]

Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

Authors: Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Abstract: Acoustic word embeddings --- fixed-dimensional vector representations of arbitrary-length words --- have attracted increasing interest in query-by-example spoken term detection. Recently, on the fact that the orthography of text labels partly reflects the phonetic similarity between the words' pronunciation, a multi-view approach has been introduced that jointly learns acoustic and text embeddings… ▽ More Acoustic word embeddings --- fixed-dimensional vector representations of arbitrary-length words --- have attracted increasing interest in query-by-example spoken term detection. Recently, on the fact that the orthography of text labels partly reflects the phonetic similarity between the words' pronunciation, a multi-view approach has been introduced that jointly learns acoustic and text embeddings. It showed that it is possible to learn discriminative embeddings by designing the objective which takes text labels as well as word segments. In this paper, we propose a network architecture that expands the multi-view approach by combining the Siamese multi-view encoders with a shared decoder network to maximize the effect of the relationship between acoustic and text embeddings in embedding space. Discriminatively trained with multi-view triplet loss and decoding loss, our proposed approach achieves better performance on acoustic word discrimination task with the WSJ dataset, resulting in 11.1% relative improvement in average precision. We also present experimental results on cross-view word discrimination and word level speech recognition tasks. △ Less

Submitted 1 October, 2019; originally announced October 2019.

Comments: Accepted at 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

arXiv:1909.06560 [pdf, other]

Unclogging Our Arteries: Using Human-Inspired Signals to Disambiguate Navigational Intentions

Authors: Justin Hart, Reuth Mirsky, Stone Tejeda, Bonny Mahajan, Jamin Goo, Kathryn Baldauf, Sydney Owen, Peter Stone

Abstract: People are proficient at communicating their intentions in order to avoid conflicts when navigating in narrow, crowded environments. In many situations mobile robots lack both the ability to interpret human intentions and the ability to clearly communicate their own intentions to people sharing their space. This work addresses the second of these points, leveraging insights about how people implic… ▽ More People are proficient at communicating their intentions in order to avoid conflicts when navigating in narrow, crowded environments. In many situations mobile robots lack both the ability to interpret human intentions and the ability to clearly communicate their own intentions to people sharing their space. This work addresses the second of these points, leveraging insights about how people implicitly communicate with each other through observations of behaviors such as gaze to provide mobile robots with better social navigation skills. In a preliminary human study, the importance of gaze as a signal used by people to interpret each-other's intentions during navigation of a shared space is observed. This study is followed by the development of a virtual agent head which is mounted to the top of the chassis of the BWIBot mobile robot platform. Contrasting the performance of the virtual agent head against an LED turn signal demonstrates that the naturalistic, implicit gaze cue is more easily interpreted than the LED turn signal. △ Less

Submitted 6 November, 2019; v1 submitted 14 September, 2019; originally announced September 2019.

Report number: AI-HRI/2019/27

arXiv:1810.01732 [pdf]

2018 Low-Power Image Recognition Challenge

Authors: Sergei Alyamkin, Matthew Ardi, Achille Brighton, Alexander C. Berg, Yiran Chen, Hsin-Pai Cheng, Bo Chen, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Jongkook Go, Alexander Goncharenko, Xuyang Guo, Hong Hanh Nguyen, Andrew Howard, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Alexander Kondratyev, Seungjae Lee, Suwoong Lee, Junhyeok Lee, Zhiyu Liang, Xin Liu , et al. (16 additional authors not shown)

Abstract: The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times.… ▽ More The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners' scores have improved more than 24 times. As computer vision is widely used in many battery-powered systems (such as drones and mobile phones), the need for low-power computer vision will become increasingly important. This paper summarizes LPIRC 2018 by describing the three different tracks and the winners' solutions. △ Less

Submitted 3 October, 2018; originally announced October 2018.

Comments: 13 pages, workshop in 2018 CVPR, competition, low-power, image recognition

Showing 1–13 of 13 results for author: Goo, J