Search | arXiv e-print repository

First Map** the Canopy Height of Primeval Forests in the Tallest Tree Area of Asia

Authors: Guangpeng Fan, Fei Yan, Xiangquan Zeng, Qingtao Xu, Ruoyoulan Wang, Binghong Zhang, Jialing Zhou, Liangliang Nan, **hu Wang, Zhiwei Zhang, Jia Wang

Abstract: We have developed the world's first canopy height map of the distribution area of world-level giant trees. This map** is crucial for discovering more individual and community world-level giant trees, and for analyzing and quantifying the effectiveness of biodiversity conservation measures in the Yarlung Tsangpo Grand Canyon (YTGC) National Nature Reserve. We proposed a method to map the canopy h… ▽ More We have developed the world's first canopy height map of the distribution area of world-level giant trees. This map** is crucial for discovering more individual and community world-level giant trees, and for analyzing and quantifying the effectiveness of biodiversity conservation measures in the Yarlung Tsangpo Grand Canyon (YTGC) National Nature Reserve. We proposed a method to map the canopy height of the primeval forest within the world-level giant tree distribution area by using a spaceborne LiDAR fusion satellite imagery (Global Ecosystem Dynamics Investigation (GEDI), ICESat-2, and Sentinel-2) driven deep learning modeling. And we customized a pyramid receptive fields depth separable CNN (PRFXception). PRFXception, a CNN architecture specifically customized for map** primeval forest canopy height to infer the canopy height at the footprint level of GEDI and ICESat-2 from Sentinel-2 optical imagery with a 10-meter spatial resolution. We conducted a field survey of 227 permanent plots using a stratified sampling method and measured several giant trees using UAV-LS. The predicted canopy height was compared with ICESat-2 and GEDI validation data (RMSE =7.56 m, MAE=6.07 m, ME=-0.98 m, R^2=0.58 m), UAV-LS point clouds (RMSE =5.75 m, MAE =3.72 m, ME = 0.82 m, R^2= 0.65 m), and ground survey data (RMSE = 6.75 m, MAE = 5.56 m, ME= 2.14 m, R^2=0.60 m). We mapped the potential distribution map of world-level giant trees and discovered two previously undetected giant tree communities with an 89% probability of having trees 80-100 m tall, potentially taller than Asia's tallest tree. This paper provides scientific evidence confirming southeastern Tibet--northwestern Yunnan as the fourth global distribution center of world-level giant trees initiatives and promoting the inclusion of the YTGC giant tree distribution area within the scope of China's national park conservation. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.00546 [pdf, other]

On the Estimation of Image-matching Uncertainty in Visual Place Recognition

Authors: Mubariz Zaffar, Liangliang Nan, Julian F. P. Kooij

Abstract: In Visual Place Recognition (VPR) the pose of a query image is estimated by comparing the image to a map of reference images with known reference poses. As is typical for image retrieval problems, a feature extractor maps the query and reference images to a feature space, where a nearest neighbor search is then performed. However, till recently little attention has been given to quantifying the co… ▽ More In Visual Place Recognition (VPR) the pose of a query image is estimated by comparing the image to a map of reference images with known reference poses. As is typical for image retrieval problems, a feature extractor maps the query and reference images to a feature space, where a nearest neighbor search is then performed. However, till recently little attention has been given to quantifying the confidence that a retrieved reference image is a correct match. Highly certain but incorrect retrieval can lead to catastrophic failure of VPR-based localization pipelines. This work compares for the first time the main approaches for estimating the image-matching uncertainty, including the traditional retrieval-based uncertainty estimation, more recent data-driven aleatoric uncertainty estimation, and the compute-intensive geometric verification. We further formulate a simple baseline method, ``SUE'', which unlike the other methods considers the freely-available poses of the reference images in the map. Our experiments reveal that a simple L2-distance between the query and reference descriptors is already a better estimate of image-matching uncertainty than current data-driven approaches. SUE outperforms the other efficient uncertainty estimation methods, and its uncertainty estimates complement the computationally expensive geometric verification approach. Future works for uncertainty estimation in VPR should consider the baselines discussed in this work. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: To appear in the proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2312.12743 [pdf, other]

PointeNet: A Lightweight Framework for Effective and Efficient Point Cloud Analysis

Authors: Lipeng Gu, Xuefeng Yan, Liangliang Nan, Dingkun Zhu, Honghua Chen, Weiming Wang, Mingqiang Wei

Abstract: Current methodologies in point cloud analysis predominantly explore 3D geometries, often achieved through the introduction of intricate learnable geometric extractors in the encoder or by deepening networks with repeated blocks. However, these approaches inevitably lead to a significant number of learnable parameters, resulting in substantial computational costs and imposing memory burdens on CPU/… ▽ More Current methodologies in point cloud analysis predominantly explore 3D geometries, often achieved through the introduction of intricate learnable geometric extractors in the encoder or by deepening networks with repeated blocks. However, these approaches inevitably lead to a significant number of learnable parameters, resulting in substantial computational costs and imposing memory burdens on CPU/GPU. Additionally, the existing strategies are primarily tailored for object-level point cloud classification and segmentation tasks, with limited extensions to crucial scene-level applications, such as autonomous driving. In response to these limitations, we introduce PointeNet, an efficient network designed specifically for point cloud analysis. PointeNet distinguishes itself with its lightweight architecture, low training cost, and plug-and-play capability, effectively capturing representative features. The network consists of a Multivariate Geometric Encoding (MGE) module and an optional Distance-aware Semantic Enhancement (DSE) module. The MGE module employs operations of sampling, grou**, and multivariate geometric aggregation to lightweightly capture and adaptively aggregate multivariate geometric features, providing a comprehensive depiction of 3D geometries. The DSE module, designed for real-world autonomous driving scenarios, enhances the semantic perception of point clouds, particularly for distant points. Our method demonstrates flexibility by seamlessly integrating with a classification/segmentation head or embedding into off-the-shelf 3D object detection networks, achieving notable performance improvements at a minimal cost. Extensive experiments on object-level datasets, including ModelNet40, ScanObjectNN, ShapeNetPart, and the scene-level dataset KITTI, demonstrate the superior performance of PointeNet over state-of-the-art methods in point cloud analysis. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.05046 [pdf, other]

MuVieCAST: Multi-View Consistent Artistic Style Transfer

Authors: Nail Ibrahimli, Julian F. P. Kooij, Liangliang Nan

Abstract: We introduce MuVieCAST, a modular multi-view consistent style transfer network architecture that enables consistent style transfer between multiple viewpoints of the same scene. This network architecture supports both sparse and dense views, making it versatile enough to handle a wide range of multi-view image datasets. The approach consists of three modules that perform specific tasks related to… ▽ More We introduce MuVieCAST, a modular multi-view consistent style transfer network architecture that enables consistent style transfer between multiple viewpoints of the same scene. This network architecture supports both sparse and dense views, making it versatile enough to handle a wide range of multi-view image datasets. The approach consists of three modules that perform specific tasks related to style transfer, namely content preservation, image transformation, and multi-view consistency enforcement. We extensively evaluate our approach across multiple application domains including depth-map-based point cloud fusion, mesh reconstruction, and novel-view synthesis. Our experiments reveal that the proposed framework achieves an exceptional generation of stylized images, exhibiting consistent outcomes across perspectives. A user study focusing on novel-view synthesis further confirms these results, with approximately 68\% of cases participants expressing a preference for our generated outputs compared to the recent state-of-the-art method. Our modular framework is extensible and can easily be integrated with various backbone architectures, making it a flexible solution for multi-view style transfer. More results are demonstrated on our project page: muviecast.github.io △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04891 [pdf, other]

Cross-BERT for Point Cloud Pretraining

Authors: Xin Li, Peng Li, Zeyong Wei, Zhe Zhu, Mingqiang Wei, Junhui Hou, Liangliang Nan, **g Qin, Haoran Xie, Fu Lee Wang

Abstract: Introducing BERT into cross-modal settings raises difficulties in its optimization for handling multiple modalities. Both the BERT architecture and training objective need to be adapted to incorporate and model information from different modalities. In this paper, we address these challenges by exploring the implicit semantic and geometric correlations between 2D and 3D data of the same objects/sc… ▽ More Introducing BERT into cross-modal settings raises difficulties in its optimization for handling multiple modalities. Both the BERT architecture and training objective need to be adapted to incorporate and model information from different modalities. In this paper, we address these challenges by exploring the implicit semantic and geometric correlations between 2D and 3D data of the same objects/scenes. We propose a new cross-modal BERT-style self-supervised learning paradigm, called Cross-BERT. To facilitate pretraining for irregular and sparse point clouds, we design two self-supervised tasks to boost cross-modal interaction. The first task, referred to as Point-Image Alignment, aims to align features between unimodal and cross-modal representations to capture the correspondences between the 2D and 3D modalities. The second task, termed Masked Cross-modal Modeling, further improves mask modeling of BERT by incorporating high-dimensional semantic information obtained by cross-modal interaction. By performing cross-modal interaction, Cross-BERT can smoothly reconstruct the masked tokens during pretraining, leading to notable performance enhancements for downstream tasks. Through empirical evaluation, we demonstrate that Cross-BERT outperforms existing state-of-the-art methods in 3D downstream applications. Our work highlights the effectiveness of leveraging cross-modal 2D knowledge to strengthen 3D point cloud representation and the transferable capability of BERT across modalities. △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.09805 [pdf, other]

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

Authors: Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

Abstract: Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capa… ▽ More Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables. We evaluate a wide spectrum of 19 LLMs, including those specialized in coding and finance. We also incorporate different prompting strategies (i.e., Chain-of-Thoughts and Program-of-Thoughts) to comprehensively assess the capabilities and limitations of existing LLMs in DocMath-Eval. We found that, although the current best-performing system (i.e., GPT-4), can perform well on simple problems such as calculating the rate of increase in a financial metric within a short document context, it significantly lags behind human experts in more complex problems grounded in longer contexts. We believe DocMath-Eval can be used as a valuable benchmark to evaluate LLMs' capabilities to solve challenging numerical reasoning problems in expert domains. We will release the benchmark and code at https://github.com/yale-nlp/DocMath-Eval. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: work in progress

arXiv:2311.09721 [pdf, other]

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Authors: Linyong Nan, Ellen Zhang, Wei** Zou, Yilun Zhao, Wenfei Zhou, Arman Cohan

Abstract: This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task necessitates LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative. Our findings high… ▽ More This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task necessitates LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative. Our findings highlight that this task poses great challenges even for the state-of-the-art GPT-4 model. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction. A key discovery is the identification of two primary bottlenecks hindering effective interaction: the capacity for planning and the ability to generate multiple SQL queries. To address the challenge of accurately assessing answer quality, we introduce a multi-agent evaluation framework that simulates the academic peer-review process, enhancing the precision and reliability of our evaluations. This framework allows for a more nuanced understanding of the strengths and limitations of current LLMs in complex retrieval and reasoning tasks. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2309.14516 [pdf, other]

UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities

Authors: Shiming Wang, Holger Caesar, Liangliang Nan, Julian F. P. Kooij

Abstract: Multi-sensor object detection is an active research topic in automated driving, but the robustness of such detection models against missing sensor input (modality missing), e.g., due to a sudden sensor failure, is a critical problem which remains under-studied. In this work, we propose UniBEV, an end-to-end multi-modal 3D object detection framework designed for robustness against missing modalitie… ▽ More Multi-sensor object detection is an active research topic in automated driving, but the robustness of such detection models against missing sensor input (modality missing), e.g., due to a sudden sensor failure, is a critical problem which remains under-studied. In this work, we propose UniBEV, an end-to-end multi-modal 3D object detection framework designed for robustness against missing modalities: UniBEV can operate on LiDAR plus camera input, but also on LiDAR-only or camera-only input without retraining. To facilitate its detector head to handle different input combinations, UniBEV aims to create well-aligned Bird's Eye View (BEV) feature maps from each available modality. Unlike prior BEV-based multi-modal detection methods, all sensor modalities follow a uniform approach to resample features from the native sensor coordinate systems to the BEV features. We furthermore investigate the robustness of various fusion strategies w.r.t. missing modalities: the commonly used feature concatenation, but also channel-wise averaging, and a generalization to weighted averaging termed Channel Normalized Weights. To validate its effectiveness, we compare UniBEV to state-of-the-art BEVFusion and MetaBEV on nuScenes over all sensor input combinations. In this setting, UniBEV achieves $52.5 \%$ mAP on average over all input combinations, significantly improving over the baselines ($43.5 \%$ mAP on average for BEVFusion, $48.7 \%$ mAP on average for MetaBEV). An ablation study shows the robustness benefits of fusing by weighted averaging over regular concatenation, and of sharing queries between the BEV encoders of each modality. Our code is available at https://github.com/tudelft-iv/UniBEV. △ Less

Submitted 8 May, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE Intelligent Vehicles Symposium (IV 2024), camera-ready. Code: https://github.com/tudelft-iv/UniBEV

arXiv:2307.08636 [pdf, other]

PolyGNN: Polyhedron-based Graph Neural Network for 3D Building Reconstruction from Point Clouds

Authors: Zhaiyu Chen, Yilei Shi, Liangliang Nan, Zhitong Xiong, Xiao Xiang Zhu

Abstract: We present PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction from point clouds. PolyGNN learns to assemble primitives obtained by polyhedral decomposition via graph node classification, achieving a watertight, compact, and weakly semantic reconstruction. To effectively represent arbitrary-shaped polyhedra in the neural network, we propose three different sampling stra… ▽ More We present PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction from point clouds. PolyGNN learns to assemble primitives obtained by polyhedral decomposition via graph node classification, achieving a watertight, compact, and weakly semantic reconstruction. To effectively represent arbitrary-shaped polyhedra in the neural network, we propose three different sampling strategies to select representative points as polyhedron-wise queries, enabling efficient occupancy inference. Furthermore, we incorporate the inter-polyhedron adjacency to enhance the classification of the graph nodes. We also observe that existing city-building models are abstractions of the underlying instances. To address this abstraction gap and provide a fair evaluation of the proposed method, we develop our method on a large-scale synthetic dataset covering 500k+ buildings with well-defined ground truths of polyhedral class labels. We further conduct a transferability analysis across cities and on real-world point clouds. Both qualitative and quantitative results demonstrate the effectiveness of our method, particularly its efficiency for large-scale reconstructions. The source code and data of our work are available at https://github.com/chenzhaiyu/polygnn. △ Less

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.14321 [pdf, other]

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

Authors: Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev

Abstract: Despite significant progress having been made in question answering on tabular data (Table QA), it's unclear whether, and to what extent existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. To systematically study the robustness of Table QA models, we propose a benchmark called RobuT, which builds upon existing Table… ▽ More Despite significant progress having been made in question answering on tabular data (Table QA), it's unclear whether, and to what extent existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns. To systematically study the robustness of Table QA models, we propose a benchmark called RobuT, which builds upon existing Table QA datasets (WTQ, WikiSQL-Weak, and SQA) and includes human-annotated adversarial perturbations in terms of table header, table content, and question. Our results indicate that both state-of-the-art Table QA models and large language models (e.g., GPT-3) with few-shot learning falter in these adversarial sets. We propose to address this problem by using large language models to generate adversarial examples to enhance training, which significantly improves the robustness of Table QA models. Our data and code is publicly available at https://github.com/yilunzhao/RobuT. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: Accepted at ACL 2023

arXiv:2305.14987 [pdf, other]

Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios

Authors: Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

Abstract: Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this p… ▽ More Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. These include the LogicNLG and our newly-constructed LoTNLG datasets for data insight generation, along with the FeTaQA and our newly-constructed F2WTQ datasets for query-based generation. We structure our investigation around three research questions, evaluating the performance of LLMs in table-to-text generation, automated evaluation, and feedback generation, respectively. Experimental results indicate that the current high-performing LLM, specifically GPT-4, can effectively serve as a table-to-text generator, evaluator, and feedback generator, facilitating users' information seeking purposes in real-world scenarios. However, a significant performance gap still exists between other open-sourced LLMs (e.g., Tulu and LLaMA-2) and GPT-4 models. Our data and code are publicly available at https://github.com/yale-nlp/LLM-T2T. △ Less

Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Camera-ready version for EMNLP 2023 industry track

arXiv:2305.14303 [pdf, other]

QTSumm: Query-Focused Summarization over Tabular Data

Authors: Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Wei** Zou, Simeng Han, Ruizhe Chen, Xiangru Tang, Yumo Xu, Dragomir Radev, Arman Cohan

Abstract: People primarily consult tables to conduct data analysis or answer specific questions. Text generation systems that can provide accurate table summaries tailored to users' information needs can facilitate more efficient access to relevant data insights. Motivated by this, we define a new query-focused table summarization task, where text generation models have to perform human-like reasoning and a… ▽ More People primarily consult tables to conduct data analysis or answer specific questions. Text generation systems that can provide accurate table summaries tailored to users' information needs can facilitate more efficient access to relevant data insights. Motivated by this, we define a new query-focused table summarization task, where text generation models have to perform human-like reasoning and analysis over the given table to generate a tailored summary. We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables covering diverse topics. We investigate a set of strong baselines on QTSumm, including text generation, table-to-text generation, and large language models. Experimental results and manual analysis reveal that the new task presents significant challenges in table-to-text generation for future research. Moreover, we propose a new approach named ReFactor, to retrieve and reason over query-relevant information from tabular data to generate several natural language facts. Experimental results demonstrate that ReFactor can bring improvements to baselines by concatenating the generated facts to the model input. Our data and code are publicly available at https://github.com/yale-nlp/QTSumm. △ Less

Submitted 6 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at EMNLP 2023

arXiv:2305.12586 [pdf, other]

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

Authors: Linyong Nan, Yilun Zhao, Wei** Zou, Narutatsu Ri, Jaesung Tae, Ellen Zhang, Arman Cohan, Dragomir Radev

Abstract: In-context learning (ICL) has emerged as a new approach to various natural language processing tasks, utilizing large language models (LLMs) to make predictions based on context that has been supplemented with a few examples or task-specific instructions. In this paper, we aim to extend this method to question answering tasks that utilize structured knowledge sources, and improve Text-to-SQL syste… ▽ More In-context learning (ICL) has emerged as a new approach to various natural language processing tasks, utilizing large language models (LLMs) to make predictions based on context that has been supplemented with a few examples or task-specific instructions. In this paper, we aim to extend this method to question answering tasks that utilize structured knowledge sources, and improve Text-to-SQL systems by exploring various prompt design strategies for employing LLMs. We conduct a systematic investigation into different demonstration selection methods and optimal instruction formats for prompting LLMs in the Text-to-SQL task. Our approach involves leveraging the syntactic structure of an example's SQL query to retrieve demonstrations, and we demonstrate that pursuing both diversity and similarity in demonstration selection leads to enhanced performance. Furthermore, we show that LLMs benefit from database-related knowledge augmentations. Our most effective strategy outperforms the state-of-the-art system by 2.5 points (Execution Accuracy) and the best fine-tuned system by 5.1 points on the Spider dataset. These results highlight the effectiveness of our approach in adapting LLMs to the Text-to-SQL task, and we present an analysis of the factors contributing to the success of our strategy. △ Less

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2304.07426 [pdf, other]

doi 10.1109/TRO.2023.3262106

CoPR: Towards Accurate Visual Localization With Continuous Place-descriptor Regression

Authors: Mubariz Zaffar, Liangliang Nan, Julian Francisco Pieter Kooij

Abstract: Visual Place Recognition (VPR) is an image-based localization method that estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images. In this work, we look into two fundamental bottlenecks for its localization accuracy: reference map sparseness and viewpoint invariance. Firstly, the reference images for VPR are only avail… ▽ More Visual Place Recognition (VPR) is an image-based localization method that estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images. In this work, we look into two fundamental bottlenecks for its localization accuracy: reference map sparseness and viewpoint invariance. Firstly, the reference images for VPR are only available at sparse poses in a map, which enforces an upper bound on the maximum achievable localization accuracy through VPR. We therefore propose Continuous Place-descriptor Regression (CoPR) to densify the map and improve localization accuracy. We study various interpolation and extrapolation models to regress additional VPR feature descriptors from only the existing references. Secondly, we compare different feature encoders and show that CoPR presents value for all of them. We evaluate our models on three existing public datasets and report on average around 30% improvement in VPR-based localization accuracy using CoPR, on top of the 15% increase by using a viewpoint-variant loss for the feature encoder. The complementary relation between CoPR and Relative Pose Estimation is also discussed. △ Less

Submitted 14 April, 2023; originally announced April 2023.

Comments: Published in the IEEE Transactions on Robotics, April 2023

arXiv:2302.02962 [pdf, other]

LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control

Authors: Yilun Zhao, Zhenting Qi, Linyong Nan, Lorenzo Jaime Yu Flores, Dragomir Radev

Abstract: Logical Table-to-Text (LT2T) generation is tasked with generating logically faithful sentences from tables. There currently exists two challenges in the field: 1) Faithfulness: how to generate sentences that are factually correct given the table content; 2) Diversity: how to generate multiple sentences that offer different perspectives on the table. This work proposes LoFT, which utilizes logic fo… ▽ More Logical Table-to-Text (LT2T) generation is tasked with generating logically faithful sentences from tables. There currently exists two challenges in the field: 1) Faithfulness: how to generate sentences that are factually correct given the table content; 2) Diversity: how to generate multiple sentences that offer different perspectives on the table. This work proposes LoFT, which utilizes logic forms as fact verifiers and content planners to control LT2T generation. Experimental results on the LogicNLG dataset demonstrate that LoFT is the first model that addresses unfaithfulness and lack of diversity issues simultaneously. Our code is publicly available at https://github.com/Yale-LILY/LoFT. △ Less

Submitted 6 February, 2023; originally announced February 2023.

Comments: Accepted at EACL 2023 as a short paper

arXiv:2212.12402 [pdf, other]

doi 10.1109/3DV57658.2022.00025

Push-the-Boundary: Boundary-aware Feature Propagation for Semantic Segmentation of 3D Point Clouds

Authors: Shenglan Du, Nail Ibrahimli, Jantien Stoter, Julian Kooij, Liangliang Nan

Abstract: Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundari… ▽ More Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve feature encoding of the networks. These approaches often require additional modules which are difficult to integrate into the original architecture. To improve the segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs (i) boundary localization, (ii) prediction of directions pointing to the object's interior, and (iii) semantic segmentation, in three parallel streams. The predicted boundaries and directions are fused to propagate the learned features to refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our proposed approach yields consistent improvements by reducing boundary errors. Our code is available at https://github.com/shenglandu/PushBoundary. △ Less

Submitted 23 December, 2022; originally announced December 2022.

Comments: 3DV2022

arXiv:2212.07981 [pdf, other]

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation

Authors: Yixin Liu, Alexander R. Fabbri, Pengfei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

Abstract: Human evaluation is the foundation upon which the evaluation of both summarization systems and automatic metrics rests. However, existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale, and an in-depth analysis of human evaluation is lacking. Therefore, we address the shortcomings of existing summarization evaluation along the f… ▽ More Human evaluation is the foundation upon which the evaluation of both summarization systems and automatic metrics rests. However, existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale, and an in-depth analysis of human evaluation is lacking. Therefore, we address the shortcomings of existing summarization evaluation along the following axes: (1) We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units and allows for a high inter-annotator agreement. (2) We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems on three datasets. (3) We conduct a comparative study of four human evaluation protocols, underscoring potential confounding factors in evaluation setups. (4) We evaluate 50 automatic metrics and their variants using the collected human annotations across evaluation protocols and demonstrate how our benchmark leads to more statistically stable and significant results. The metrics we benchmarked include recent methods based on large language models (LLMs), GPTScore and G-Eval. Furthermore, our findings have important implications for evaluating LLMs, as we show that LLMs adjusted by human feedback (e.g., GPT-3.5) may overfit unconstrained human evaluation, which is affected by the annotators' prior, input-agnostic preferences, calling for more robust, targeted evaluation methods. △ Less

Submitted 6 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: ACL 2023 Camera Ready

arXiv:2210.12374 [pdf, other]

ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples

Authors: Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev

Abstract: Reasoning over tabular data requires both table structure understanding and a broad set of table reasoning skills. Current models with table-specific architectures and pre-training methods perform well on understanding table structures, but they still struggle with tasks that require various table reasoning skills. In this work, we develop ReasTAP to show that high-level table reasoning skills can… ▽ More Reasoning over tabular data requires both table structure understanding and a broad set of table reasoning skills. Current models with table-specific architectures and pre-training methods perform well on understanding table structures, but they still struggle with tasks that require various table reasoning skills. In this work, we develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design. We define 7 table reasoning skills, such as numerical operation, temporal comparison, and conjunction. Each reasoning skill is associated with one example generator, which synthesizes questions over semi-structured tables according to the sampled templates. We model the table pre-training task as a sequence generation task and pre-train ReasTAP to generate precise answers to the synthetic examples. ReasTAP is evaluated on four benchmarks covering three downstream tasks including: 1) WikiSQL and WTQ for Table Question Answering; 2) TabFact for Table Fact Verification; and 3) LogicNLG for Faithful Table-to-Text Generation. Experimental results demonstrate that ReasTAP achieves new state-of-the-art performance on all benchmarks and delivers a significant improvement on low-resource setting. Our code is publicly available at https://github.com/Yale-LILY/ReasTAP. △ Less

Submitted 22 October, 2022; originally announced October 2022.

Comments: accepted by EMNLP 2022

arXiv:2209.00840 [pdf, other]

FOLIO: Natural Language Reasoning with First-Order Logic

Authors: Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri , et al. (10 additional authors not shown)

Abstract: Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FO… ▽ More Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for one of the most capable {Large Language Model (LLM)} publicly available, GPT-4. △ Less

Submitted 17 May, 2024; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.14796 [pdf, other]

3DLG-Detector: 3D Object Detection via Simultaneous Local-Global Feature Learning

Authors: Baian Chen, Liangliang Nan, Haoran Xie, Dening Lu, Fu Lee Wang, Mingqiang Wei

Abstract: Capturing both local and global features of irregular point clouds is essential to 3D object detection (3OD). However, mainstream 3D detectors, e.g., VoteNet and its variants, either abandon considerable local features during pooling operations or ignore many global features in the whole scene context. This paper explores new modules to simultaneously learn local-global features of scene point clo… ▽ More Capturing both local and global features of irregular point clouds is essential to 3D object detection (3OD). However, mainstream 3D detectors, e.g., VoteNet and its variants, either abandon considerable local features during pooling operations or ignore many global features in the whole scene context. This paper explores new modules to simultaneously learn local-global features of scene point clouds that serve 3OD positively. To this end, we propose an effective 3OD network via simultaneous local-global feature learning (dubbed 3DLG-Detector). 3DLG-Detector has two key contributions. First, it develops a Dynamic Points Interaction (DPI) module that preserves effective local features during pooling. Besides, DPI is detachable and can be incorporated into existing 3OD networks to boost their performance. Second, it develops a Global Context Aggregation module to aggregate multi-scale features from different layers of the encoder to achieve scene context-awareness. Our method shows improvements over thirteen competitors in terms of detection accuracy and robustness on both the SUN RGB-D and ScanNet datasets. Source code will be available upon publication. △ Less

Submitted 31 August, 2022; originally announced August 2022.

arXiv:2208.00751 [pdf, other]

doi 10.1109/TVCG.2023.3236061

CSDN: Cross-modal Shape-transfer Dual-refinement Network for Point Cloud Completion

Authors: Zhe Zhu, Liangliang Nan, Haoran Xie, Honghua Chen, Mingqiang Wei, Jun Wang, **g Qin

Abstract: How will you repair a physical object with some missings? You may imagine its original shape from previously captured images, recover its overall (global) but coarse shape first, and then refine its local details. We are motivated to imitate the physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a… ▽ More How will you repair a physical object with some missings? You may imagine its original shape from previously captured images, recover its overall (global) but coarse shape first, and then refine its local details. We are motivated to imitate the physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a coarse-to-fine paradigm with images of full-cycle participation, for quality point cloud completion. CSDN mainly consists of "shape fusion" and "dual-refinement" modules to tackle the cross-modal challenge. The first module transfers the intrinsic shape characteristics from single images to guide the geometry generation of the missing regions of point clouds, in which we propose IPAdaIN to embed the global features of both the image and the partial point cloud into completion. The second module refines the coarse output by adjusting the positions of the generated points, where the local refinement unit exploits the geometric relation between the novel and the input points by graph convolution, and the global constraint unit utilizes the input image to fine-tune the generated offset. Different from most existing approaches, CSDN not only explores the complementary information from images but also effectively exploits cross-modal data in the whole coarse-to-fine completion procedure. Experimental results indicate that CSDN performs favorably against ten competitors on the cross-modal benchmark. △ Less

Submitted 11 December, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

arXiv:2205.12476 [pdf, other]

Leveraging Locality in Abstractive Text Summarization

Authors: Yixin Liu, Ansong Ni, Linyong Nan, Budhaditya Deb, Chenguang Zhu, Ahmed H. Awadallah, Dragomir Radev

Abstract: Neural attention models have achieved significant improvements on many natural language processing tasks. However, the quadratic memory complexity of the self-attention module with respect to the input length hinders their applications in long text summarization. Instead of designing more efficient attention modules, we approach this problem by investigating if models with a restricted context can… ▽ More Neural attention models have achieved significant improvements on many natural language processing tasks. However, the quadratic memory complexity of the self-attention module with respect to the input length hinders their applications in long text summarization. Instead of designing more efficient attention modules, we approach this problem by investigating if models with a restricted context can have competitive performance compared with the memory-efficient attention models that maintain a global context by treating the input as a single sequence. Our model is applied to individual pages which contain parts of inputs grouped by the principle of locality during both encoding and decoding. We empirically investigated three kinds of locality in text summarization at different levels of granularity, ranging from sentences to documents. Our experimental results show that our model has a better performance compared with strong baselines with efficient attention modules, and our analysis provides further insights into our locality-aware modeling strategy. △ Less

Submitted 30 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: Accepted to EMNLP 2022

arXiv:2205.12467 [pdf, other]

R2D2: Robust Data-to-Text with Replacement Detection

Authors: Linyong Nan, Lorenzo Jaime Yu Flores, Yilun Zhao, Yixin Liu, Luke Benson, Wei** Zou, Dragomir Radev

Abstract: Unfaithful text generation is a common problem for text generation systems. In the case of Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any real-world applications. We introduce R2D2, a training framework that addresses unfaithful Data-to-Text generation by training a system both as a generator and a faithfulness discriminator with additional replace… ▽ More Unfaithful text generation is a common problem for text generation systems. In the case of Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any real-world applications. We introduce R2D2, a training framework that addresses unfaithful Data-to-Text generation by training a system both as a generator and a faithfulness discriminator with additional replacement detection and unlikelihood learning tasks. To facilitate such training, we propose two methods for sampling unfaithful sentences. We argue that the poor entity retrieval capability of D2T systems is one of the primary sources of unfaithfulness, so in addition to the existing metrics, we further propose NER-based metrics to evaluate the fidelity of D2T generations. Our experimental results show that R2D2 systems could effectively mitigate the unfaithful text generation, and they achieve new state-of-the-art results on FeTaQA, LogicNLG, and ToTTo, all with significant improvements. △ Less

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2203.01391 [pdf, other]

DDL-MVS: Depth Discontinuity Learning for MVS Networks

Authors: Nail Ibrahimli, Hugo Ledoux, Julian Kooij, Liangliang Nan

Abstract: Traditional MVS methods have good accuracy but struggle with completeness, while recently developed learning-based multi-view stereo (MVS) techniques have improved completeness except accuracy being compromised. We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction. Our idea is to jointly estimate the depth a… ▽ More Traditional MVS methods have good accuracy but struggle with completeness, while recently developed learning-based multi-view stereo (MVS) techniques have improved completeness except accuracy being compromised. We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction. Our idea is to jointly estimate the depth and boundary maps where the boundary maps are explicitly used for further refinement of the depth maps. We validate our idea and demonstrate that our strategies can be easily integrated into the existing learning-based MVS pipeline where the reconstruction depends on high-quality depth map estimation. Extensive experiments on various datasets show that our method improves reconstruction quality compared to baseline. Experiments also demonstrate that the presented model and strategies have good generalization capabilities. The source code will be available soon. △ Less

Submitted 12 June, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.03209 [pdf, other]

PSSNet: Planarity-sensible Semantic Segmentation of Large-scale Urban Meshes

Authors: Weixiao Gao, Liangliang Nan, Bas Boom, Hugo Ledoux

Abstract: We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. The over-segmentation step generates an initial set… ▽ More We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. The over-segmentation step generates an initial set of mesh segments that capture the planar and non-planar regions of urban scenes. In the subsequent classification step, we construct a graph that encodes the geometric and photometric features of the segments in its nodes and the multi-scale contextual features in its edges. The final semantic segmentation is obtained by classifying the segments using a graph convolutional network. Experiments and comparisons on two semantic urban mesh benchmarks demonstrate that our approach outperforms the state-of-the-art methods in terms of boundary quality, mean IoU (intersection over union), and generalization ability. We also introduce several new metrics for evaluating mesh over-segmentation methods dedicated to semantic segmentation, and our proposed over-segmentation approach outperforms state-of-the-art methods on all metrics. Our source code is available at \url{https://github.com/WeixiaoGao/PSSNet}. △ Less

Submitted 24 December, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 24 pages,11 figures

arXiv:2202.01829 [pdf, other]

doi 10.1145/3516521

HRBF-Fusion: Accurate 3D reconstruction from RGB-D data using on-the-fly implicits

Authors: Yabin Xu, Liangliang Nan, Laishui Zhou, Jun Wang, Charlie C. L. Wang

Abstract: Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level RGB-D cameras. However, due to the discrete nature and limited resolution of their surface representations (e.g., point- or voxel-based), existing approaches suffer from the accumulation of errors in camer… ▽ More Reconstruction of high-fidelity 3D objects or scenes is a fundamental research problem. Recent advances in RGB-D fusion have demonstrated the potential of producing 3D models from consumer-level RGB-D cameras. However, due to the discrete nature and limited resolution of their surface representations (e.g., point- or voxel-based), existing approaches suffer from the accumulation of errors in camera tracking and distortion in the reconstruction, which leads to an unsatisfactory 3D reconstruction. In this paper, we present a method using on-the-fly implicits of Hermite Radial Basis Functions (HRBFs) as a continuous surface representation for camera tracking in an existing RGB-D fusion framework. Furthermore, curvature estimation and confidence evaluation are coherently derived from the inherent surface properties of the on-the-fly HRBF implicits, which devote to a data fusion with better quality. We argue that our continuous but on-the-fly surface representation can effectively mitigate the impact of noise with its robustness and constrain the reconstruction with inherent surface smoothness when being compared with discrete representations. Experimental results on various real-world and synthetic datasets demonstrate that our HRBF-fusion outperforms the state-of-the-art approaches in terms of tracking robustness and reconstruction accuracy. △ Less

Submitted 13 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

arXiv:2201.10276 [pdf, other]

doi 10.3390/rs14092254

City3D: Large-Scale Building Reconstruction from Airborne LiDAR Point Clouds

Authors: ** Huang, Jantien Stoter, Ravi Peters, Liangliang Nan

Abstract: We present a fully automatic approach for reconstructing compact 3D building models from large-scale airborne point clouds. A major challenge of urban reconstruction from airborne LiDAR point clouds lies in that the vertical walls are typically missing. Based on the observation that urban buildings typically consist of planar roofs connected with vertical walls to the ground, we propose an approac… ▽ More We present a fully automatic approach for reconstructing compact 3D building models from large-scale airborne point clouds. A major challenge of urban reconstruction from airborne LiDAR point clouds lies in that the vertical walls are typically missing. Based on the observation that urban buildings typically consist of planar roofs connected with vertical walls to the ground, we propose an approach to infer the vertical walls directly from the data. With the planar segments of both roofs and walls, we hypothesize the faces of the building surface, and the final model is obtained by using an extended hypothesis-and-selection-based polygonal surface reconstruction framework. Specifically, we introduce a new energy term to encourage roof preferences and two additional hard constraints into the optimization step to ensure correct topology and enhance detail recovery. Experiments on various large-scale airborne LiDAR point clouds have demonstrated that the method is superior to the state-of-the-art methods in terms of reconstruction accuracy and robustness. In addition, we have generated a new dataset with our method consisting of the point clouds and 3D models of 20k real-world buildings. We believe this dataset can stimulate research in urban reconstruction from airborne LiDAR point clouds and the use of 3D city models in urban applications. △ Less

Submitted 9 March, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

arXiv:2112.13142 [pdf, other]

doi 10.1016/j.isprsjprs.2022.09.017

Reconstructing Compact Building Models from Point Clouds Using Deep Implicit Fields

Authors: Zhaiyu Chen, Hugo Ledoux, Seyran Khademi, Liangliang Nan

Abstract: While three-dimensional (3D) building models play an increasingly pivotal role in many real-world applications, obtaining a compact representation of buildings remains an open problem. In this paper, we present a novel framework for reconstructing compact, watertight, polygonal building models from point clouds. Our framework comprises three components: (a) a cell complex is generated via adaptive… ▽ More While three-dimensional (3D) building models play an increasingly pivotal role in many real-world applications, obtaining a compact representation of buildings remains an open problem. In this paper, we present a novel framework for reconstructing compact, watertight, polygonal building models from point clouds. Our framework comprises three components: (a) a cell complex is generated via adaptive space partitioning that provides a polyhedral embedding as the candidate set; (b) an implicit field is learned by a deep neural network that facilitates building occupancy estimation; (c) a Markov random field is formulated to extract the outer surface of a building via combinatorial optimization. We evaluate and compare our method with state-of-the-art methods in generic reconstruction, model-based reconstruction, geometry simplification, and primitive assembly. Experiments on both synthetic and real-world point clouds have demonstrated that, with our neural-guided strategy, high-quality building models can be obtained with significant advantages in fidelity, compactness, and computational efficiency. Our method also shows robustness to noise and insufficient measurements, and it can directly generalize from synthetic scans to real-world measurements. The source code of this work is freely available at https://github.com/chenzhaiyu/points2poly. △ Less

Submitted 26 September, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

Comments: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:2112.11121 [pdf, other]

doi 10.1016/j.isprsjprs.2023.01.013

GlobalMatch: Registration of Forest Terrestrial Point Clouds by Global Matching of Relative Stem Positions

Authors: Xufei Wang, Zexin Yang, Xiaojun Cheng, Jantien Stoter, Wenbing Xu, Zhenlun Wu, Liangliang Nan

Abstract: Registering point clouds of forest environments is an essential prerequisite for LiDAR applications in precision forestry. State-of-the-art methods for forest point cloud registration require the extraction of individual tree attributes, and they have an efficiency bottleneck when dealing with point clouds of real-world forests with dense trees. We propose an automatic, robust, and efficient metho… ▽ More Registering point clouds of forest environments is an essential prerequisite for LiDAR applications in precision forestry. State-of-the-art methods for forest point cloud registration require the extraction of individual tree attributes, and they have an efficiency bottleneck when dealing with point clouds of real-world forests with dense trees. We propose an automatic, robust, and efficient method for the registration of forest point clouds. Our approach first locates tree stems from raw point clouds and then matches the stems based on their relative spatial relationship to determine the registration transformation. The algorithm requires no extra individual tree attributes and has quadratic complexity to the number of trees in the environment, allowing it to align point clouds of large forest environments. Extensive experiments on forest terrestrial point clouds have revealed that our method inherits the effectiveness and robustness of the stem-based registration strategy while exceedingly increasing its efficiency. Besides, we introduce a new benchmark dataset that complements the very few existing open datasets for the development and evaluation of registration methods for forest point clouds. The source code of our method and the dataset are available at https://github.com/zexinyang/GlobalMatch. △ Less

Submitted 1 April, 2023; v1 submitted 21 December, 2021; originally announced December 2021.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing. Vol. 197, 71-86, 2023

arXiv:2112.09902 [pdf, other]

doi 10.1109/TGRS.2022.3183567

3D Instance Segmentation of MVS Buildings

Authors: Jiazhou Chen, Yanghui Xu, Shufang Lu, Ronghua Liang, Liangliang Nan

Abstract: We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images… ▽ More We present a novel 3D instance segmentation framework for Multi-View Stereo (MVS) buildings in urban scenes. Unlike existing works focusing on semantic segmentation of urban scenes, the emphasis of this work lies in detecting and segmenting 3D building instances even if they are attached and embedded in a large and imprecise 3D surface model. Multi-view RGB images are first enhanced to RGBH images by adding a heightmap and are segmented to obtain all roof instances using a fine-tuned 2D instance segmentation neural network. Instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlap**, which can eliminate segmentation ambiguities among multi-view images. Based on these global masks, 3D roof instances are segmented out by mask back-projections and extended to the entire building instances through a Markov random field optimization. A new dataset that contains instance-level annotation for both 3D urban scenes (roofs and buildings) and drone images (roofs) is provided. To the best of our knowledge, it is the first outdoor dataset dedicated to 3D instance segmentation with much more annotations of attached 3D buildings than existing datasets. Quantitative evaluations and ablation studies have shown the effectiveness of all major steps and the advantages of our multi-view framework over the orthophoto-based method. △ Less

Submitted 22 June, 2022; v1 submitted 18 December, 2021; originally announced December 2021.

Comments: 14 figures, 12 figures

arXiv:2104.00369 [pdf, other]

FeTaQA: Free-form Table Question Answering

Authors: Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

Abstract: Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question an… ▽ More Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question answering, we introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs. FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source. Unlike datasets of generative QA over text in which answers are prevalent with copies of short text spans from the source, answers in our dataset are human-generated explanations involving entities and their high-level relations. We provide two benchmark methods for the proposed task: a pipeline method based on semantic-parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods. △ Less

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.00355 [pdf, other]

doi 10.1016/j.isprsjprs.2021.07.008

SUM: A Benchmark Dataset of Semantic Urban Meshes

Authors: Weixiao Gao, Liangliang Nan, Bas Boom, Hugo Ledoux

Abstract: Recent developments in data acquisition technology allow us to collect 3D texture meshes quickly. Those can help us understand and analyse the urban environment, and as a consequence are useful for several applications like spatial analysis and urban planning. Semantic segmentation of texture meshes through deep learning methods can enhance this understanding, but it requires a lot of labelled dat… ▽ More Recent developments in data acquisition technology allow us to collect 3D texture meshes quickly. Those can help us understand and analyse the urban environment, and as a consequence are useful for several applications like spatial analysis and urban planning. Semantic segmentation of texture meshes through deep learning methods can enhance this understanding, but it requires a lot of labelled data. The contributions of this work are threefold: (1) a new benchmark dataset of semantic urban meshes, (2) a novel semi-automatic annotation framework, and (3) an annotation tool for 3D meshes. In particular, our dataset covers about 4 km2 in Helsinki (Finland), with six classes, and we estimate that we save about 600 hours of labelling work using our annotation framework, which includes initial segmentation and interactive refinement. We also compare the performance of several state-of-theart 3D semantic segmentation methods on the new benchmark dataset. Other researchers can use our results to train their networks: the dataset is publicly available, and the annotation tool is released as open-source. △ Less

Submitted 13 July, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

Comments: 27 pages, 14 figures

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 179, September 2021, Pages 108-120

arXiv:2007.02871 [pdf, other]

DART: Open-Domain Structured Data Record to Text Generation

Authors: Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

Abstract: We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploi… ▽ More We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks by utilizing techniques such as: tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart. △ Less

Submitted 12 April, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: NAACL 2021

arXiv:2003.06535 [pdf, other]

An End-to-End Geometric Deficiency Elimination Algorithm for 3D Meshes

Authors: Bingtao Ma, Hongsen Liu, Liangliang Nan, Yang Cong

Abstract: The 3D mesh is an important representation of geometric data. In the generation of mesh data, geometric deficiencies (e.g., duplicate elements, degenerate faces, isolated vertices, self-intersection, and inner faces) are unavoidable and may violate the topology structure of an object. In this paper, we propose an effective and efficient geometric deficiency elimination algorithm for 3D meshes. Spe… ▽ More The 3D mesh is an important representation of geometric data. In the generation of mesh data, geometric deficiencies (e.g., duplicate elements, degenerate faces, isolated vertices, self-intersection, and inner faces) are unavoidable and may violate the topology structure of an object. In this paper, we propose an effective and efficient geometric deficiency elimination algorithm for 3D meshes. Specifically, duplicate elements can be eliminated by assessing the occurrence times of vertices or faces; degenerate faces can be removed according to the outer product of two edges; since isolated vertices do not appear in any face vertices, they can be deleted directly; self-intersecting faces are detected using an AABB tree and remeshed afterward; by simulating whether multiple random rays that shoot from a face can reach infinity, we can judge whether the surface is an inner face, then decide to delete it or not. Experiments on ModelNet40 dataset illustrate that our method can eliminate the deficiencies of the 3D mesh thoroughly. △ Less

Submitted 2 September, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:1911.05531 [pdf, other]

Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations

Authors: Iddo Drori, Darshan Thaker, Arjun Srivatsa, Daniel Jeong, Yueqi Wang, Linyong Nan, Fan Wu, Dimitri Leggas, **hao Lei, Weiyi Lu, Weilong Fu, Yuan Gao, Sashank Karri, Anand Kannan, Antonio Moretti, Mohammed AlQuraishi, Chen Keasar, Itsik Pe'er

Abstract: Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonst… ▽ More Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonstrate state-of-the-art protein structure prediction (PSP) results using embeddings and deep learning models for prediction of backbone atom distance matrices and torsion angles. We recover 3D coordinates of backbone atoms and reconstruct full atom protein by optimization. We create a new gold standard dataset of proteins which is comprehensive and easy to use. Our dataset consists of amino acid sequences, Q8 secondary structures, position specific scoring matrices, multiple sequence alignment co-evolutionary features, backbone atom distance matrices, torsion angles, and 3D coordinates. We evaluate the quality of our structure prediction by RMSD on the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP) test data and demonstrate competitive results with the winning teams and AlphaFold in CASP13 and supersede the results of the winning teams in CASP12. We make our data, models, and code publicly available. △ Less

Submitted 8 November, 2019; originally announced November 2019.

Journal ref: Machine Learning in Computational Biology, 2019

arXiv:1901.00891 [pdf, other]

Guigle: A GUI Search Engine for Android Apps

Authors: Carlos Bernal-Cardenas, Kevin Moran, Michele Tufano, Zichang Liu, Linyong Nan, Zhehan Shi, Denys Poshyvanyk

Abstract: The process of develo** a mobile application typically starts with the ideation and conceptualization of its user interface. This concept is then translated into a set of mock-ups to help determine how well the user interface embodies the intended features of the app. After the creation of mock-ups developers then translate it into an app that runs in a mobile device. In this paper we propose an… ▽ More The process of develo** a mobile application typically starts with the ideation and conceptualization of its user interface. This concept is then translated into a set of mock-ups to help determine how well the user interface embodies the intended features of the app. After the creation of mock-ups developers then translate it into an app that runs in a mobile device. In this paper we propose an approach, called GUIGLE, that aims to facilitate the process of conceptualizing the user interface of an app through GUI search. GUIGLE indexes GUI images and metadata extracted using automated dynamic analysis on a large corpus of apps extracted from Google Play. To perform a search, our approach uses information from text displayed on a screen, user interface components, the app name, and screen color palettes to retrieve relevant screens given a query. Furthermore, we provide a lightweight query language that allows for intuitive search of screens. We evaluate GUIGLE with real users and found that, on average, 68.8% of returned screens were relevant to the specified query. Additionally, users found the various different features of GUIGLE useful, indicating that our search engine provides an intuitive user experience. Finally, users agree that the information presented by GUIGLE is useful in conceptualizing the design of new screens for applications. △ Less

Submitted 3 January, 2019; originally announced January 2019.

Comments: Accepted to 41st ACM/IEEE International Conference on Software Engineering, Formal Tool Demonstrations Track

arXiv:1811.07143 [pdf, other]

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures

Authors: Iddo Drori, Isht Dwivedi, Pranav Shrestha, Jeffrey Wan, Yueqi Wang, Yunchu He, Anthony Mazza, Hugh Krogh-Freeman, Dimitri Leggas, Kendal Sandridge, Linyong Nan, Kaveri Thakoor, Chinmay Joshi, Sonam Goenka, Chen Keasar, Itsik Pe'er

Abstract: We tackle the problem of protein secondary structure prediction using a common task framework. This lead to the introduction of multiple ideas for neural architectures based on state of the art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the applicatio… ▽ More We tackle the problem of protein secondary structure prediction using a common task framework. This lead to the introduction of multiple ideas for neural architectures based on state of the art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area for continuously improving methods. We use an ensemble of strong predictors to achieve accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research. △ Less

Submitted 17 November, 2018; originally announced November 2018.

Comments: NIPS 2018 Workshop on Machine Learning for Molecules and Materials, 10 pages

arXiv:1603.08753 [pdf, other]

Curve Networks for Surface Reconstruction

Authors: Yuanhao Cao, Liangliang Nan, Peter Wonka

Abstract: Man-made objects usually exhibit descriptive curved features (i.e., curve networks). The curve network of an object conveys its high-level geometric and topological structure. We present a framework for extracting feature curve networks from unstructured point cloud data. Our framework first generates a set of initial curved segments fitting highly curved regions. We then optimize these curved seg… ▽ More Man-made objects usually exhibit descriptive curved features (i.e., curve networks). The curve network of an object conveys its high-level geometric and topological structure. We present a framework for extracting feature curve networks from unstructured point cloud data. Our framework first generates a set of initial curved segments fitting highly curved regions. We then optimize these curved segments to respect both data fitting and structural regularities. Finally, the optimized curved segments are extended and connected into curve networks using a clustering method. To facilitate effectiveness in case of severe missing data and to resolve ambiguities, we develop a user interface for completing the curve networks. Experiments on various imperfect point cloud data validate the effectiveness of our curve network extraction framework. We demonstrate the usefulness of the extracted curve networks for surface reconstruction from incomplete point clouds. △ Less

Submitted 29 March, 2016; originally announced March 2016.

Comments: 10 pages

Showing 1–38 of 38 results for author: Nan, L