-
SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
Authors:
Julio C. Rangel,
Tarcisio Mendes de Farias,
Ana Claudia Sima,
Norio Kobayashi
Abstract:
The recent success of Large Language Models (LLM) in a wide range of Natural Language Processing applications opens the path towards novel Question Answering Systems over Knowledge Graphs leveraging LLMs. However, one of the main obstacles preventing their implementation is the scarcity of training data for the task of translating questions into corresponding SPARQL queries, particularly in the case of domain-specific KGs. To overcome this challenge, in this study, we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce. In this context, we also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments. Finally, we evaluate our approach over the real-world Bgee gene expression knowledge graph and we show that semantic clues can improve model performance by up to 33% compared to a baseline with random variable names and no comments included.
Submitted 7 February, 2024;
originally announced February 2024.
-
DriveLM: Driving with Graph Visual Question Answering
Authors:
Chonghao Sima,
Katrin Renz,
Kashyap Chitta,
Li Chen,
Hanxue Zhang,
Chengen Xie,
Ping Luo,
Andreas Geiger,
Hongyang Li
Abstract:
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human users. While recent approaches adapt VLMs to driving via single-round visual question answering (VQA), human drivers reason about decisions in multiple steps. Starting from the localization of key objects, humans estimate object interactions before taking actions. The key insight is that with our proposed task, Graph VQA, where we model graph-structured reasoning through perception, prediction and planning question-answer pairs, we obtain a suitable proxy task to mimic the human reasoning process. We instantiate datasets (DriveLM-Data) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving. The experiments demonstrate that Graph VQA provides a simple, principled framework for reasoning about a driving scene, and DriveLM-Data provides a challenging benchmark for this task. Our DriveLM-Agent baseline performs end-to-end autonomous driving competitively in comparison to state-of-the-art driving-specific architectures. Notably, its benefits are pronounced when it is evaluated zero-shot on unseen objects or sensor configurations. We hope this work can be the starting point to shed new light on how to apply VLMs for autonomous driving. To facilitate future research, all code, data, and models are available to the public.
Submitted 21 December, 2023;
originally announced December 2023.
-
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
Authors:
Linyan Huang,
Zhiqi Li,
Chonghao Sima,
Wenhai Wang,
Jingdong Wang,
Yu Qiao,
Hongyang Li
Abstract:
Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the presence of the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, an apprentice-friendly expert model would predominantly rely on camera features, while still achieving comparable performance to multi-modal models. To this end, we introduce VCD, a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts an identical structure as that of the camera-only apprentice in order to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving the performance on par with other heterogeneous multi-modal experts. Additionally, a fine-grained trajectory-based distillation module is introduced with the purpose of individually rectifying the motion misalignment for each object in the scene. With those improvements, our camera-only apprentice VCD-A sets new state-of-the-art on nuScenes with a score of 63.1% NDS.
Submitted 24 October, 2023;
originally announced October 2023.
-
Scene as Occupancy
Authors:
Chonghao Sima,
Wenwen Tong,
Tai Wang,
Li Chen,
Silei Wu,
Hanming Deng,
Yi Gu,
Lewei Lu,
Ping Luo,
Dahua Lin,
Hongyang Li
Abstract:
Human drivers can easily describe a complex traffic scene using their visual system. Such an ability of precise perception is essential for a driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into a structured grid map with semantic labels per cell, termed 3D Occupancy, would be desirable. Compared to the bounding-box form, a key insight behind occupancy is that it can capture the fine-grained details of critical obstacles in the scene, and thereby facilitate subsequent tasks. Prior or concurrent literature mainly concentrates on a single scene completion task, whereas we argue that this occupancy representation might possess a broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding to represent the 3D physical world. Such a descriptor could be applied to a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show that there are evident performance gains across multiple tasks, e.g., motion planning could witness a collision rate reduction of 15%-58%, demonstrating the superiority of our method.
Submitted 26 June, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Authors:
Huijie Wang,
Tianyu Li,
Yang Li,
Li Chen,
Chonghao Sima,
Zhenbo Liu,
Bangjun Wang,
Peijin Jia,
Yuting Wang,
Shengyin Jiang,
Feng Wen,
Hang Xu,
Ping Luo,
Junchi Yan,
Wei Zhang,
Hongyang Li
Abstract:
Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. The objective of the presented dataset is to advance research in understanding the structure of road scenes by examining the relationship between perceived entities, such as traffic elements and lanes. Leveraging existing datasets, OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. It comprises three primary sub-tasks, including the 3D lane detection inherited from OpenLane, accompanied by corresponding metrics to evaluate the model's performance. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.
Submitted 28 October, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Sparse Dense Fusion for 3D Object Detection
Authors:
Yulu Gao,
Chonghao Sima,
Shaoshuai Shi,
Shangzhe Di,
Si Liu,
Hongyang Li
Abstract:
With the prevalence of multimodal learning, camera-LiDAR fusion has gained popularity in 3D object detection. Although multiple fusion approaches have been proposed, they can be classified into either sparse-only or dense-only fashion based on the feature representation in the fusion module. In this paper, we analyze them in a common taxonomy and thereafter observe two challenges: 1) sparse-only solutions preserve 3D geometric prior and yet lose rich semantic information from the camera, and 2) dense-only alternatives retain the semantic continuity but miss the accurate geometric information from LiDAR. By analyzing these two formulations, we conclude that the information loss is inevitable due to their design scheme. To compensate for the information loss in either manner, we propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture. Such a simple yet effective sparse-dense fusion structure enriches semantic texture and exploits spatial structure information simultaneously. Through our SDF strategy, we assemble two popular methods with moderate performance and outperform baseline by 4.3% in mAP and 2.5% in NDS, ranking first on the nuScenes benchmark. Extensive ablations demonstrate the effectiveness of our method and empirically align our analysis.
Submitted 9 April, 2023;
originally announced April 2023.
-
Planning-oriented Autonomous Driving
Authors:
Yihan Hu,
Jiazhi Yang,
Li Chen,
Keyu Li,
Chonghao Sima,
Xizhou Zhu,
Siqi Chai,
Senyao Du,
Tianwei Lin,
Wenhai Wang,
Lewei Lu,
Xiaosong Jia,
Qiang Liu,
Jifeng Dai,
Yu Qiao,
Hongyang Li
Abstract:
Modern autonomous driving system is characterized as modular tasks in sequential order, i.e., perception, prediction, and planning. In order to perform a wide diversity of tasks and achieve advanced-level intelligence, contemporary approaches either deploy standalone models for individual tasks, or design a multi-task paradigm with separate heads. However, they might suffer from accumulative errors or deficient task coordination. Instead, we argue that a favorable framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car. Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning. We introduce Unified Autonomous Driving (UniAD), a comprehensive framework up-to-date that incorporates full-stack driving tasks in one network. It is exquisitely devised to leverage advantages of each module, and provide complementary feature abstractions for agent interaction from a global perspective. Tasks are communicated with unified query interfaces to facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. With extensive ablations, the effectiveness of using such a philosophy is proven by substantially outperforming previous state-of-the-arts in all aspects. Code and models are public.
Submitted 23 March, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Authors:
Hongyang Li,
Chonghao Sima,
Jifeng Dai,
Wenhai Wang,
Lewei Lu,
Huijie Wang,
Jia Zeng,
Zhiqi Li,
Jiazhi Yang,
Hanming Deng,
Hao Tian,
Enze Xie,
Jiangwei Xie,
Li Chen,
Tianyu Li,
Yang Li,
Yulu Gao,
Xiaosong Jia,
Si Liu,
Jianping Shi,
Dahua Lin,
Yu Qiao
Abstract:
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view come of vital importance. BEV perception inherits several advantages, as representing surrounding scenes in BEV is intuitive and fusion-friendly; and representing objects in BEV is most desirable for subsequent modules as in planning and/or control. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent works on BEV perception and provide an in-depth analysis of different solutions. Moreover, several systematic designs of BEV approach from the industry are depicted as well. Furthermore, we introduce a full suite of practical guidebook to improve the performance of BEV perception tasks, including camera, LiDAR and fusion inputs. At last, we point out the future research directions in this area. We hope this report will shed some light on the community and encourage more research effort on BEV perception. We keep an active repository to collect the most recent work and provide a toolbox for bag of tricks at https://github.com/OpenDriveLab/Birds-eye-view-Perception
Submitted 27 September, 2023; v1 submitted 12 September, 2022;
originally announced September 2022.
-
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Authors:
Zhiqi Li,
Wenhai Wang,
Hongyang Li,
Enze Xie,
Chonghao Sima,
Tong Lu,
Qiao Yu,
Jifeng Dai
Abstract:
3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse the history BEV information. Our approach achieves the new state-of-the-art 56.9% in terms of NDS metric on the nuScenes test set, which is 9.0 points higher than previous best arts and on par with the performance of LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and recall of objects under low visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
Submitted 13 July, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Authors:
Li Chen,
Chonghao Sima,
Yang Li,
Zehan Zheng,
Jiajie Xu,
Xiangwei Geng,
Hongyang Li,
Conghui He,
Jianping Shi,
Yu Qiao,
Junchi Yan
Abstract:
Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.). Previous work struggled in complex cases due to their simple designs of the spatial transformation between front view and bird's eye view (BEV) and the lack of a realistic dataset. Towards these issues, we present PersFormer: an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module. Our model generates BEV features by attending to related front-view local regions with camera parameters as a reference. PersFormer adopts a unified 2D/3D anchor design and an auxiliary task to detect 2D/3D lanes simultaneously, enhancing the feature consistency and sharing the benefits of multi-task learning. Moreover, we release one of the first large-scale real-world 3D lane datasets: OpenLane, with high-quality annotation and scenario diversity. OpenLane contains 200,000 frames, over 880,000 instance-level lanes, 14 lane categories, along with scene tags and the closed-in-path object annotations to encourage the development of lane detection and more industrial-related autonomous driving methods. We show that PersFormer significantly outperforms competitive baselines in the 3D lane detection task on our new OpenLane dataset as well as Apollo 3D Lane Synthetic dataset, and is also on par with state-of-the-art algorithms in the 2D task on OpenLane. The project page is available at https://github.com/OpenPerceptionX/PersFormer_3DLane and OpenLane dataset is provided at https://github.com/OpenPerceptionX/OpenLane.
Submitted 19 July, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
BV equivalence with boundary
Authors:
Francisco Manuel Castela Simão,
Alberto S. Cattaneo,
Michele Schiavina
Abstract:
An extension of the notion of classical equivalence in the Batalin–(Fradkin)–Vilkovisky (BV) and (BFV) framework for local Lagrangian field theory on manifolds possibly with boundary is discussed. Equivalence is phrased in both a strict and a lax sense, distinguished by the compatibility between the BV data for a field theory and its boundary BFV data, necessary for quantisation. In this context, the first- and second-order formulations of non-Abelian Yang–Mills and of classical mechanics on curved backgrounds, all of which admit a strict BV-BFV description, are shown to be pairwise equivalent as strict BV-BFV theories. This in particular implies that their BV-complexes are quasi-isomorphic. Furthermore, Jacobi theory and one-dimensional gravity coupled with scalar matter are compared as classically-equivalent reparametrisation-invariant versions of classical mechanics, but such that only the latter admits a strict BV-BFV formulation. They are shown to be equivalent as lax BV-BFV theories and to have isomorphic BV cohomologies. This shows that strict BV-BFV equivalence is a strictly finer notion of equivalence of theories.
Submitted 7 March, 2023; v1 submitted 11 September, 2021;
originally announced September 2021.
-
Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data
Authors:
Ana Claudia Sima,
Tarcisio Mendes de Farias,
Maria Anisimova,
Christophe Dessimoz,
Marc Robinson-Rechavi,
Erich Zbinden,
Kurt Stockinger
Abstract:
The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available.
In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official bioinformatics Question Answering over Linked Data (QALD) challenge, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets.
Submitted 14 June, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
Optical and mechanical properties of nanofibrillated cellulose: towards a robust platform for next-generation green technologies
Authors:
Claudia D. Simao,
Juan S. Reparaz,
Markus R. Wagner,
Bartlomiej Graczykowski,
Martin Kreuzer,
Yasser B. Ruiz-Blanco,
Yamila Garcia,
Jani-Markus Malho,
Alejandro R. Goni,
Jouni Ahopelto,
Clivia M. Sotomayor Torres
Abstract:
Nanofibrillated cellulose, a polymer that can be obtained from one of the most abundant biopolymers in Nature, is being increasingly explored due to its outstanding properties for packaging and device applications. Still, open challenges in engineering its intrinsic properties remain to be addressed. The results obtained show the precise determination of significant properties, such as elastic properties and interactions, that are compared with similar works and, moreover, demonstrate that nanofibrillated cellulose properties can be reversibly controlled, supporting the extended potential of nanofibrillated cellulose as a robust platform for green-technology applications.
Submitted 1 April, 2015;
originally announced April 2015.
-
Order quantification of hexagonal periodic arrays fabricated by in situ solvent-assisted nanoimprint lithography of block copolymers
Authors:
Claudia Simao,
Worawut Khunsin,
Nikolaos Kehagias,
Mathieu Salaun,
Marc Zelsmann,
Michael A. Morris,
Clivia M. Sotomayor Torres
Abstract:
Directed self-assembly of block copolymer polystyrene-b-polyethylene oxide (PS-b-PEO) thin film was achieved by one-pot methodology of solvent vapour assisted nanoimprint lithography (SAIL).
Submitted 10 March, 2014;
originally announced March 2014.