Search | arXiv e-print repository

arXiv:2407.02095 [pdf, other]

TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference

Authors: Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng

Abstract: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1% respectively in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.15806 [pdf, other]

Robust Dynamic Control Barrier Function Based Trajectory Planning for Mobile Manipulator

Authors: Lihao Xu, Xiaogang Xiong, Bai Yang, Yunjiang Lou

Abstract: High-dimensional robot dynamic trajectory planning poses many challenges for traditional planning algorithms. Existing planning methods suffer from issues such as long computation times, limited capacity to address intricate obstacle models, and lack of consideration for external disturbances and measurement inaccuracies in these high-dimensional systems. To tackle these challenges, this paper proposes a novel trajectory planning approach that combines Dynamic Control Barrier Function (DCBF) with a disturbance observer to create a Robust Dynamic Control Barrier Function (RDCBF) planner. This approach successfully plans trajectories in environments with complex dynamic obstacles while accounting for external disturbances and measurement uncertainties, ensuring system safety and enabling precise obstacle avoidance. Experimental results on a mobile manipulator demonstrate outstanding performance of the proposed approach.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.11707 [pdf, other]

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage of prediction-side attack efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage of location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.

Submitted 17 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

arXiv:2406.11147 [pdf, other]

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96%/110% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.

Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10018 [pdf, other]

STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

Authors: Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou

Abstract: Repository-level code completion is challenging as it involves complicated contexts from multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion, i.e., retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study on static analysis integration in LLM-based repository-level code completion by investigating both the effectiveness and efficiency of static analysis integration strategies across different phases of code completion. We first implement a framework STALL+, which supports an extendable and customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion; and based on STALL+, we perform extensive experiments by including different code LLMs on the latest repository-level code completion benchmark CrossCodeEval. Our findings show that integrating file-level dependencies in the prompting phase performs the best while the integration in the post-processing phase performs the worst. Additionally, we observe different improvements from static analysis between dynamic languages and static languages, i.e., the best combination is prompting-phase with decoding-phase integration for Java while the best combination is prompting-phase with post-processing-phase integration for Python given the limitations of statically analyzing dynamic languages. Finally, we find complementarity between RAG and static analysis integration as well as their cost-effectiveness after combination.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures

arXiv:2406.03803 [pdf, ps, other]

Determining the Weight Spectrum of the Reed--Muller Codes RM(m-6,m)

Authors: Yueying Lou, Qichun Wang

Abstract: The weight spectra of the Reed-Muller codes $RM(r,m)$ were unknown for $r=3,...,m-5$. In IEEE Trans. Inform. Theory 2024, Carlet determined the weight spectrum of $RM(m-5,m)$ for $m\ge10$ using the Maiorana-McFarland construction. An attempt was made there to extend the result to $RM(m-6,m)$, but many problems occurred and much work remained to be done. In this paper, we propose a novel way of constructing Reed--Muller codewords and determine the weight spectrum of $RM(m-6,m)$ for $m\ge12$, which gives a positive answer to an open question on the weight spectrum of $RM(m-c,m)$ for $c=6$. Moreover, we put forward a conjecture and verify it for some cases. If the conjecture is true, then that open question can be completely solved.

Submitted 6 June, 2024; originally announced June 2024.
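
For readers unfamiliar with the term, the weight spectrum of a code is the set of Hamming weights attained by its codewords. The following is a minimal brute-force sketch for very small parameters only, assuming the standard definition of $RM(r,m)$ as the evaluations of Boolean polynomials of degree at most $r$ in $m$ variables. The function names `rm_generator_rows` and `weight_spectrum` are illustrative and do not come from the paper, and exhaustive enumeration is not the construction the authors use.

```python
from itertools import combinations, product


def rm_generator_rows(r, m):
    """Evaluation vectors of all monomials of degree <= r in m Boolean variables.

    Each row is encoded as an integer bitmask over the 2^m evaluation points.
    """
    points = list(product((0, 1), repeat=m))
    rows = []
    for deg in range(r + 1):
        for subset in combinations(range(m), deg):
            mask = 0
            for i, p in enumerate(points):
                # The empty subset is the constant monomial 1.
                if all(p[j] for j in subset):
                    mask |= 1 << i
            rows.append(mask)
    return rows


def weight_spectrum(r, m):
    """Sorted list of Hamming weights occurring in RM(r, m), by exhaustive search."""
    rows = rm_generator_rows(r, m)
    weights = set()
    for coeffs in range(1 << len(rows)):
        word = 0
        for k, row in enumerate(rows):
            if (coeffs >> k) & 1:
                word ^= row  # addition over GF(2)
        weights.add(bin(word).count("1"))
    return sorted(weights)


if __name__ == "__main__":
    print(weight_spectrum(1, 3))
    print(weight_spectrum(2, 4))
```

The search visits every one of the $2^k$ codewords, where $k$ is the code dimension, so it is only feasible for toy cases and says nothing about the $m\ge12$ regime treated in the paper.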

arXiv:2404.14294 [pdf, other]

A Survey on Efficient Inference for Large Language Models

Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of the inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimization. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Last but not least, we provide some knowledge summary and discuss future research directions.

Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11978 [pdf, other]

EVIT: Event-Oriented Instruction Tuning for Event Reasoning

Authors: Zhengwei Tao, Xiancai Chen, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yiwei Lou

Abstract: Events refer to specific occurrences, incidents, or happenings that take place under a particular background. Event reasoning aims to infer events according to certain relations and predict future events. The cutting-edge techniques for event reasoning play a crucial role in various natural language processing applications. Large language models (LLMs) have made significant advancements in event reasoning owing to their wealth of knowledge and reasoning capabilities. However, smaller instruction-tuned models currently in use do not consistently demonstrate exceptional proficiency in managing these tasks. This discrepancy arises from the absence of explicit modeling of events and the interconnections of them within their instruction data. Consequently, these models face challenges in comprehending event structures and semantics while struggling to bridge the gap between their interpretations and human understanding of events. Additionally, their limitations in grasping event relations lead to constrained event reasoning abilities to effectively deduce and incorporate pertinent event knowledge. In this paper, we propose Event-Oriented Instruction Tuning (EvIT) to train our LLM. Specifically, we first propose a novel structure named event quadruple which contains the structure and semantics of events and is complete in the event representation. We then design event-relation learning based on the structures. We encapsulate the learning into the instruction-tuning formulation to better stimulate the event reasoning capacity of our model. We design a heuristic unsupervised method to mine event quadruple from a large-scale corpus. At last, we finetune a Llama model on our Event-Oriented Instruction Tuning. We conduct extensive experiments on event reasoning tasks on several datasets. Automatic and human evaluations demonstrate EvIT achieves competitive performances on event reasoning.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.05952 [pdf, other]

Robot Safe Planning In Dynamic Environments Based On Model Predictive Control Using Control Barrier Function

Authors: Zetao Lu, Kaijun Feng, Jun Xu, Haoyao Chen, Yunjiang Lou

Abstract: Implementing obstacle avoidance in dynamic environments is a challenging problem for robots. Model predictive control (MPC) is a popular strategy for dealing with this type of problem, and recent work mainly uses control barrier functions (CBFs) as hard constraints to ensure that the system state remains in the safe set. However, in crowded scenarios, effective solutions may not be obtained due to infeasibility problems, resulting in degraded controller performance. We propose a new MPC framework that integrates CBF to tackle the issue of obstacle avoidance in dynamic environments, in which the infeasibility problem induced by hard constraints operating over the whole prediction horizon is solved by softening the constraints and introducing an exact penalty, prompting the robot to actively seek out new paths. At the same time, generalized CBF is extended as a single-step safety constraint of the controller to enhance the safety of the robot during navigation. The efficacy of the proposed method is first shown through simulation experiments, in which a double-integrator system and a unicycle system are employed, and the proposed method outperforms other controllers in terms of safety, feasibility, and navigation efficiency. Furthermore, a real-world experiment on an MR1000 robot is conducted to demonstrate the effectiveness of the proposed method.

Submitted 8 April, 2024; originally announced April 2024.
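
As background for the control barrier function (CBF) idea mentioned in the abstract above, here is a minimal sketch of a CBF safety filter for a one-dimensional single integrator $\dot{x}=u$ with safe set $h(x)=x_{\max}-x\ge 0$. It is a deliberately simplified textbook case, not the MPC formulation of the paper; the names `cbf_filter` and `simulate` and all parameter values are assumptions made for illustration.

```python
def cbf_filter(u_nom, x, x_max, alpha):
    """Return the control closest to u_nom that satisfies the CBF condition.

    For x_dot = u and h(x) = x_max - x, the condition h_dot + alpha * h >= 0
    reduces to u <= alpha * (x_max - x), so the one-dimensional QP
    min (u - u_nom)^2 subject to that bound has the closed-form solution below.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0=0.0, x_max=1.0, u_nom=1.0, alpha=2.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of the filtered system (assumes alpha * dt <= 1)."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(u_nom, xs[-1], x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs


if __name__ == "__main__":
    trajectory = simulate()
    print(max(trajectory) <= 1.0)
```

With these illustrative values the nominal command pushes the state toward the boundary at constant speed, and the filter slows it down so that the state approaches `x_max` without crossing it. Imposing such a constraint as a hard requirement over a whole prediction horizon is what can make the optimization infeasible in crowded scenes, which is the issue the abstract addresses by softening the constraints.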

arXiv:2403.16362 [pdf, other]

AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Authors: Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao

Abstract: Fault Localization (FL) is an essential step during the debugging process. With the strong capabilities of code comprehension, the recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code. Nevertheless, due to LLMs' limited performance in handling long contexts, existing LLM-based fault localization remains on localizing bugs within a small code scope (i.e., a method or a class), which struggles to diagnose bugs for a large code scope (i.e., an entire software system). To address the limitation, this paper presents AgentFL, a multi-agent system based on ChatGPT for automated fault localization. By simulating the behavior of a human developer, AgentFL models the FL task as a three-step process, which involves comprehension, navigation, and confirmation. Within each step, AgentFL hires agents with diversified expertise, each of which utilizes different tools to handle specific tasks. Particularly, we adopt a series of auxiliary strategies such as Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue to overcome the challenges in each step. The evaluation on the widely used Defects4J-V1.2.0 benchmark shows that AgentFL can localize 157 out of 395 bugs within Top-1, which outperforms the other LLM-based approaches and exhibits complementarity to the state-of-the-art learning-based techniques. Additionally, we confirm the indispensability of the components in AgentFL with the ablation study and demonstrate the usability of AgentFL through a user study. Finally, the cost analysis shows that AgentFL spends an average of only 0.074 dollars and 97 seconds for a single bug.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2402.03610 [pdf, other]

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

Authors: Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You

Abstract: Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.17786 [pdf, other]

A Graph-Native Query Optimization Framework

Authors: Bingqing Lyu, Xiaoli Zhou, Longbin Lai, Yufan Yang, Yunkai Lou, Wenyuan Yu, Jingren Zhou

Abstract: Graph queries that combine pattern matching with relational operations, referred to as PatRelQuery, are widely used in many real-world applications. They allow users to identify arbitrary patterns in a graph and further perform in-depth relational analysis on the results. To effectively support PatRelQuery, two key challenges need to be addressed: (1) how to optimize PatRelQuery in a unified framework, and (2) how to handle the arbitrary type constraints in patterns in PatRelQuery. In this paper, we present a graph-native query optimization framework named GOpt, to tackle these issues. GOpt is built on top of a unified intermediate representation (IR) that is capable of capturing both graph and relational operations, thereby streamlining the optimization of PatRelQuery. To handle the arbitrary type constraints, GOpt employs an automatic type inference approach to identify implicit type constraints. Additionally, GOpt introduces a graph-native optimizer, which encompasses an extensive collection of optimization rules along with cost-based techniques tailored for arbitrary patterns, to optimize PatRelQuery. Through comprehensive experiments, we demonstrate that GOpt can achieve significant query performance improvements, in both crafted benchmarks and real-world applications.

Submitted 5 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.01738 [pdf, other]

doi 10.1109/TWC.2024.3351856

Integrated Sensing and Communication with Massive MIMO: A Unified Tensor Approach for Channel and Target Parameter Estimation

Authors: Ruoyu Zhang, Lei Cheng, Shuai Wang, Yi Lou, Yulong Gao, Wen Wu, Derrick Wing Kwan Ng

Abstract: Benefitting from the vast spatial degrees of freedom, the amalgamation of integrated sensing and communication (ISAC) and massive multiple-input multiple-output (MIMO) is expected to simultaneously improve spectral and energy efficiencies as well as the sensing capability. However, a large number of antennas deployed in massive MIMO-ISAC raises critical challenges in acquiring both accurate channel state information and target parameter information. To overcome these two challenges with a unified framework, we first analyze their underlying system models and then propose a novel tensor-based approach that addresses both the channel estimation and target sensing problems. Specifically, by parameterizing the high-dimensional communication channel exploiting a small number of physical parameters, we associate the channel state information with the sensing parameters of targets in terms of angular, delay, and Doppler dimensions. Then, we propose a shared training pattern adopting the same time-frequency resources such that both the channel estimation and target parameter estimation can be formulated as a canonical polyadic decomposition problem with a similar mathematical expression. On this basis, we first investigate the uniqueness condition of the tensor factorization and the maximum number of resolvable targets by utilizing the specific Vandermonde

Submitted 3 January, 2024; originally announced January 2024.

Journal ref: IEEE Transactions on Wireless Communications, 2024

arXiv:2312.10448 [pdf, other]

Resolving Crash Bugs via Large Language Models: An Empirical Study

Authors: Xueying Du, Mingwei Liu, Juntao Li, Hanlin Wang, Xin Peng, Yiling Lou

Abstract: Crash bugs cause unexpected program behaviors or even termination, requiring high-priority resolution. However, manually resolving crash bugs is challenging and labor-intensive, and researchers have proposed various techniques for their automated localization and repair. ChatGPT, a recent large language model (LLM), has garnered significant attention due to its exceptional performance across various domains. This work performs the first investigation into ChatGPT's capability in resolving real-world crash bugs, focusing on its effectiveness in both localizing and repairing code-related and environment-related crash bugs. Specifically, we initially assess ChatGPT's fundamental ability to resolve crash bugs with basic prompts in a single iteration. We observe that ChatGPT performs better at resolving code-related crash bugs compared to environment-related ones, and its primary challenge in resolution lies in inaccurate localization. Additionally, we explore ChatGPT's potential with various advanced prompts. Furthermore, by stimulating ChatGPT's self-planning, it methodically investigates each potential crash-causing environmental factor through proactive inquiry, ultimately identifying the root cause of the crash. Based on our findings, we propose IntDiagSolver, an interaction methodology designed to facilitate precise crash bug resolution through continuous interaction with LLMs. Evaluating IntDiagSolver on multiple LLMs, including ChatGPT, Claude, and CodeLlama, reveals consistent enhancement in the accuracy of crash bug resolution.

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.08367 [pdf, other]

ViLA: Efficient Video-Language Alignment for Video Question Answering

Authors: Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming Lin, Shan Yang

Abstract: In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a new cross-modal distillation (QFormer-Distiller) module. Pre-trained large image-language models have shown promising results on problems such as visual question answering (VQA). However, how to efficiently and effectively sample video frames when adapting a pre-trained large image-language model to video-language alignment is still the major challenge. Compared with prior work, our ViLA model demonstrates the capability of selecting key frames with critical contents, thus improving the video-language alignment accuracy while reducing the inference latency (+3.3% on NExT-QA Temporal with 3.0X speed up). Overall, our ViLA network outperforms the state-of-the-art methods on the video question-answering benchmarks: +4.6% on STAR Interaction, +2.2% on STAR average with 3.0X speed up, and our 2-frame model outperforms the 4-frame SeViLA on the VLEP dataset with 4.2X speed-up.

Submitted 29 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.05795 [pdf, other]

Improvements on Uncertainty Quantification for Node Classification via Distance-Based Regularization

Authors: Russell Alan Hart, Linlin Yu, Yifei Lou, Feng Chen

Abstract: Deep neural networks have achieved significant success in the last decades, but they are not well-calibrated and often produce unreliable predictions. A large body of literature relies on uncertainty quantification to evaluate the reliability of a learning model, which is particularly important for applications of out-of-distribution (OOD) detection and misclassification detection. We are interested in uncertainty quantification for interdependent node-level classification. We start our analysis based on graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function. We describe the theoretical limitations of the widely-used UCE loss. To alleviate the identified drawbacks, we propose a distance-based regularization that encourages clustered OOD nodes to remain clustered in the latent space. We conduct extensive comparison experiments on eight standard datasets and demonstrate that the proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection.

Submitted 9 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2311.04726 [pdf, other]

Social Motion Prediction with Cognitive Hierarchies

Authors: Wentao Zhu, Jason Qin, Yuke Lou, Hang Ye, Xiaoxuan Ma, Hai Ci, Yizhou Wang

Abstract: Humans exhibit a remarkable capacity for anticipating the actions of others and planning their own actions accordingly. In this study, we strive to replicate this ability by addressing the social motion prediction problem. We introduce a new benchmark, a novel formulation, and a cognition-inspired framework. We present Wusi, a 3D multi-person motion dataset under the context of team sports, which features intense and strategic human interactions and diverse pose distributions. By reformulating the problem from a multi-agent reinforcement learning perspective, we incorporate behavioral cloning and generative adversarial imitation learning to boost learning efficiency and generalization. Furthermore, we take into account the cognitive aspects of the human social action planning process and develop a cognitive hierarchy framework to predict strategic human social interactions. We conduct comprehensive experiments to validate the effectiveness of our proposed dataset and approach. Code and data are available at https://walter0807.github.io/Social-CH/.

Submitted 8 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2311.04448 [pdf, other]

Inferring Resource-Oriented Intentions using LLMs for Static Resource Leak Detection

Authors: Chong Wang, Jianan Liu, Xin Peng, Yang Liu, Yiling Lou

Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources, suffering from both (1) false negatives caused by the incompleteness of predefined resource acquisition/release APIs and (2) false positives caused by the incompleteness of resource reachability validation identification. To overcome these challenges, we propose InferROI, a novel approach that leverages the exceptional code comprehension capability of large language models (LLMs) to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code. InferROI first prompts the LLM to infer involved intentions for a given code snippet, and then incorporates a two-stage static analysis approach to check control-flow paths for resource leak detection based on the inferred intentions. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection. Experimental results on the DroidLeaks and JLeaks datasets demonstrate that InferROI achieves a promising bug detection rate (59.3% and 64.8%) and false alarm rate (18.6% and 24.0%). Compared to three industrial static detectors, InferROI detects 14~45 and 167~503 more bugs in DroidLeaks and JLeaks, respectively. When applied to real-world open-source projects, InferROI identifies 26 unknown resource leak bugs, with 7 of them being confirmed by developers. Finally, manual annotation indicated that InferROI achieved a precision of 74.6% and a recall of 81.8% in intention inference, covering more than 60% of the resource types involved in the datasets. The results of an ablation study underscore the importance of combining LLM-based inference with static analysis.

Submitted 2 July, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.00964 [pdf, other]

doi 10.1145/3637528.3671521

On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

Authors: Chengyao Wen, Yin Lou

Abstract: Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, a two-stage framework of fraud prevention decision rule set mining is usually employed in large Fintech institutions; Stage 1 generates a potentially large pool of rules and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stage 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitutes a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. On two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.

Submitted 27 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.04551 [pdf, other]

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

Authors: Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Abstract: Pre-training has been an important ingredient in developing strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By examining the layer-wise representations, we demonstrate significant changes in these later layers during fine-tuning, indicating the ineffectiveness of their pre-trained features for depth estimation. To address these limitations, we propose MeSa, a comprehensive framework that leverages the complementary strengths of masked, geometric, and supervised pre-training. Hence, MeSa benefits from not only general-purpose representations learnt via masked pre-training but also specialized depth-specific features acquired via geometric and supervised pre-training. Our CKA layer-wise analysis confirms that our pre-training strategy indeed produces improved representations for the later layers, overcoming the drawbacks of the SOTA SSL method. Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings. We also investigate the influence of the pre-training dataset and demonstrate the efficacy of pre-training on LSUN, which yields significantly better pre-trained representations. Overall, our approach surpasses the masked pre-training SSL method by a substantial margin of 17.1% on the RMSE. Moreover, even without utilizing any recently proposed techniques, MeSa also outperforms the most recent methods and establishes a new state-of-the-art for monocular depth estimation on the challenging NYUv2 dataset.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.01372 [pdf, other]

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

Authors: Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

Abstract: We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity. Consequently, striking a balance between motion quality and diversity remains an unresolved challenge. This problem is compounded by two key factors: 1) the lack of diversity in motion-caption pairs in existing benchmarks and 2) the unilateral and biased semantic understanding of the text prompt, focusing primarily on the verb component while neglecting the nuanced distinctions indicated by other words. In response to the first issue, we construct a large-scale Wild Motion-Caption dataset (WMC) to extend the restricted action boundary of existing well-annotated datasets, enabling the learning of diverse motions through a more extensive range of actions. To this end, a motion BLIP is trained upon a pretrained vision-language model, then we automatically generate diverse motion captions for the collected motion sequences. As a result, we finally build a dataset comprising 8,888 motions coupled with 141k text. To comprehensively understand the text command, we propose a Hierarchical Semantic Aggregation (HSA) module to capture the fine-grained semantics. Finally, we integrate the above two designs into an effective Motion Discrete Diffusion (MDD) framework to strike a balance between motion quality and diversity. Extensive experiments on HumanML3D and KIT-ML show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity. Dataset, code, and pretrained models will be released to reproduce all of our results.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 12 pages, 7 figures

arXiv:2308.13561 [pdf, other]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira , et al. (49 additional authors not shown)

Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.

Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.11501 [pdf]

doi 10.1002/rob.22256

Four years of multi-modal odometry and mapping on the rail vehicles

Authors: Yusheng Wang, Weiwei Song, Yi Zhang, Fei Huang, Zhiyong Tu, Ruoying Li, Shimin Zhang, Yidong Lou

Abstract: Precise, seamless, and efficient train localization as well as long-term railway environment monitoring is the essential property towards reliability, availability, maintainability, and safety (RAMS) engineering for railroad systems. Simultaneous localization and mapping (SLAM) is right at the core of solving the two problems concurrently. To this end, we propose a high-performance and versatile multi-modal framework in this paper, targeted for the odometry and mapping task for various rail vehicles. Our system is built atop an inertial-centric state estimator that tightly couples light detection and ranging (LiDAR), visual, optionally satellite navigation and map-based localization information with the convenience and extendibility of loosely coupled methods. The inertial sensors IMU and wheel encoder are treated as the primary sensor, which achieves the observations from subsystems to constrain the accelerometer and gyroscope biases. Compared to point-only LiDAR-inertial methods, our approach leverages more geometry information by introducing both track plane and electric power pillars into state estimation. The Visual-inertial subsystem also utilizes the environmental structure information by employing both lines and points. Besides, the method is capable of handling sensor failures by automatic reconfiguration bypassing failure modules. Our proposed method has been extensively tested in the long-during railway environments over four years, including general-speed, high-speed and metro; both passenger and freight traffic are investigated. Further, we aim to share, in an open way, the experience, problems, and successes of our group with the robotics community so that those that work in such environments can avoid these errors. In this view, we open source some of the datasets to benefit the research community.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11492 [pdf]

A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles

Authors: Yusheng Wang, Yidong Lou, Weiwei Song, Bing Zhan, Feihuang Xia, Qigeng Duan

Abstract: Multi-modal sensor integration has become a crucial prerequisite for the real-world navigation systems. Recent studies have reported successful deployment of such systems in many fields. However, it is still challenging for navigation tasks in mine scenes due to satellite signal dropouts, degraded perception, and observation degeneracy. To solve this problem, we propose a LiDAR-inertial odometry method in this paper, utilizing both Kalman filter and graph optimization. The front-end consists of multiple parallel running LiDAR-inertial odometries, where the laser points, IMU, and wheel odometer information are tightly fused in an error-state Kalman filter. Instead of the commonly used feature points, we employ surface elements for registration. The back-end constructs a pose graph and jointly optimizes the pose estimation results from inertial, LiDAR odometry, and global navigation satellite system (GNSS). Since the vehicle has a long operation time inside the tunnel, the largely accumulated drift may be not fully by the GNSS measurements. We hereby leverage a loop closure based re-initialization process to achieve full alignment. In addition, the system robustness is improved through handling data loss, stream consistency, and estimation error. The experimental results show that our system has a good tolerance to the long-period degeneracy with the cooperation of different LiDARs and surfel registration, achieving meter-level accuracy even for tens of minutes running during GNSS dropouts.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11422 [pdf, other]

Recommending Analogical APIs via Knowledge Graph Embedding

Authors: Mingwei Liu, Yanjun Yang, Yiling Lou, Xin Peng, Zhong Zhou, Xueying Du, Tianyong Yang

Abstract: Library migration, which re-implements the same software behavior by using a different library instead of using the current one, has been widely observed in software evolution. One essential part of library migration is to find an analogical API that could provide the same functionality as current ones. However, given the large number of libraries/APIs, manually finding an analogical API could be very time-consuming and error-prone. Researchers have developed multiple automated analogical API recommendation techniques. Documentation-based methods have particularly attracted significant interest. Despite their potential, these methods have limitations, such as a lack of comprehensive semantic understanding in documentation and scalability challenges. In this work, we propose KGE4AR, a novel documentation-based approach that leverages knowledge graph (KG) embedding to recommend analogical APIs during library migration. Specifically, KGE4AR proposes a novel unified API KG to comprehensively and structurally represent three types of knowledge in documentation, which can better capture the high-level semantics. Moreover, KGE4AR then proposes to embed the unified API KG into vectors, enabling more effective and scalable similarity calculation. We build KGE4AR's unified API KG for 35,773 Java libraries and assess it in two API recommendation scenarios: with and without target libraries. Our results show that KGE4AR substantially outperforms state-of-the-art documentation-based techniques in both evaluation scenarios in terms of all metrics (e.g., 47.1%-143.0% and 11.7%-80.6% MRR improvements in each scenario). Additionally, we explore KGE4AR's scalability, confirming its effective scaling with the growing number of libraries.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by FSE 2023

arXiv:2308.09119 [pdf, other]

ICAR: Image-based Complementary Auto Reasoning

Authors: Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang

Abstract: Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Due to the subjectivity, it is difficult to set up a rigorous standard for both data collection and learning objectives. To address this challenging task, we propose a visual compatibility concept, composed of similarity (resembling in color, geometry, texture, etc.) and complementarity (different items like table vs chair completing a group). Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with the cross-domain visual similarity input and auto-regressive complementary item generation. We introduce a "Flexible Bidirectional Transformer (FBT)" consisting of an encoder with flexible masking, a category prediction arm, and an auto-regressive visual embedding prediction arm. The inputs for FBT are cross-domain visual similarity invariant embeddings, making this framework quite generalizable. Furthermore, our proposed FBT model learns the inter-object compatibility from a large set of scene images in a self-supervised way. Compared with the SOTA methods, this approach achieves up to 5.3% and 9.6% in FITB score and 22.3% and 31.8% SFID improvement on fashion and furniture, respectively.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.01861 [pdf, other]

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

Authors: Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, Yiling Lou

Abstract: In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code generation benchmark ClassEval of 100 class-level Python code generation tasks with approximately 500 person-hours. Based on it, we then perform the first study of 11 state-of-the-art LLMs on class-level code generation. Based on our results, we have the following main findings. First, we find that all existing LLMs show much worse performance on class-level code generation compared to standalone method-level code generation benchmarks like HumanEval; and the method-level coding ability cannot equivalently reflect the class-level coding ability among LLMs. Second, we find that GPT-4 and GPT-3.5 still exhibit dominant superiority over other LLMs on class-level code generation, and the second-tier models include Instruct-Starcoder, Instruct-Codegen, and Wizardcoder with very similar performance. Third, we find that generating the entire class all at once (i.e. holistic generation strategy) is the best generation strategy only for GPT-4 and GPT-3.5, while method-by-method generation (i.e. incremental and compositional) is the better strategy for the other models with limited ability of understanding long instructions and utilizing the middle information. Lastly, we find the limited model ability of generating method-dependent code and discuss the frequent error types in generated classes. Our benchmark is available at https://github.com/FudanSELab/ClassEval.

Submitted 14 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

arXiv:2308.01240 [pdf, other]

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Authors: Zhiqiang Yuan, Junwei Liu, Qiancheng Zi, Mingwei Liu, Xin Peng, Yiling Lou

Abstract: In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples would sometimes induce unstable or even worse performance. Furthermore, we find that the widely-used BM25-based shot selection strategy significantly outperforms the basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning could further improve the model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future directions.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.16121 [pdf, other]

doi 10.3233/FAIA230441

Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

Authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang

Abstract: Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into the multi-modal fusion still lacks effective solutions due primarily to the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses an individual expert network to process each sensor's detection result together with encoded uncertainty. Then, the expert networks' outputs are analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.

Submitted 30 July, 2023; originally announced July 2023.

Comments: In proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023. 8 pages + 2 appendix pages

arXiv:2307.08838 [pdf, other]

doi 10.1109/IROS55552.2023.10341608

Dynamic Object Tracking for Quadruped Manipulator with Spherical Image-Based Approach

Authors: Tianlin Zhang, Sikai Guo, Xiaogang Xiong, Wanlei Li, Zezheng Qi, Yunjiang Lou

Abstract: Exactly estimating and tracking the motion of surrounding dynamic objects is one of the important tasks for the autonomy of a quadruped manipulator. However, with only an onboard RGB camera, it is still challenging for a quadruped manipulator to track the motion of a dynamic object moving with unknown and changing velocities. To address this problem, this manuscript proposes a novel image-based visual servoing (IBVS) approach consisting of three elements: a spherical projection model, a robust super-twisting observer, and a model predictive controller (MPC). The spherical projection model decouples the visual error of the dynamic target into linear and angular ones. Then, with the presence of the visual error, the robustness of the observer is exploited to estimate the unknown and changing velocities of the dynamic target without depth estimation. Finally, the estimated velocity is fed into the model predictive controller (MPC) to generate joint torques for the quadruped manipulator to track the motion of the dynamic target. The proposed approach is validated through hardware experiments and the experimental results illustrate the approach's effectiveness in improving the autonomy of the quadruped manipulator.

Submitted 14 July, 2023; originally announced July 2023.

Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 727-734

arXiv:2307.00439 [pdf, other]

Weighted Anisotropic-Isotropic Total Variation for Poisson Denoising

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: Poisson noise commonly occurs in images captured by photon-limited imaging systems such as in astronomy and medicine. As the distribution of Poisson noise depends on the pixel intensity value, noise levels vary from pixel to pixel. Hence, denoising a Poisson-corrupted image while preserving important details can be challenging. In this paper, we propose a Poisson denoising model by incorporating the weighted anisotropic-isotropic total variation (AITV) as a regularization. We then develop an alternating direction method of multipliers with a combination of a proximal operator for an efficient implementation. Lastly, numerical experiments demonstrate that our algorithm outperforms other Poisson denoising methods in terms of image quality and computational efficiency.

Submitted 1 July, 2023; originally announced July 2023.

Comments: accepted to ICIP 2023

arXiv:2306.03174 [pdf, other]

doi 10.1145/3528223.3530162

Computational Design of Passive Grippers

Authors: Milin Kodnongbua, Ian Good, Yu Lou, Jeffrey Lipton, Adriana Schulz

Abstract: This work proposes a novel generative design tool for passive grippers -- robot end effectors that have no additional actuation and instead leverage the existing degrees of freedom in a robotic arm to perform grasping tasks. Passive grippers are used because they offer interesting trade-offs between cost and capabilities. However, existing designs are limited in the types of shapes that can be grasped. This work proposes to use rapid-manufacturing and design optimization to expand the space of shapes that can be passively grasped. Our novel generative design algorithm takes in an object and its positioning with respect to a robotic arm and generates a 3D printable passive gripper that can stably pick the object up. To achieve this, we address the key challenge of jointly optimizing the shape and the insert trajectory to ensure a passively stable grasp. We evaluate our method on a testing suite of 22 objects (23 experiments), all of which were evaluated with physical experiments to bridge the virtual-to-real gap. Code and data are at https://homes.cs.washington.edu/~milink/passive-gripper/

Submitted 5 June, 2023; originally announced June 2023.

Journal ref: ACM Transactions on Graphics, Volume 41, Issue 4, July 2022, Article No.: 149, pp 2-12

arXiv:2305.07872 [pdf, other]

doi 10.1109/TCSI.2023.3296602

SPP-CNN: An Efficient Framework for Network Robustness Prediction

Authors: Chengpei Wu, Yang Lou, Lin Wang, Junli Li, Xiang Li, Guanrong Chen

Abstract: This paper addresses the robustness of a network to sustain its connectivity and controllability against malicious attacks. This kind of network robustness is typically measured by time-consuming attack simulation, which returns a sequence of values that record the remaining connectivity and controllability after a sequence of node- or edge-removal attacks. For improvement, this paper develops an efficient framework for network robustness prediction, the spatial pyramid pooling convolutional neural network (SPP-CNN). The new framework installs a spatial pyramid pooling layer between the convolutional and fully-connected layers, overcoming the common mismatch issue in CNN-based prediction approaches and extending its generalizability. Extensive experiments are carried out by comparing SPP-CNN with three state-of-the-art robustness predictors, namely one CNN-based and two graph neural network-based frameworks. Synthetic and real-world networks, both directed and undirected, are investigated. Experimental results demonstrate that the proposed SPP-CNN achieves better prediction performance and better generalizability to unknown datasets, with significantly lower time consumption, than its counterparts.

Submitted 13 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures, 14 pages Supplementary Information

Journal ref: IEEE Transactions on Circuits and Systems I: Regular Papers. 2023, 70 (10), 4067-4079

arXiv:2305.04207 [pdf, other]

No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation

Authors: Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng

Abstract: Unit testing is essential in detecting bugs in functionally-discrete program units. Manually writing high-quality unit tests is time-consuming and laborious. Although traditional techniques can generate tests with reasonable coverage, they exhibit low readability and cannot be directly adopted by developers. Recent work has shown the large potential of large language models (LLMs) in unit test generation, which can generate more human-like and meaningful test code. ChatGPT, the latest LLM incorporating instruction tuning and reinforcement learning, has performed well in various domains. However, it remains unclear how effective ChatGPT is in unit test generation. In this work, we perform the first empirical study to evaluate ChatGPT's capability of unit test generation. Specifically, we conduct a quantitative analysis and a user study to systematically investigate the quality of its generated tests regarding correctness, sufficiency, readability, and usability. The tests generated by ChatGPT still suffer from correctness issues, including diverse compilation errors and execution failures. Still, the passing tests generated by ChatGPT resemble manually-written tests by achieving comparable coverage, readability, and even sometimes developers' preference. Our findings indicate that generating unit tests with ChatGPT could be very promising if the correctness of its generated tests could be further improved.
Inspired by our findings above, we propose ChatTESTER, a novel ChatGPT-based unit test generation approach, which leverages ChatGPT itself to improve the quality of its generated tests. ChatTESTER incorporates an initial test generator and an iterative test refiner. Our evaluation demonstrates the effectiveness of ChatTESTER by generating 34.3% more compilable tests and 18.7% more tests with correct assertions than the default ChatGPT.

Submitted 19 May, 2024; v1 submitted 7 May, 2023; originally announced May 2023.

arXiv:2305.00366 [pdf, other]

S2abEL: A Dataset for Entity Linking from Scientific Tables

Authors: Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey

Abstract: Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the PaperswithCode taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.

Submitted 29 April, 2023; originally announced May 2023.

arXiv:2304.06548 [pdf, other]

doi 10.1109/TIM.2023.3334336

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

Authors: Shilei Li, Li**g Li, Dawei Shi, Yunjiang Lou, Ling Shi

Abstract: This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.

Submitted 11 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 16 pages

arXiv:2304.03285 [pdf, other]

$\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus

Authors: Hadi Alzayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

Abstract: Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures -- specifically, an ultra-wide camera with wider field of view and deeper DoF and a higher resolution primary camera with shallower DoF. In this work, we propose $\text{DC}^2$, a system for defocus control for synthetically varying camera aperture, focus distance and arbitrary defocus effects by fusing information from such a dual-camera system. Our key insight is to leverage a real-world smartphone camera dataset by using image refocus as a proxy task for learning to control defocus. Quantitative and qualitative evaluations on real-world data demonstrate our system's efficacy, where we outperform the state of the art on defocus deblurring, bokeh rendering, and image refocus. Finally, we demonstrate creative post-capture defocus control enabled by our method, including tilt-shift and content-based defocus effects.

Submitted 6 April, 2023; originally announced April 2023.

Comments: CVPR 2023. See the project page at https://defocus-control.github.io

arXiv:2303.12721 [pdf, other]

doi 10.1109/ICASSP49357.2023.10094847

Non-convex approaches for low-rank tensor completion under tubal sampling

Authors: Zheng Tan, Longxiu Huang, HanQin Cai, Yifei Lou

Abstract: Tensor completion is an important problem in modern data analysis. In this work, we investigate a specific sampling strategy, referred to as tubal sampling. We propose two novel non-convex tensor completion frameworks that are easy to implement, named tensor $L_1$-$L_2$ (TL12) and tensor completion via CUR (TCCUR). We test the efficiency of both methods on synthetic data and a color image inpainting problem. Empirical results reveal a trade-off between the accuracy and time efficiency of these two methods in a low sampling ratio. Each of them outperforms some classical completion methods in at least one aspect.

Submitted 17 March, 2023; originally announced March 2023.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023

arXiv:2303.03101 [pdf, other]

CRIN: Rotation-Invariant Point Cloud Analysis and Rotation Estimation via Centrifugal Reference Frame

Authors: Yu**g Lou, Zelin Ye, Yang You, Nianjuan Jiang, Jiangbo Lu, Weiming Wang, Lizhuang Ma, Cewu Lu

Abstract: Various recent methods attempt to implement rotation-invariant 3D deep learning by replacing the input coordinates of points with relative distances and angles. Due to the incompleteness of these low-level features, they incur the cost of losing global information. In this paper, we propose CRIN, namely the Centrifugal Rotation-Invariant Network. CRIN directly takes the coordinates of points as input and transforms local points into rotation-invariant representations via centrifugal reference frames. Aided by centrifugal reference frames, each point corresponds to a discrete rotation so that the information of rotations can be implicitly stored in point features. Unfortunately, discrete points are far from describing the whole rotation space. We further introduce a continuous distribution for 3D rotations based on points. Furthermore, we propose an attention-based down-sampling strategy to sample points invariant to rotations. Finally, a relation module is adopted to reinforce the long-range dependencies between sampled points and to predict the anchor point for unsupervised rotation estimation. Extensive experiments show that our method achieves rotation invariance, accurately estimates the object rotation, and obtains state-of-the-art results on rotation-augmented classification and part segmentation. Ablation studies validate the effectiveness of the network design.

Submitted 6 March, 2023; originally announced March 2023.

Comments: AAAI 2023

arXiv:2302.03745 [pdf, other]

doi 10.1109/MCAS.2023.3236659

Structural Robustness of Complex Networks: A Survey of A Posteriori Measures

Authors: Yang Lou, Lin Wang, Guanrong Chen

Abstract: Network robustness is critical for various industrial and social networks against malicious attacks. It has various meanings in different research contexts; here it refers to the ability of a network to sustain its functionality when a fraction of the network fails to work due to attacks. The rapid development of complex networks research indicates special interest and great concern about network robustness, which is essential for further analyzing and optimizing network structures towards engineering applications. This comprehensive survey distills the important findings and developments of network robustness research, focusing on the a posteriori structural robustness measures for single-layer static networks. Specifically, the a posteriori robustness measures are reviewed from four perspectives: 1) network functionality, including connectivity, controllability and communication ability, as well as their extensions; 2) malicious attacks, including conventional and computation-based attack strategies; 3) robustness estimation methods using either analytical approximation or machine learning-based prediction; 4) network robustness optimization. Based on the existing measures, a practical threshold of network destruction is introduced, with the suggestion that network robustness should be measured only before reaching the threshold of destruction. Then, a posteriori and a priori measures are compared experimentally, revealing the advantages of the a posteriori measures.
Finally, prospective research directions with respect to a posteriori robustness measures are recommended.

Submitted 3 February, 2023; originally announced February 2023.

Journal ref: IEEE Circuits and Systems Magazine, Volume 23, Issue 1, 2023

arXiv:2302.01857 [pdf, other]

KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair

Authors: Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, Xiangyu Zhang

Abstract: Automated Program Repair (APR) improves software reliability by generating patches for a buggy program automatically. Recent APR techniques leverage deep learning (DL) to build models to learn to generate patches from existing patches and code corpora. While promising, DL-based APR techniques suffer from the abundant syntactically or semantically incorrect patches in the patch space. These patches often disobey the syntactic and semantic domain knowledge of source code and thus cannot be the correct patches to fix a bug. We propose a DL-based APR approach KNOD, which incorporates domain knowledge to guide patch generation in a direct and comprehensive way. KNOD has two major novelties, including (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to the inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases. We evaluate KNOD on three widely-used benchmarks. KNOD fixes 72 bugs on the Defects4J v1.2, 25 bugs on the QuixBugs, and 50 bugs on the additional Defects4J v2.0 benchmarks, outperforming all existing APR tools.

Submitted 16 April, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: This paper is accepted by 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)

arXiv:2301.03393 [pdf, other]

Difference of Anisotropic and Isotropic TV for Segmentation under Blur and Poisson Noise

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: In this paper, we aim to segment an image degraded by blur and Poisson noise. We adopt a smoothing-and-thresholding (SaT) segmentation framework that finds a piecewise-smooth solution, followed by $k$-means clustering to segment the image. Specifically for the image smoothing step, we replace the least-squares fidelity for Gaussian noise in the Mumford-Shah model with a maximum a posteriori (MAP) term to deal with Poisson noise and we incorporate the weighted difference of anisotropic and isotropic total variation (AITV) as a regularization to promote the sparsity of image gradients. For such a nonconvex model, we develop a specific splitting scheme and utilize a proximal operator to apply the alternating direction method of multipliers (ADMM). Convergence analysis is provided to validate the efficacy of the ADMM scheme. Numerical experiments on various segmentation scenarios (grayscale/color and multiphase) showcase that our proposed method outperforms a number of segmentation methods, including the original SaT.

Submitted 16 June, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: Accepted to Frontiers in Computer Science: https://www.frontiersin.org/articles/10.3389/fcomp.2023.1131317/abstract; Arxiv version has clearer images best for zooming in

arXiv:2212.04145 [pdf, other]

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation

Authors: Yulu Gan, Yan Bai, Yihang Lou, Xianzheng Ma, Renrui Zhang, Nian Shi, Lin Luo

Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt the source model to continually changing unlabeled target domains without access to the source data. Existing methods mainly focus on model-based adaptation in a self-training manner, such as predicting pseudo labels for new domain datasets. Since pseudo labels are noisy and unreliable, these methods suffer from catastrophic forgetting and error accumulation when dealing with dynamic data distributions. Motivated by prompt learning in NLP, in this paper, we propose to learn an image-level visual domain prompt for target domains while keeping the source model parameters frozen. During testing, the changing target datasets can be adapted to the source model by reformulating the input data with the learned visual prompts. Specifically, we devise two types of prompts, i.e., domain-specific prompts and domain-agnostic prompts, to extract current domain knowledge and maintain the domain-shared knowledge in the continual adaptation. Furthermore, we design a homeostasis-based prompt adaptation strategy to suppress domain-sensitive parameters in domain-invariant prompts to learn domain-shared knowledge more effectively. This transition from the model-dependent paradigm to the model-free one enables us to bypass the catastrophic forgetting and error accumulation problems. Experiments show that our proposed method achieves significant performance gains over state-of-the-art methods on four widely-used benchmarks, including CIFAR-10C, CIFAR-100C, ImageNet-C, and VLCS datasets.

Submitted 11 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: AAAI2023 Outstanding Student Paper Award

arXiv:2211.15386 [pdf, other]

PC-SNN: Supervised Learning with Local Hebbian Synaptic Plasticity based on Predictive Coding in Spiking Neural Networks

Authors: Mengting Lan, Xiaogang Xiong, Zixuan Jiang, Yunjiang Lou

Abstract: Deemed the third generation of neural networks, event-driven Spiking Neural Networks (SNNs) combined with bio-plausible local learning rules make it promising to build low-power, neuromorphic hardware for SNNs. However, because of the non-linearity and discrete property of spiking neural networks, the training of SNNs remains difficult and is still under discussion. Originating from gradient descent, backprop has achieved stunning success in multi-layer SNNs. Nevertheless, it is assumed to lack biological plausibility, while consuming relatively high computational resources. In this paper, we propose a novel learning algorithm inspired by predictive coding theory and show that it can perform supervised learning fully autonomously and as successfully as backprop, utilizing only local Hebbian plasticity. Furthermore, this method achieves a favorable performance compared to the state-of-the-art multi-layer SNNs: test accuracy of 99.25% for the Caltech Face/Motorbike dataset, 84.25% for the ETH-80 dataset, 98.1% for the MNIST dataset and 98.5% for the neuromorphic dataset N-MNIST. Furthermore, our work provides a new perspective on how supervised learning algorithms are directly implemented in spiking neural circuitry, which may give some new insights into neuromorphological calculation in neuroscience.

Submitted 24 November, 2022; originally announced November 2022.

Comments: 15 pages, 11 figs

ACM Class: I.2.3; I.2.10

arXiv:2211.06557 [pdf, other]

doi 10.1109/TMECH.2023.3272910

Active View Planning for Visual SLAM in Outdoor Environments Based on Continuous Information Modeling

Authors: Zhihao Wang, Haoyao Chen, Shiwu Zhang, Yunjiang Lou

Abstract: Visual simultaneous localization and mapping (vSLAM) is widely used in GPS-denied and open field environments for ground and surface robots. However, due to the frequent perception failures caused by a lack of visual texture or the swing of the robot's view direction on rough terrains, the accuracy and robustness of vSLAM still need to be enhanced. The study develops a novel view planning approach of actively perceiving areas with maximal information to address the mentioned problem; a gimbal camera is used as the main sensor. Firstly, a map representation based on feature distribution-weighted Fisher information is proposed to completely and effectively represent environmental information richness. With the map representation, a continuous environmental information model is further established to convert the discrete information space into a continuous one for numerical optimization in real time. Subsequently, receding horizon optimization is utilized to obtain the optimal informative viewpoints while simultaneously considering the robotic perception, exploration and motion cost based on the continuous environmental model. Finally, several simulations and outdoor experiments are performed to verify the improvement of localization robustness and accuracy by the proposed approach.

Submitted 22 May, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 11 pages, 14 figures, in IEEE/ASME Transactions on Mechatronics

arXiv:2210.01448 [pdf, other]

doi 10.1145/3550454.3555435

Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Authors: Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

Abstract: Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a robust rhythm-based segmentation pipeline to ensure the temporal coherence between the vocalization and gestures explicitly. For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory. The high-level embedding corresponds to semantics, while the low-level embedding relates to subtle variations. Lastly, we build correspondence between the hierarchical embeddings of the speech and the motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.

Submitted 4 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: SIGGRAPH Asia 2022 (Journal Track); Project Page: https://pku-mocca.github.io/Rhythmic-Gesticulator-Page/

arXiv:2209.02834 [pdf, other]

Unsupervised Scene Sketch to Photo Synthesis

Authors: Jiayun Wang, Sangryul Jeon, Stella X. Yu, Xi Zhang, Himanshu Arora, Yu Lou

Abstract: Sketches make an intuitive and powerful visual expression as they are quickly executed freehand drawings. We present a method for synthesizing realistic photos from scene sketches. Without the need for sketch and photo pairs, our framework directly learns from readily available large-scale photo datasets in an unsupervised manner. To this end, we introduce a standardization module that provides pseudo sketch-photo pairs during training by converting photos and sketches to a standardized domain, i.e. the edge map. The reduced domain gap between sketch and photo also allows us to disentangle them into two components: holistic scene structures and low-level visual styles such as color and texture. Taking advantage of this, we synthesize a photo-realistic image by combining the structure of a sketch and the visual style of a reference photo. Extensive experimental results on perceptual similarity metrics and human perceptual studies show the proposed method could generate realistic photos with high fidelity from scene sketches and outperform state-of-the-art photo synthesis baselines. We also demonstrate that our framework facilitates a controllable manipulation of photo synthesis by editing strokes of corresponding sketches, delivering more fine-grained details than previous approaches that rely on region-level editing.

Submitted 6 September, 2022; originally announced September 2022.

Journal ref: ECCVW 2022

arXiv:2208.11847 [pdf, other]

doi 10.1109/IJCNN55064.2022.9892188

CNN-based Prediction of Network Robustness With Missing Edges

Authors: Chengpei Wu, Yang Lou, Ruizi Wu, Wenwen Liu, Junli Li

Abstract: Connectivity and controllability of a complex network are two important issues that guarantee a networked system to function. Robustness of connectivity and controllability guarantees the system to function properly and stably under various malicious attacks. Evaluating network robustness using attack simulations is time consuming, while the convolutional neural network (CNN)-based prediction approach provides a cost-efficient method to approximate the network robustness. In this paper, we investigate the performance of CNN-based approaches for connectivity and controllability robustness prediction, when partial network information is missing, namely the adjacency matrix is incomplete. Extensive experimental studies are carried out. A threshold is explored that if a total amount of more than 7.29% information is lost, the performance of CNN-based prediction will be significantly degenerated for all cases in the experiments. Two scenarios of missing edge representations are compared, 1) a missing edge is marked 'no edge' in the input for prediction, and 2) a missing edge is denoted using a special marker of 'unknown'. Experimental results reveal that the first representation is misleading to the CNN-based predictors.

Submitted 24 August, 2022; originally announced August 2022.

Comments: In Proceedings of the IEEE 2022 International Joint Conference on Neural Networks (IJCNN)

arXiv:2208.02068 [pdf, other]

HybridGNN: Learning Hybrid Representation in Multiplex Heterogeneous Networks

Authors: Tiankai Gu, Chaokun Wang, Cheng Wu, **gcao Xu, Yunkai Lou, Chang** Wang, Kai Xu, Can Ye, Yang Song

Abstract: Recently, graph neural networks have shown the superiority of modeling the complex topological structures in heterogeneous network-based recommender systems. Due to the diverse interactions among nodes and abundant semantics emerging from diverse types of nodes and edges, there is a bursting research interest in learning expressive node representations in multiplex heterogeneous networks. One of the most important tasks in recommender systems is to predict the potential connection between two nodes under a specific edge type (i.e., relationship). Although existing studies utilize explicit metapaths to aggregate neighbors, practically they only consider intra-relationship metapaths and thus fail to leverage the potential uplift by inter-relationship information. Moreover, it is not always straightforward to exploit inter-relationship metapaths comprehensively under diverse relationships, especially with the increasing number of node and edge types. In addition, contributions of different relationships between two nodes are difficult to measure. To address the challenges, we propose HybridGNN, an end-to-end GNN model with hybrid aggregation flows and hierarchical attentions to fully utilize the heterogeneity in the multiplex scenarios. Specifically, HybridGNN applies a randomized inter-relationship exploration module to exploit the multiplexity property among different relationships. Then, our model leverages hybrid aggregation flows under intra-relationship metapaths and randomized exploration to learn the rich semantics.
To explore the importance of different aggregation flow and take advantage of the multiplexity property, we bring forward a novel hierarchical attention module which leverages both metapath-level attention and relationship-level attention. Extensive experimental results suggest that HybridGNN achieves the best performance compared to several state-of-the-art baselines.

Submitted 3 August, 2022; originally announced August 2022.

Comments: ICDE 2022

arXiv:2207.13137 [pdf, other]

Bayesian Evidential Learning for Few-Shot Classification

Authors: Xiongkun Linghu, Yan Bai, Yihang Lou, Shengsen Wu, **ze Li, Jianzhong He, Tao Bai

Abstract: Few-Shot Classification (FSC) aims to generalize from base classes to novel classes given very limited labeled samples, which is an important step on the path toward human-like machine learning. State-of-the-art solutions involve learning to find a good metric and representation space to compute the distance between samples. Despite the promising accuracy performance, how to model uncertainty for metric-based FSC methods effectively is still a challenge. To model uncertainty, we place a distribution over class probability based on the theory of evidence. As a result, uncertainty modeling and metric learning can be decoupled. To reduce the uncertainty of classification, we propose a Bayesian evidence fusion theorem. Given observed samples, the network learns to get posterior distribution parameters given the prior parameters produced by the pre-trained network. Detailed gradient analysis shows that our method provides a smooth optimization target and can capture the uncertainty. The proposed method is agnostic to metric learning strategies and can be implemented as a plug-and-play module. We integrate our method into several newest FSC methods and demonstrate the improved accuracy and uncertainty quantification on standard FSC benchmarks.

Submitted 18 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2107.10161 by other authors

Showing 1–50 of 120 results for author: Lou, Y