
Autonomous Ground Navigation in Highly Constrained Spaces: Lessons learned from The 3rd BARN Challenge at ICRA 2024

Authors: Xuesu Xiao, Zifan Xu, Aniket Datar, Garrett Warnell, Peter Stone, Joshua Julian Damanik, Jaewon Jung, Chala Adane Deresa, Than Duc Huy, Chen **yu, Chen Yichen, Joshua Adrian Cahyono, **gda Wu, Longfei Mo, Mingyang Lv, Bowen Lan, Qingyang Meng, Weizhi Tao, Li Cheng

Abstract: The 3rd BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) in Yokohama, Japan, and continued to evaluate the performance of state-of-the-art autonomous ground navigation systems in highly constrained environments. Similar to the trend in the 1st and 2nd BARN Challenges at ICRA 2022 and 2023 in Philadelphia (North America) and London (Europe), the 3rd BARN Challenge in Yokohama (Asia) became more regional, i.e., mostly Asian teams participated. The size of the competition has slightly shrunk (six simulation teams, four of which were invited to the physical competition). The competition results, compared to the last two years, suggest that the field has adopted new machine learning approaches while at the same time slightly converged to a few common practices. However, the regional nature of the physical participants suggests a challenge to promote wider participation all over the world and provide more resources to travel to the venue. In this article, we discuss the challenge, the approaches used by the three winning teams, and lessons learned to direct future research and competitions.

Submitted 1 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.03205

arXiv:2405.13519 [pdf]

Multi-fidelity topology optimization of flow boiling heat transfer in microchannels

Authors: Yi Yuan, Li Chen, Qirui Yang, Lingran Gu, Wen-Quan Tao

Abstract: Topology optimization (TO) is a powerful method to design innovative structures with improved heat transfer performance. In the present study, a multi-fidelity TO method with a delicately defined objective function is developed for flow boiling heat transfer in microchannels. Low-fidelity TO is conducted for the reduced-order process of single-phase laminar convective heat transfer, which generates a set of structure candidates for subsequent high-fidelity evaluation of flow boiling heat transfer. To avoid the possible iteration between the low-fidelity TO and high-fidelity evaluation which leads to inefficient solution of the multi-fidelity TO, distributions of velocity, temperature and two-phase in microchannels with single-phase and/or flow boiling heat transfer are investigated and compared in detail, based on which a new objective function is delicately defined, which can be employed in the low-fidelity TO yet can stand for the performance of the high-fidelity problem. With the help of the new objective function, the efficiency of the multi-fidelity TO is significantly improved and TO structures are designed with hot spots eliminated, thermal resistance reduced and temperature uniformity improved. The present work provides a new method for TO of complicated heat and mass transfer problems. Keywords: topology optimization, flow boiling, multi-fidelity optimization, microchannels, convective heat transfer

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.02194 [pdf, other]

Coherent XUV super continuum emission from atomic bound states

Authors: **g Zhao, Xiaowei Wang, Li Wang, Jiacan Wang, Yalei Zhu, Fan Xiao, Wenkai Tao, Zhigang Zheng, Haizhong Wu, Xu Sun, Yue Lang, Congsen Meng, Dongwen Zhang, Zhihui Lv, **lei Liu, Zengxiu Zhao

Abstract: Coherent supercontinuum radiation in the extreme-ultraviolet (XUV) range is indispensable for synthesizing attosecond light pulses and for exploring transient atomic structures. Here, we report the striking observations of coherent XUV supercontinuum (XSC) extended from below to far above the ionization threshold, which exhibits completely different temporal and spatial properties compared to the conventional rescattering induced high harmonic generation (HHG). We demonstrate that the strong-field created coherence among bound orbitals strongly distorts the atomic transition energies during the pulse, leading to coherent emission spanning tens of electron-volts, in contrast to the line emission via free-induction decay occurring after the pulse. The supposed non-radiating bound dark states contribute as well by emitting dressed energy through a dark-to-bright emission mechanism. All the processes modulated at sub-cycle time scale jointly form this new-type coherent XSC. This work achieves the strong-field attosecond control of the exotic atomic radiation dynamics and provides the means of simultaneous generation of separated attosecond sources, i.e., XSC and HHG, with the potential to advance attosecond interferometry.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.13830 [pdf, other]

A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning

Authors: Yu-Xin Zhang, Jie Gui, Xiaofeng Cong, Xin Gong, Wenbing Tao

Abstract: Point cloud registration (PCR) involves determining a rigid transformation that aligns one point cloud to another. Despite the plethora of outstanding deep learning (DL)-based registration methods proposed, comprehensive and systematic studies on DL-based PCR techniques are still lacking. In this paper, we present a comprehensive survey and taxonomy of recently proposed PCR methods. Firstly, we conduct a taxonomy of commonly utilized datasets and evaluation metrics. Secondly, we classify the existing research into two main categories: supervised and unsupervised registration, providing insights into the core concepts of various influential PCR models. Finally, we highlight open challenges and potential directions for future research. A curated collection of valuable resources is made available at https://github.com/yxzhang15/PCR.

Submitted 21 April, 2024; originally announced April 2024.

Comments: This paper is accepted by IJCAI 2024

arXiv:2404.03893 [pdf, other]

KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion

Authors: Tengfei Ma, Xiang Song, Wen Tao, Mufei Li, Jiani Zhang, Xiaoqin Pan, Jianxin Lin, Bosheng Song, Xiangxiang Zeng

Abstract: Knowledge graph completion (KGC) aims to alleviate the inherent incompleteness of knowledge graphs (KGs), which is a critical task for various applications, such as recommendations on the web. Although knowledge graph embedding (KGE) models have demonstrated superior predictive performance on KGC tasks, these models infer missing links in a black-box manner that lacks transparency and accountability, preventing researchers from developing accountable models. Existing KGE-based explanation methods focus on exploring key paths or isolated edges as explanations, which provide too little information to reason about the target prediction. Additionally, the missing ground truth leads to these explanation methods being ineffective in quantitatively evaluating explored explanations. To overcome these limitations, we propose KGExplainer, a model-agnostic method that identifies connected subgraph explanations and distills an evaluator to assess them quantitatively. KGExplainer employs a perturbation-based greedy search algorithm to find key connected subgraphs as explanations within the local structure of target predictions. To evaluate the quality of the explored explanations, KGExplainer distills an evaluator from the target KGE model. By forwarding the explanations to the evaluator, our method can examine their fidelity. Extensive experiments on benchmark datasets demonstrate that KGExplainer yields promising improvement and achieves an optimal ratio of 83.3% in human evaluation.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 13 pages, 7 figures, 11 tables. Under Review

arXiv:2403.17927 [pdf, other]

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

Authors: Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, Yu Cheng

Abstract: In software development, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing code. Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues, particularly at the repository level. To overcome this challenge, we empirically study the reason why LLMs fail to resolve GitHub issues and analyze the major factors. Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of various agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS can resolve 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the advanced LLM.

Submitted 27 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.04700 [pdf, other]

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

Authors: Sijia Chen, En Yu, **yang Li, Wenbing Tao

Abstract: Multiple Object Tracking (MOT) is a critical area within computer vision, with a broad spectrum of practical implementations. Current research has primarily focused on the development of tracking algorithms and enhancement of post-processing techniques. Yet, there has been a lack of thorough examination concerning the nature of tracking data itself. In this study, we pioneer an exploration into the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets. We note a significant imbalance in the distribution of trajectory lengths across different pedestrians, a phenomenon we refer to as "pedestrians trajectory long-tail distribution". Addressing this challenge, we introduce a bespoke strategy designed to mitigate the effects of this skewed distribution. Specifically, we propose two data augmentation strategies, including Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA), designed for viewpoint states and the Group Softmax (GS) module for Re-ID. SVA is to backtrack and predict the pedestrian trajectory of tail classes, and DVA is to use a diffusion model to change the background of the scene. GS divides the pedestrians into unrelated groups and performs softmax operation on each group individually. Our proposed strategies can be integrated into numerous existing tracking systems, and extensive experimentation validates the efficacy of our method in reducing the influence of long-tail distribution on multi-object tracking performance. The code is available at https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT.

Submitted 24 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024!

arXiv:2402.18679 [pdf, other]

Data Interpreter: An LLM Agent For Data Science

Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, **lin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability; 2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise; 3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks. The solution will be released at https://github.com/geekan/MetaGPT.

Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17292 [pdf, other]

DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

Authors: Wei**g Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao

Abstract: Text-to-Avatar generation has recently made significant strides due to advancements in diffusion models. However, most existing work remains constrained by limited diversity, producing avatars with subtle differences in appearance for a given text prompt. We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt. Different from most existing work that exploits scene-specific 3D representations such as NeRF, DivAvatar finetunes a 3D generative model (i.e., EVA3D), allowing diverse avatar generation from simple noise sampling at inference time. DivAvatar has two key designs that help achieve generation diversity and visual quality. The first is a noise sampling technique during the training phase, which is critical in generating diverse appearances. The second is a semantic-aware zoom mechanism and a novel depth loss, the former producing appearances of high textual fidelity by separate fine-tuning of specific body parts and the latter improving geometry quality greatly by smoothing the generated mesh in the feature space. Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16567 [pdf, other]

Aligning Large Language Models to a Domain-specific Graph Database

Authors: Yuanyuan Liang, Keren Tan, Tingyu Xie, Wenbiao Tao, Siyuan Wang, Yunshi Lan, Weining Qian

Abstract: Graph Databases (Graph DB) are widely applied in various fields, including finance, social networks, and medicine. However, translating Natural Language (NL) into the Graph Query Language (GQL), commonly known as NL2GQL, proves to be challenging due to its inherent complexity and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nevertheless, when it comes to NL2GQL tasks on a particular domain, the absence of domain-specific NL-GQL data pairs makes it difficult to establish alignment between LLMs and the graph DB. To address this challenge, we propose a well-defined pipeline. Specifically, we utilize ChatGPT to create NL-GQL data pairs based on the given graph DB with self-instruct. Then, we use the created data to fine-tune LLMs, thereby achieving alignment between LLMs and the graph DB. Additionally, during inference, we propose a method that extracts relevant schema to the queried NL as the input context to guide LLMs for generating accurate GQLs. We evaluate our method on two constructed datasets deriving from graph DBs in the finance and medicine domains, namely FinGQL and MediGQL. Experimental results demonstrate that our method significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX, respectively.

Submitted 28 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 13 pages, 2 figures

arXiv:2402.05067 [pdf, other]

A Novel Paradigm in Solving Multiscale Problems

Authors: **g Wang, Zheng Li, Pengyu Lai, Rui Wang, Di Yang, Dewu Yang, Hui Xu, Wen-Quan Tao

Abstract: Multiscale phenomena manifest across various scientific domains, presenting a ubiquitous challenge in accurately and effectively simulating multiscale dynamics in complex systems. In this paper, a novel decoupling solving paradigm is proposed through modelling large-scale dynamics independently and treating small-scale dynamics as a slaved system. A Spectral Physics-informed Neural Network (PINN) is developed to characterize the small-scale system in an efficient and accurate way, addressing the challenges posed by the representation of multiscale dynamics in neural networks. The effectiveness of the method is demonstrated through extensive numerical experiments, including the one-dimensional Kuramoto-Sivashinsky equation and the two- and three-dimensional Navier-Stokes equations, showcasing its versatility in addressing problems of fluid dynamics. Furthermore, we also delve into the application of the proposed approach to more complex problems, including non-uniform meshes, complex geometries, large-scale data with noise, and high-dimensional small-scale dynamics. The discussions about these scenarios contribute to a comprehensive understanding of the method's capabilities and limitations. By enabling the acquisition of large-scale data with minimal computational demands, coupled with the efficient and accurate characterization of small-scale dynamics via Spectral PINN, our approach offers a valuable and promising approach for researchers seeking to tackle multiscale phenomena effectively.

Submitted 30 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.13714 [pdf, other]

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

Authors: Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, **g Xiao

Abstract: Deploying neural networks on microcontroller units (MCUs) presents substantial challenges due to their constrained computation and memory resources. Previous research has explored patch-based inference as a strategy to conserve memory without sacrificing model accuracy. However, this technique suffers from severe redundant computation overhead, leading to a substantial increase in execution latency. A feasible solution to address this issue is mixed-precision quantization, but it faces the challenges of accuracy degradation and a time-consuming search. In this paper, we propose QuantMCU, a novel patch-based inference method that utilizes value-driven mixed-precision quantization to reduce redundant computation. We first utilize value-driven patch classification (VDPC) to maintain the model accuracy. VDPC classifies patches into two classes based on whether they contain outlier values. For patches containing outlier values, we apply 8-bit quantization to the feature maps on the dataflow branches that follow. In addition, for patches without outlier values, we utilize value-driven quantization search (VDQS) on the feature maps of their following dataflow branches to reduce search time. Specifically, VDQS introduces a novel quantization search metric that takes into account both computation and accuracy, and it employs entropy as an accuracy representation to avoid additional training. VDQS also adopts an iterative approach to determine the bitwidth of each feature map to further accelerate the search process. Experimental results on real-world MCU devices show that QuantMCU can reduce computation by 2.2x on average while maintaining comparable model accuracy compared to the state-of-the-art patch-based inference methods.

Submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted by the 27th Design, Automation and Test in Europe Conference (DATE 2024)

arXiv:2401.12751 [pdf, other]

PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction

Authors: Wanjuan Su, Chen Zhang, Qingshan Xu, Wenbing Tao

Abstract: Surface reconstruction has traditionally relied on the Multi-View Stereo (MVS)-based pipeline, which often suffers from noisy and incomplete geometry. This is because, although MVS has been proven to be an effective way to recover the geometry of the scenes, especially for locally detailed areas with rich textures, it struggles to deal with areas with low texture and large variations of illumination where the photometric consistency is unreliable. Recently, Neural Implicit Surface Reconstruction (NISR), which combines surface rendering and volume rendering techniques and bypasses the MVS as an intermediate step, has emerged as a promising alternative to overcome the limitations of traditional pipelines. While NISR has shown impressive results on simple scenes, it remains challenging to recover delicate geometry from uncontrolled real-world scenes, which is caused by its underconstrained optimization. To this end, the framework PSDF is proposed, which resorts to external geometric priors from a pretrained MVS network and internal geometric priors inherent in the NISR model to facilitate high-quality neural implicit surface learning. Specifically, the visibility-aware feature consistency loss and depth prior-assisted sampling based on external geometric priors are introduced. These proposals provide powerful geometric consistency constraints and aid in locating surface intersection points, thereby significantly improving the accuracy and delicate reconstruction of NISR. Meanwhile, the internal prior-guided importance rendering is presented to enhance the fidelity of the reconstructed surface mesh by mitigating the biased rendering issue in NISR. Extensive experiments on the Tanks and Temples dataset show that PSDF achieves state-of-the-art performance on complex uncontrolled scenes.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.08376 [pdf, other]

KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

Authors: Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, Wenqiang Zhang

Abstract: Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods. Our source code and data are available at https://github.com/DeepSoftwareAnalytics/KADEL

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to ACM Transactions on Software Engineering and Methodology 2024 (TOSEM'24)

arXiv:2401.00020 [pdf, other]

ShennongAlpha: an AI-driven sharing and collaboration platform for intelligent curation, acquisition, and translation of natural medicinal material knowledge

Authors: Zijie Yang, Yong**g Yin, Chaojun Kong, Tiange Chi, Wufan Tao, Yue Zhang, Tian Xu

Abstract: Natural Medicinal Materials (NMMs) have a long history of global clinical applications and a wealth of records and knowledge. Although NMMs are a major source for drug discovery and clinical application, the utilization and sharing of NMM knowledge face crucial challenges, including the standardized description of critical information, efficient curation and acquisition, and language barriers. To address these, we developed ShennongAlpha, an AI-driven sharing and collaboration platform for intelligent knowledge curation, acquisition, and translation. For standardized knowledge curation, the platform introduced a Systematic Nomenclature to enable accurate differentiation and identification of NMMs. More than fourteen thousand Chinese NMMs have been curated into the platform along with their knowledge. Furthermore, the platform pioneered chat-based knowledge acquisition, standardized machine translation, and collaborative knowledge updating. Together, our study represents the first major advance in leveraging AI to empower NMM knowledge sharing, which not only marks a novel application of AI for Science, but also will significantly benefit the global biomedical, pharmaceutical, physician, and patient communities.

Submitted 16 May, 2024; v1 submitted 27 December, 2023; originally announced January 2024.

Comments: 53 pages, 6 figures, 10 supplementary figures, 2 supplementary tables

arXiv:2312.11577 [pdf, other]

PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction

Authors: Jianyao Xu, Qingshan Xu, Xinyao Liao, Wanjuan Su, Chen Zhang, Yew-Soon Ong, Wenbing Tao

Abstract: Neural surfaces learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate the neural surfaces learning has received a lot of attention and remains an open problem. In this work, we propose a prior-based residual learning paradigm for fast multi-view neural surface reconstruction. This paradigm consists of two optimization stages. In the first stage, we propose to leverage generalization models to generate a basis signed distance function (SDF) field. This initial field can be quickly obtained by fusing multiple local SDF fields produced by generalization models. This provides a coarse global geometry prior. Based on this prior, in the second stage, a fast residual learning strategy based on hash-encoding networks is proposed to encode an offset SDF field for the basis SDF field. Moreover, we introduce a prior-guided sampling scheme to help the residual learning stage converge better, and thus recover finer structures. With our designed paradigm, experimental results show that our method only takes about 3 minutes to reconstruct the surface of a single scene, while achieving competitive surface quality. Our code will be released upon publication.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.06682 [pdf, other]

Learning to Denoise Unreliable Interactions for Link Prediction on Biomedical Knowledge Graph

Authors: Tengfei Ma, Yujie Chen, Wen Tao, Dashun Zheng, Xuan Lin, Patrick Cheong-lao Pang, Yiping Liu, Yijun Wang, Bosheng Song, Xiangxiang Zeng

Abstract: Link prediction in biomedical knowledge graphs (KGs) aims at predicting unknown interactions between entities, including drug-target interaction (DTI) and drug-drug interaction (DDI), which is critical for drug discovery and therapeutics. Previous methods prefer to utilize the rich semantic relations and topological structure of the KG to predict missing links, yielding promising outcomes. However, all these works only focus on improving the predictive performance without considering the inevitable noise and unreliable interactions existing in the KGs, which limits the development of KG-based computational methods. To address these limitations, we propose a Denoised Link Prediction framework, called DenoisedLP. DenoisedLP obtains reliable interactions based on the local subgraph by denoising noisy links in a learnable way, providing a universal module for mining underlying task-relevant relations. To collaborate with the smoothed semantic information, DenoisedLP introduces the semantic subgraph by blurring conflict relations around the predicted link. By maximizing the mutual information between the reliable structure and smoothed semantic relations, DenoisedLP emphasizes the informative interactions for predicting relation-specific links. Experimental results on real-world datasets demonstrate that DenoisedLP outperforms state-of-the-art methods on DTI and DDI prediction tasks, and verify the effectiveness and robustness of denoising unreliable interactions on the contaminated KGs.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.03053 [pdf, other]

DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration

Authors: Zhi Chen, Yufan Ren, Tong Zhang, Zheng Dang, Wenbing Tao, Sabine Süsstrunk, Mathieu Salzmann

Abstract: Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds. We propose formulating PCR as a denoising diffusion probabilistic process, mapping noisy transformations to the ground truth. However, using diffusion models for PCR has nontrivial challenges, such as adapting a generative model to a discriminative task and leveraging the estimated nonlinear transformation from the previous step. Instead of training a diffusion model to directly map pure noise to ground truth, we map the predictions of an off-the-shelf PCR model to ground truth. The predictions of off-the-shelf models are often imperfect, especially in challenging cases where the two points clouds have low overlap, and thus could be seen as noisy versions of the real rigid transformation. In addition, we transform the rotation matrix into a spherical linear space for interpolation between samples in the forward process, and convert rigid transformations into auxiliary information to implicitly exploit last-step estimations in the reverse process. As a result, conditioned on time step, the denoising model adapts to the increasing accuracy across steps and refines registrations. Our extensive experiments showcase the effectiveness of our DiffusionPCR, yielding state-of-the-art registration recall rates (95.3%/81.6%) on 3DMatch and 3DLoMatch. The code will be made public upon publication.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.00843 [pdf, other]

Exploring the Robustness of Decentralized Training for Large Language Models

Authors: Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou

Abstract: Decentralized training of large language models has emerged as an effective way to democratize this technology. However, the potential threats associated with this approach have not been carefully discussed, which would hinder the development of decentralized training infrastructures. This paper aims to initiate discussion towards this end by exploring the robustness of decentralized training from three main perspectives. First, we demonstrate the vulnerabilities inherent in decentralized training frameworks in terms of hardware, data, and models. Second, we highlight the fundamental difference between decentralized foundation model training and vanilla federated learning, where the security techniques employed in federated learning cannot be applied directly. Third, we discuss the essential components required for a robust and efficient decentralized training framework and present a case study by modeling a concrete threat model. Our objective in this vision paper is to emphasize the importance of addressing security concerns in the context of decentralized training for large language models.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 6 pages, 3 figures

arXiv:2312.00589 [pdf, other]

Merlin: Empowering Multimodal LLMs with Foresight Minds

Authors: En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Abstract: Humans possess the remarkable ability to foresee the future to a certain extent based on present observations, a skill we term as foresight minds. However, this capability remains largely under explored within existing Multimodal Large Language Models (MLLMs), hindering their capacity to learn the fundamental principles of how things operate and the intentions behind the observed subjects. To address this issue, we introduce the integration of future modeling into the existing learning frameworks of MLLMs. By utilizing the subject trajectory, a highly structured representation of a consecutive frame sequence, as a learning objective, we aim to bridge the gap between the past and the future. We propose two innovative methods to empower MLLMs with foresight minds, Foresight Pre-Training (FPT) and Foresight Instruction-Tuning (FIT), which are inspired by the modern learning paradigm of LLMs. Specifically, FPT jointly training various tasks centered on trajectories, enabling MLLMs to learn how to attend and predict entire trajectories from a given initial observation. Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them. Aided by FPT and FIT, we build a novel and unified MLLM named Merlin that supports multi-images input and analysis about potential actions of multiple objects for the future reasoning. Experimental results show Merlin powerful foresight minds with impressive performance on both future reasoning and visual comprehension tasks.

Submitted 3 July, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: Accepted by ECCV2024. Project page: https://ahnsun.github.io/merlin

arXiv:2310.17500 [pdf, other]

Decay constants of $c \bar b$ mesons involving the ten heavy flavor-changing currents at N$^3$LO QCD

Authors: Wei Tao, Zhen-Jun Xiao

Abstract: Within the nonrelativistic QCD (NRQCD) framework, we complete the three-loop calculations of the NRQCD renormalization constants and the matching coefficients, for the heavy flavor-changing temporal vector, spatial-spatial tensor, spatial-temporal axial-tensor currents, which are coupled to the $P$-wave $c\bar b$ mesons. We further study the ten decay constants for the $c\bar b$ mesons ($B_c,B_c^*,B_{c0}^*,B_{c1}$) coupled with the ten heavy flavor-changing currents involving (pseudo-)scalar, (axial-)vector and (axial-)tensor up to the next-to-next-to-next-to-leading order (N$^3$LO) of $α_s$. We obtain the six ratios of decay constants by approximating them to the corresponding ratios of matching coefficients. We find the N$^3$LO QCD corrections to the six ratios of decay constants have good convergence and weak scale-dependence. We finally predict the hierarchical relationship among the ten decay constants for the $c\bar b$ mesons coupled with the ten currents.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.11649 [pdf, other]

Three-loop matching of heavy flavor-changing (axial-)tensor currents

Authors: Wei Tao, Zhen-Jun Xiao

Abstract: We present the three-loop calculations of the nonrelativistic QCD (NRQCD) current renormalization constants and corresponding anomalous dimensions, and the matching coefficients for the spatial-temporal tensor and spatial-spatial axial-tensor currents with two different heavy quark masses. We obtain the convergent decay constant ratio up to the next-to-next-to-next-to-leading order (N$^3$LO) for the $S$-wave vector meson $B_c^*$ involving the tensor and axial-tensor currents. We obtain the three-loop finite ($ε^0$) term in the ratio of the QCD heavy flavor-changing tensor current renormalization constant in the on-shell ($\mathrm{OS}$) scheme to that in the modified-minimal-subtraction ($\mathrm{\overline{MS}}$) scheme, which is helpful to obtain the three-loop matching coefficients for all heavy flavor-changing (axial-)tensor currents.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.07997 [pdf, other]

PG-NeuS: Robust and Efficient Point Guidance for Multi-View Neural Surface Reconstruction

Authors: Chen Zhang, Wanjuan Su, Qingshan Xu, Wenbing Tao

Abstract: Recently, learning multi-view neural surface reconstruction with the supervision of point clouds or depth maps has been a promising way. However, due to the underutilization of prior information, current methods still struggle with the challenges of limited accuracy and excessive time complexity. In addition, prior data perturbation is also an important but rarely considered issue. To address these challenges, we propose a novel point-guided method named PG-NeuS, which achieves accurate and efficient reconstruction while robustly coping with point noise. Specifically, aleatoric uncertainty of the point cloud is modeled to capture the distribution of noise, leading to noise robustness. Furthermore, a Neural Projection module connecting points and images is proposed to add geometric constraints to implicit surface, achieving precise point guidance. To better compensate for geometric bias between volume rendering and point modeling, high-fidelity points are filtered into a Bias Network to further improve details representation. Benefiting from the effective point guidance, even with a lightweight network, the proposed PG-NeuS achieves fast convergence with an impressive 11x speedup compared to NeuS. Extensive experiments show that our method yields high-quality surfaces with high efficiency, especially for fine-grained details and smooth regions, outperforming the state-of-the-art methods. Moreover, it exhibits strong robustness to noisy data and sparse data.

Submitted 25 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04210 [pdf, other]

Probing electronic-vibrational dynamics of N2+ induced by strong-field ionization

Authors: Qian Zhang, Jing Zhao, Guangru Bai, Bin Zhang, Wenkai Tao, Qianyu Qiu, Hongbin Lei, Yue Lang, Jinlei Liu, Xiaowei Wang, Zengxiu Zhao

Abstract: The coupled electronic-vibrational dynamics of nitrogen ions induced by strong-field ionization is investigated theoretically to corroborate the recent transient X-ray K-edge absorption experiment [PRL 129, 123002 (2022)], where the population distribution of three electronic states in air lasing of N2+ was determined for the first time. By extending the ionization-coupling model to include the transient absorption, we successfully reproduce the time-resolved X-ray absorption spectra of nitrogen ions observed in the experiment. By identifying the contributions from different electronic states, the study provides different interpretation revealing the significant role of excited state A arising from the strong coupling between vibrational states in strong laser fields. It indicates that the electronic population inversion occurs at least for certain alignment of nitrogen molecules. The theory helps uncovering new features of absorption from forbidden transitions during ionization and confirming that the vibration coherence at each electronic channel induces the modulation of absorbance after strong field ionization. A new scheme is proposed to determine the population transfer at different probing geometry to avoid the spectral overlap. This work offers valuable insights into the intricate interplay between electronic and vibrational dynamics and helps to resolve the debate on nitrogen air lasing.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.11938 [pdf]

Giant photon-drag-induced ultrafast photocurrent in diamond for nonlinear photonics

Authors: Xinyi Xue, Wanyi Du, Wei Tao, Yuanyuan Huang, Zhen Lei, Lipeng Zhu, Yuxiao Zou, Ying Liu, Gangqin Liu, Changzhi Gu, Yunliang Li, Baogang Quan, Xinlong Xu

Abstract: Diamond is emerging as an attractive third-generation wide-bandgap semiconductor for future on-chip nonlinear photonics and quantum optics due to its unique thermal, optical, and mechanical properties. However, the light-driven current under below-bandgap excitation from the second-order nonlinear optical effect in diamond is still challenging. Herein, a giant second-order nonlinear photocurrent is observed in the chemical vapor deposition (CVD) diamond by utilizing terahertz (THz) emission spectroscopy. This ultrafast photocurrent originates from the photon drag effect (PDE), during which the momentum transfer from the incident photons to the charge carriers at the rich grain boundaries of the CVD diamond after the exclusive subgap π-π* transition upon femtosecond laser excitation. Especially, the interplay between circular and linear PDE to the THz generation has been clarified and distinguished under elliptically polarized light excitation. Furthermore, the picosecond ultrafast dynamics of these charge carriers are also verified by the infrared spectroscopy. Owing to the giant photon-drag-induced ultrafast photocurrent, the CVD diamond presents the highest THz emission efficiency compared with the reported carbon allotropes, which expands the new functionality of diamond nonlinear photonics into on-chip THz devices.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 25 pages,5 figures, article

arXiv:2308.00333 [pdf]

doi 10.1088/1361-6528/acebf7

Performance benchmarking of an ultra-low vibration laboratory to host a commercial millikelvin scanning tunnelling microscope

Authors: Yande Que, Amit Kumar, Michael S. Lodge, Zhengjue Tong, Marcus Lai Kar Fai, Wei Tao, Zhenhao Cui, Ranjith Shivajirao, Junxiang Jia, Siew Eang Lee, Bent Weber

Abstract: Ultra-low temperature scanning tunnelling microscopy and spectroscopy (STM/STS) achieved by dilution refrigeration can provide unrivalled insight into the local electronic structure of quantum materials and atomic-scale quantum systems. Effective isolation from mechanical vibration and acoustic noise is critical in order to achieve ultimate spatial and energy resolution. Here, we report on the design and performance of an ultra-low vibration (ULV) laboratory hosting a customized but otherwise commercially available 40mK STM. The design of the vibration isolation consists of a T-shaped concrete mass block (55t), suspended by actively controlled pneumatic springs, and placed on a foundation separated from the surrounding building in a "room-within-a-room" design. Vibration levels achieved are meeting the VC-M vibration standard at >3 Hz, reached only in a limited number of laboratories worldwide. Measurement of the STM's junction noise confirms effective vibration isolation on par with custom built STMs in ULV laboratories. In this tailored low-vibration environment, the STM achieves an energy resolution of 43ueV (144 mK), promising for the investigation and control of quantum matter at atomic length scales.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.12333

An axiomatized PDE model of deep neural networks

Authors: Tangjun Wang, Wenqi Tao, Chenglong Bao, Zuoqiang Shi

Abstract: Inspired by the relation between deep neural network (DNN) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by convection-diffusion equation. This convection-diffusion equation model gives mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves the robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.

Submitted 22 March, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: The experiment design in the paper lacks careful thought and may be misleading in demonstrating our contribution

arXiv:2306.14137 [pdf]

doi 10.1109/LRA.2024.3359548

BotanicGarden: A High-Quality Dataset for Robot Navigation in Unstructured Natural Environments

Authors: Yuanzhi Liu, Yujia Fu, Minghui Qin, Yufeng Xu, Baoxin Xu, Fengdong Chen, Bart Goossens, Poly Z. H. Sun, Hongwei Yu, Chun Liu, Long Chen, Wei Tao, Hui Zhao

Abstract: The rapid developments of mobile robotics and autonomous navigation over the years are largely empowered by public datasets for testing and upgrading, such as sensor odometry and SLAM tasks. Impressive demos and benchmark scores have arisen, which may suggest the maturity of existing navigation techniques. However, these results are primarily based on moderate structured scenario testing. When transitioning to challenging unstructured environments, especially in GNSS-denied, texture-monotonous, and dense-vegetated natural fields, their performance can hardly sustain at a high level and requires further validation and improvement. To bridge this gap, we build a novel robot navigation dataset in a luxuriant botanic garden of more than 48000m2. Comprehensive sensors are used, including Gray and RGB stereo cameras, spinning and MEMS 3D LiDARs, and low-cost and industrial-grade IMUs, all of which are well calibrated and hardware-synchronized. An all-terrain wheeled robot is employed for data collection, traversing through thick woods, riversides, narrow trails, bridges, and grasslands, which are scarce in previous resources. This yields 33 short and long sequences, forming 17.1km trajectories in total. Excitedly, both highly-accurate ego-motions and 3D map ground truth are provided, along with fine-annotated vision semantics. We firmly believe that our dataset can advance robot navigation and sensor fusion research to a higher level.

Submitted 2 March, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

Comments: This article has been accepted for publication in IEEE Robotics and Automation Letters

arXiv:2306.07075 [pdf]

Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence

Authors: John J. Nay, David Karamardian, Sarah B. Lawsky, Wenting Tao, Meghana Bhat, Raghav Jain, Aaron Travis Lee, Jonathan H. Choi, Jungo Kasai

Abstract: Better understanding of Large Language Models' (LLMs) legal analysis abilities can contribute to improving the efficiency of legal services, governing artificial intelligence, and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release. We experiment with retrieving and utilising the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, presenting examples of question-answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.16721 [pdf, other]

doi 10.3847/1538-4357/acde58

Asteroseismic investigation on KIC 10526294 to probe convective core overshoot mixing

Authors: Qian-Sheng Zhang, Li Yan, Wu Tao, Jiang Chen

Abstract: In the overshoot mixing model with an exponentially decreasing diffusion coefficient, the initial value of the diffusion coefficient plays a crucial role. According to the turbulent convective mixing model, the characteristic length of convection in the convection zone differs from that in the overshoot region, resulting in a rapid decrease of the diffusion coefficient near the convective boundary. To investigate this quick decrease, we conducted an asteroseismic study on the intermediate-mass SPB star KIC 10526294. We generated stellar models with varied input parameters, including the overshoot parameters, and compared the resulting stellar oscillation periods with observations. To mitigate the potential issue arising from large steps in the stellar parameters and stellar age, we employed a comprehensive interpolation scheme for the stellar oscillatory frequencies, considering all stellar parameters and stellar age. Our analysis revealed that the quick decreasing of the diffusion coefficient has discernible effects on the stellar oscillations and a quick decrease with 4 magnitude orders shows the best oscillatory frequencies compared with the observations. This provides weak evidence in support of the prediction made by the turbulent convective mixing model. Furthermore, we examined the residuals of the oscillation periods and discovered a potential association between abundance anomalies in the buoyancy frequency profile and the oscillation-like patterns observed in the residuals.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 16 pages, 9 figures, accepted for publication in ApJ

arXiv:2305.14298 [pdf, other]

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking

Authors: En Yu, Tiancai Wang, Zhuoling Li, Yuang Zhang, Xiangyu Zhang, Wenbing Tao

Abstract: Although end-to-end multi-object trackers like MOTR enjoy the merits of simplicity, they suffer from the conflict between detection and association seriously, resulting in unsatisfactory convergence dynamics. While MOTRv2 partly addresses this problem, it demands an additional detection network for assistance. In this work, we serve as the first to reveal that this conflict arises from the unfair label assignment between detect queries and track queries during training, where these detect queries recognize targets and track queries associate them. Based on this observation, we propose MOTRv3, which balances the label assignment process using the developed release-fetch supervision strategy. In this strategy, labels are first released for detection and gradually fetched back for association. Besides, another two strategies named pseudo label distillation and track group denoising are designed to further improve the supervision for detection and association. Without the assistance of an extra detection network during inference, MOTRv3 achieves impressive performance across diverse benchmarks, e.g., MOT17, DanceTrack.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.07858 [pdf, other]

Cold-Start based Multi-Scenario Ranking Model for Click-Through Rate Prediction

Authors: Peilin Chen, Hong Wen, **g Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang

Abstract: Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users. In this paper, we focus on the multi-scenario click-through rate (CTR) prediction, i.e., training a unified model to serve all scenarios. Existing multi-scenario based CTR methods struggle in the context of OTP setting due to the ignorance of the cold-start users who have very limited data. To fill this gap, we propose a novel method named Cold-Start based Multi-scenario Network (CSMN). Specifically, it consists of two basic components including: 1) User Interest Projection Network (UIPN), which firstly purifies users' behaviors by eliminating the scenario-irrelevant information in behaviors with respect to the visiting scenario, followed by obtaining users' scenario-specific interests by summarizing the purified behaviors with respect to the target item via an attention mechanism; and 2) User Representation Memory Network (URMN), which benefits cold-start users from users with rich behaviors through a memory read and write mechanism. CSMN seamlessly integrates both components in an end-to-end learning framework. Extensive experiments on real-world offline dataset and online A/B test demonstrate the superiority of CSMN over state-of-the-art methods.

Submitted 16 April, 2023; originally announced April 2023.

Comments: accepted by DASFAA'23 as a Research Paper

arXiv:2303.07220 [pdf, other]

doi 10.1007/JHEP05(2023)189

Three-loop matching coefficients for heavy flavor-changing currents and the phenomenological applications

Authors: Wei Tao, Zhen-Jun Xiao, Ruilin Zhu

Abstract: Within the framework of non-relativistic QCD (NRQCD) factorization, we compute the matching coefficients between full Quantum Chromodynamics (QCD) and NRQCD for the heavy flavor-changing vector, axial-vector, scalar and pseudo-scalar currents up to next-to-next-to-next-to-leading order (N$^3$LO). We accomplish the analytical expressions for the three-loop renormalization constants and the corresponding anomalous dimensions for all of the four NRQCD currents with two different heavy flavors. The three-loop QCD corrections to the matching coefficients turn out to be significantly larger than lower order corrections. By employing the scale relation, we obtain the N$^3$LO corrections to the wave functions at the origin for the vector $B_c^*$ meson and the pseudo-scalar $B_c$ meson from the known result for the equal-mass heavy quarkonium in potential NRQCD (pNRQCD). We find large cancellations at the third order between the matching coefficients and the wave functions at the origin, and obtain the convergent decay constants of $B^*_{c}$ and $B_{c}$ up to N$^3$LO. We present the complete perturbative QCD predictions for the decay constants, leptonic decay widths, and branching ratios of the beauty-charmed mesons.

Submitted 13 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2301.00220

arXiv:2303.02692

Convergent decay constants for beauty-charm mesons at N$^3$LO in QCD and phenomenological implications

Authors: Wei Tao, Zhen-Jun Xiao, Ruilin Zhu

Abstract: We show the three-loop calculation of the matching coefficients $\mathcal{C}_J$ for the four heavy flavor-changing currents in nonrelativistic Quantum Chromodynamics effective theory and confirm the surprising observation of the nonconvergence behaviors of the matching coefficients for beauty-charm mesons. We demonstrate that the higher-order correction to the wave function at origin can solve the nonconvergence problem. By employing the scaling relation, we simultaneously obtain the decay constants for beauty-charm mesons up to the three-loop accuracy and the hyperfine mass splitting in agreement with the experimental data. The perturbative predictions for the leptonic decays of the beauty-charm mesons shall provide a guide in ongoing and future precision heavy flavor experiments.

Submitted 17 October, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: The content of this paper arXiv:2303.02692 substantially overlaps with that of the paper arXiv:2303.07220, the latter of which has already been published in the JHEP journal (JHEP 05 (2023) 189)

arXiv:2302.05772 [pdf, other]

Set-Asides in USDA Food Procurement Auctions

Authors: Ni Yan, WenTing Tao

Abstract: We study the partial and full set-asides and their implication for changes in bidding behavior in first-price sealed-bid auctions in the context of United States Department of Agriculture (USDA) food procurement auctions. Using five years of bid data on different beef products, we implement weighted least squares regression models to show that partial set-aside predicts decreases in both offer prices and winning prices among large and small business bidders. Full set-aside predicts a small increase in offer prices and winning prices among small businesses. With these predictions, we infer that net profit of small businesses is unlikely to increase when set-asides are present.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.05027 [pdf, other]

Deep Seam Prediction for Image Stitching Based on Selection Consistency Loss

Authors: Senmao Cheng, Fan Yang, Zhi Chen, Nanjun Yuan, Wenbing Tao

Abstract: Image stitching constructs panoramic images with a wider field of view (FOV) from several images captured from different viewing positions. To solve the problem of fusion ghosting in the stitched image, seam-driven methods avoid the misalignment area when fusing images by predicting the best seam. Currently, as standard tools of the OpenCV library, dynamic programming (DP) and GraphCut (GC) are still the only commonly used seam prediction methods despite the fact that they were both proposed two decades ago. However, GC achieves excellent seam quality but poor real-time performance, while the DP method has good efficiency but poor seam quality. In this paper, we propose a deep learning based seam prediction method (DSeam) for the sake of high seam quality with high efficiency. To overcome the difficulty of describing the seam in the network and the lack of ground truth for training, we design a selective consistency loss combining the seam shape constraint and seam quality constraint to supervise the network learning. Through the constraint of the selective consistency loss, we implicitly define the mask boundaries as seams and transform seam prediction into mask prediction. To our knowledge, the proposed DSeam is the first deep learning based seam prediction method for image stitching. Extensive experimental results demonstrate the superior performance of our proposed DSeam method, which is 15 times faster than the classic GC seam prediction method in OpenCV 2.4.9 with similar seam quality.

Submitted 26 June, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.11546 [pdf, other]

Adapting Step-size: A Unified Perspective to Analyze and Improve Gradient-based Methods for Adversarial Attacks

Authors: Wei Tao, Lei Bao, Sheng Long, Gaowei Wu, Qing Tao

Abstract: Learning adversarial examples can be formulated as an optimization problem of maximizing the loss function with some box-constraints. However, for solving this induced optimization problem, the state-of-the-art gradient-based methods such as FGSM, I-FGSM and MI-FGSM look different from their original methods especially in updating the direction, which makes it difficult to understand them and then leaves some theoretical issues to be addressed in viewpoint of optimization. In this paper, from the perspective of adapting step-size, we provide a unified theoretical interpretation of these gradient-based adversarial learning methods. We show that each of these algorithms is in fact a specific reformulation of their original gradient methods but using the step-size rules with only current gradient information. Motivated by such analysis, we present a broad class of adaptive gradient-based algorithms based on the regular gradient methods, in which the step-size strategy utilizing information of the accumulated gradients is integrated. Such adaptive step-size strategies directly normalize the scale of the gradients rather than use some empirical operations. The important benefit is that convergence for the iterative algorithms is guaranteed and then the whole optimization process can be stabilized. The experiments demonstrate that our AdaI-FGM consistently outperforms I-FGSM and AdaMI-FGM remains competitive with MI-FGSM for black-box attacks.

Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.00220 [pdf, other]

doi 10.1140/epjc/s10052-023-11442-w

Three-loop QCD matching of the flavor-changing scalar current involving the heavy charm and bottom quark

Authors: Wei Tao, Ruilin Zhu, Zhen-Jun Xiao

Abstract: We compute the matching coefficient between the quantum chromodynamics (QCD) and the non-relativistic QCD (NRQCD) for the flavor-changing scalar current involving the heavy charm and bottom quark, up to the three-loop order within the NRQCD factorization. For the first time, we obtain the analytical expressions for the three-loop renormalization constant $\tilde{Z}_s(x,R_f)$ and the corresponding anomalous dimension $\tilde{γ}_s(x,R_f)$ for the NRQCD scalar current with the two heavy bottom and charm quark. We present the precise numerical results for those relevant coefficients $(C_{FF}(x_0), \cdots, C_{FBB}(x_0))$ with an accuracy of about thirty digits. The three-loop QCD correction turns out to be significantly large. The obtained matching coefficient $C_s(μ_f,μ,m_b,m_c)$ is helpful to analyze the threshold behaviours when two different heavy quarks are close to each other and form the double heavy $B_c$ mesons.

Submitted 3 January, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: The projector is updated

arXiv:2212.01568 [pdf, other]

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Authors: En Yu, Songtao Liu, Zhuoling Li, **rong Yang, Zeming Li, Shoudong Han, Wenbing Tao

Abstract: Although existing multi-object tracking (MOT) algorithms have obtained competitive performance on various benchmarks, almost all of them train and validate models on the same domain. The domain generalization problem of MOT is hardly studied. To bridge this gap, we first draw the observation that the high-level information contained in natural language is domain invariant to different tracking domains. Based on this observation, we propose to introduce natural language representation into visual MOT models for boosting the domain generalization ability. However, it is infeasible to label every tracking target with a textual description. To tackle this problem, we design two modules, namely visual context prompting (VCP) and visual-language mixing (VLM). Specifically, VCP generates visual prompts based on the input frames. VLM joins the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual descriptions, which are domain invariant to different tracking scenes. Through training models on MOT17 and validating them on MOT20, we observe that the pseudo textual descriptions generated by our proposed modules improve the generalization performance of query-based trackers by large margins.

Submitted 3 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI2023

arXiv:2209.15521 [pdf, other]

doi 10.1103/PhysRevD.106.114037

Next-to-next-to-leading order matching of beauty-charmed meson $B_{c}$ and $B^*_{c}$ decay constants

Authors: Wei Tao, Ruilin Zhu, Zhen-Jun Xiao

Abstract: We present the next-to-next-to-leading order (NNLO) QCD corrections to the decay constants for both the pseudoscalar and vector beauty-charmed mesons $B_{c}$ and $B^*_{c}$ in nonrelativistic QCD effective theory. Explicit NNLO calculation verified that the $B_c$ decay constant from pseudoscalar current is identical with the $B_c$ decay constant from axial-vector current. The NNLO result for the vector decay constant of $B^*_{c}$ meson is novel. Combined with the latest extraction of nonrelativistic QCD long-distance matrix elements of $B_c$ meson, we give the branching ratios of leptonic decays of $B_{c}$ and $B^*_{c}$ mesons. In addition, the novel anomalous dimension for the flavor-changing heavy quark vector current in nonrelativistic QCD effective theory is helpful to investigate the threshold behaviours of two different heavy quarks.

Submitted 5 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 9 pages, 7 figures, 6 tables; the inputting values of LDMEs are changed with new references and then the phenomenological results are also updated; some comments added; several typos corrected

arXiv:2209.14713 [pdf, ps, other]

The Heisenberg double of the quantum Euclidean group and its representations

Authors: Wenqing Tao

Abstract: The Heisenberg double $D_q(E_2)$ of the quantum Euclidean group $\mathcal{O}_q(E_2)$ is the smash product of $\mathcal{O}_q(E_2)$ with its Hopf dual $U_q(\mathfrak{e}_2)$. For the algebra $D_q(E_2)$, explicit descriptions of its prime, primitive, and maximal spectra are obtained. All prime factors of $D_q(E_2)$ are presented as generalized Weyl algebras. As a result, we obtain that the algebra $D_q(E_2)$ has no finite-dimensional representations, and that $D_q(E_2)$ cannot have a Hopf algebra structure. The automorphism groups of the quantum Euclidean group and its Heisenberg double are determined. Some centralizers are explicitly described via generators and defining relations. This enables us to give a classification of simple weight modules, and the so-called $a$-weight modules, over the algebra $D_q(E_2)$.

Submitted 14 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: 25 pages, to be published in SCIENCE CHINA Mathematics

MSC Class: 16S40; 16D25; 16D60; 16W20; 20G42

arXiv:2209.10422 [pdf, ps, other]

doi 10.1038/s41467-022-33676-0

Tuning the Many-body Interactions in a Helical Luttinger Liquid

Authors: Junxiang Jia, Elizabeth Marcellina, Anirban Das, Michael S. Lodge, BaoKai Wang, Duc Quan Ho, Riddhi Biswas, Tuan Anh Pham, Wei Tao, Cheng-Yi Huang, Hsin Lin, Arun Bansil, Shantanu Mukherjee, Bent Weber

Abstract: In one-dimensional (1D) systems, electronic interactions lead to a breakdown of Fermi liquid theory and the formation of a Tomonaga-Luttinger Liquid (TLL). The strength of its many-body correlations can be quantified by a single dimensionless parameter, the Luttinger parameter $K$, characterising the competition between the electrons' kinetic and electrostatic energies. Recently, signatures of a TLL have been reported for the topological edge states of quantum spin Hall (QSH) insulators, strictly 1D electronic structures with linear (Dirac) dispersion and spin-momentum locking. Here we show that the many-body interactions in such helical Luttinger Liquid can be effectively controlled by the edge state's dielectric environment. This is reflected in a tunability of the Luttinger parameter $K$, distinct on different edges of the crystal, and extracted to high accuracy from the statistics of tunnelling spectra at tens of tunneling points. The interplay of topology and many-body correlations in 1D helical systems has been suggested as a potential avenue towards realising non-Abelian parafermions.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.10976 [pdf, other]

Quality Matters: Embracing Quality Clues for Robust 3D Multi-Object Tracking

Authors: **rong Yang, En Yu, Zeming Li, ** Li, Wenbing Tao

Abstract: 3D Multi-Object Tracking (MOT) has achieved tremendous progress thanks to the rapid development of 3D object detection and 2D MOT. Recent advanced works generally employ a series of object attributes, e.g., position, size, velocity, and appearance, to provide the clues for the association in 3D MOT. However, these cues may not be reliable due to some visual noise, such as occlusion and blur, leading to a tracking performance bottleneck. To reveal the dilemma, we conduct extensive empirical analysis to expose the key bottleneck of each clue and how they correlate with each other. The analysis results motivate us to efficiently absorb the merits among all cues, and adaptively produce an optimal tracking manner. Specifically, we present Location and Velocity Quality Learning, which efficiently guides the network to estimate the quality of predicted object attributes. Based on these quality estimations, we propose a quality-aware object association (QOA) strategy to leverage the quality score as an important reference factor for achieving robust association. Despite its simplicity, extensive experiments indicate that the proposed strategy significantly boosts tracking performance by 2.2% AMOTA and our method outperforms all existing state-of-the-art works on nuScenes by a large margin. Moreover, QTrack achieves 48.0% and 51.1% AMOTA tracking performance on the nuScenes validation and test sets, which significantly reduces the performance gap between pure camera and LiDAR based trackers.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2208.03941 [pdf, other]

Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

Authors: Xin Liu, Wei Tao, Wei Li, Dazhi Zhan, Jun Wang, Zhisong Pan

Abstract: Due to its simplicity and efficiency, the first-order gradient method has been extensively employed in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum during training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including the heavy ball (HB) method and Nesterov's accelerated gradient (NAG) method, are the workhorse of first-order gradient methods owing to their accelerated convergence. In practice, NAG often exhibits superior performance to HB. However, current theoretical works fail to distinguish their convergence difference in training neural networks. To fill this gap, we consider the training problem of the two-layer ReLU neural network under over-parameterization and random initialization. Leveraging high-resolution dynamical systems and neural tangent kernel (NTK) theory, our result not only establishes tighter upper bounds of the convergence rate for both HB and NAG, but also provides the first theoretical guarantee for the acceleration of NAG over HB in training neural networks. Finally, we validate our theoretical results on three benchmark datasets.

Submitted 8 May, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: 16 pages, accepted to the 33rd International Joint Conference on Artificial Intelligence, IJCAI 2024 (Main) Track

arXiv:2205.15848 [pdf, other]

Geo-Neus: Geometry-Consistent Neural Implicit Surfaces Learning for Multi-view Reconstruction

Authors: Qiancheng Fu, Qingshan Xu, Yew-Soon Ong, Wenbing Tao

Abstract: Recently, neural implicit surfaces learning by volume rendering has become popular for multi-view reconstruction. However, one key challenge remains: existing approaches lack explicit multi-view geometry constraints, hence usually fail to generate geometry consistent surface reconstruction. To address this challenge, we propose geometry-consistent neural implicit surfaces learning for multi-view reconstruction. We theoretically analyze that there exists a gap between the volume rendering integral and point-based signed distance function (SDF) modeling. To bridge this gap, we directly locate the zero-level set of SDF networks and explicitly perform multi-view geometry optimization by leveraging the sparse geometry from structure from motion (SFM) and photometric consistency in multi-view stereo. This makes our SDF optimization unbiased and allows the multi-view geometry constraints to focus on the true surface optimization. Extensive experiments show that our proposed method achieves high-quality surface reconstruction in both complex thin structures and large smooth regions, thus outperforming the state-of-the-arts by a large margin.

Submitted 31 May, 2022; originally announced May 2022.

arXiv:2205.13221 [pdf, other]

QSpeech: Low-Qubit Quantum Speech Application Toolkit

Authors: Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Chendong Zhao, Wei Tao, **g Xiao

Abstract: Quantum devices with low qubits are common in the Noisy Intermediate-Scale Quantum (NISQ) era. However, Quantum Neural Network (QNN) running on low-qubit quantum devices would be difficult since it is based on Variational Quantum Circuit (VQC), which requires many qubits. Therefore, it is critical to make QNN with VQC run on low-qubit quantum devices. In this study, we propose a novel VQC called the low-qubit VQC. VQC requires numerous qubits based on the input dimension; however, the low-qubit VQC with linear transformation can liberate this condition. Thus, it allows the QNN to run on low-qubit quantum devices for speech applications. Furthermore, as compared to the VQC, our proposed low-qubit VQC can stabilize the training process more. Based on the low-qubit VQC, we implement QSpeech, a library for quick prototyping of hybrid quantum-classical neural networks in the speech field. It has numerous quantum neural layers and QNN models for speech applications. Experiments on Speech Command Recognition and Text-to-Speech show that our proposed low-qubit VQC outperforms VQC and is more stable.

Submitted 26 May, 2022; originally announced May 2022.

Comments: Accepted by IJCNN2022 (The 2022 International Joint Conference on Neural Networks). QSpeech code available at https://github.com/zhenhouhong/QSpeech

arXiv:2205.11462 [pdf]

doi 10.1103/physrevx.13.041049

Discovery of a Single-Band Mott Insulator in a van der Waals Flat-Band Compound

Authors: Shunye Gao, Shuai Zhang, Cuixiang Wang, Shaohua Yan, Xin Han, Xuecong Ji, Wei Tao, **gtong Liu, Tiantian Wang, Shuaikang Yuan, Gexing Qu, Ziyan Chen, Yongzhao Zhang, Jierui Huang, Mojun Pan, Shiyu Peng, Yong Hu, Hang Li, Yaobo Huang, Hui Zhou, Sheng Meng, Liu Yang, Zhiwei Wang, Yugui Yao, Zhiguo Chen, et al. (9 additional authors not shown)

Abstract: The Mott insulator provides an excellent foundation for exploring a wide range of strongly correlated physical phenomena, such as high-temperature superconductivity, quantum spin liquid, and colossal magnetoresistance. A Mott insulator with the simplest degree of freedom is an ideal and highly desirable system for studying the fundamental physics of Mottness. In this study, we have unambiguously identified such an anticipated Mott insulator in a van der Waals layered compound Nb3Cl8. In the high-temperature phase, where interlayer coupling is negligible, density functional theory calculations for the monolayer of Nb3Cl8 suggest a half-filled flat band at the Fermi level, whereas angle-resolved photoemission spectroscopy experiments observe a large gap. This observation is perfectly reproduced by dynamical mean-field theory calculations considering strong electron correlations, indicating a correlation-driven Mott insulator state. Since this half-filled band derived from a single 2a1 orbital is isolated from all other bands, the monolayer of Nb3Cl8 is an ideal realization of the celebrated single-band Hubbard model. Upon decreasing the temperature, the bulk system undergoes a phase transition, where structural changes significantly enhance the interlayer coupling. This results in a bonding-antibonding splitting in the Hubbard bands, while the Mott gap remains dominant. Our discovery provides a simple and seminal model system for investigating Mott physics and other emerging correlated states.

Submitted 14 December, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 23 pages, 4 figures, and the Supplemental Material

Journal ref: Phys. Rev. X 13, 041049, 2023

arXiv:2204.08306 [pdf, ps, other]

A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks

Authors: Xin Liu, Wei Tao, Zhisong Pan

Abstract: Momentum methods, including heavy-ball (HB) and Nesterov's accelerated gradient (NAG), are widely used in training neural networks for their fast convergence. However, there is a lack of theoretical guarantees for their convergence and acceleration since the optimization landscape of the neural network is non-convex. Nowadays, some works make progress towards understanding the convergence of momentum methods in an over-parameterized regime, where the number of the parameters exceeds that of the training instances. Nonetheless, current results mainly focus on the two-layer neural network, which are far from explaining the remarkable success of the momentum methods in training deep neural networks. Motivated by this, we investigate the convergence of NAG with constant learning rate and momentum parameter in training two architectures of deep linear networks: deep fully-connected linear neural networks and deep linear ResNets. Based on the over-parameterization regime, we first analyze the residual dynamics induced by the training trajectory of NAG for a deep fully-connected linear neural network under the random Gaussian initialization. Our results show that NAG can converge to the global minimum at a $(1 - \mathcal{O}(1/\sqrt{κ}))^t$ rate, where $t$ is the iteration number and $κ > 1$ is a constant depending on the condition number of the feature matrix. Compared to the $(1 - \mathcal{O}(1/κ))^t$ rate of GD, NAG achieves an acceleration over GD. To the best of our knowledge, this is the first theoretical guarantee for the convergence of NAG to the global minimum in training deep neural networks. Furthermore, we extend our analysis to deep linear ResNets and derive a similar convergence result.

Submitted 18 April, 2022; originally announced April 2022.

Comments: 34 pages
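
The gap between the $(1 - \mathcal{O}(1/κ))^t$ and $(1 - \mathcal{O}(1/\sqrt{κ}))^t$ rates quoted in the abstract above can be illustrated on a toy problem. The sketch below is not the paper's setting: it assumes a strongly convex diagonal quadratic rather than a deep linear network, and it uses the classical momentum choice $β = (\sqrt{κ} - 1)/(\sqrt{κ} + 1)$ with step size $1/L$, neither of which is taken from the abstract. All function and variable names are hypothetical.

```python
import math

def loss(eigs, x):
    # f(x) = 0.5 * sum_i eigs[i] * x[i]^2, a diagonal quadratic (toy assumption).
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigs, x))

def gd(eigs, x0, steps, lr):
    # Plain gradient descent; the gradient of f is eigs[i] * x[i] per coordinate.
    x = list(x0)
    for _ in range(steps):
        x = [xi - lr * lam * xi for xi, lam in zip(x, eigs)]
    return x

def nag(eigs, x0, steps, lr, beta):
    # Nesterov's accelerated gradient with constant step size and momentum:
    # take a gradient step from the look-ahead point y, then extrapolate.
    x, y = list(x0), list(x0)
    for _ in range(steps):
        x_new = [yi - lr * lam * yi for yi, lam in zip(y, eigs)]
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x

kappa = 100.0                                             # condition number L / mu
eigs = [1.0 + i * (kappa - 1.0) / 9 for i in range(10)]   # spectrum from 1 to 100
lr = 1.0 / max(eigs)                                      # step size 1 / L
beta = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)    # classical choice (assumption)
x0 = [1.0] * 10

gd_loss = loss(eigs, gd(eigs, x0, 100, lr))
nag_loss = loss(eigs, nag(eigs, x0, 100, lr, beta))
print(f"GD loss after 100 steps:  {gd_loss:.3e}")
print(f"NAG loss after 100 steps: {nag_loss:.3e}")
```

With $κ = 100$, the slowest GD mode contracts by roughly $1 - 1/κ = 0.99$ per step, while the slowest NAG mode contracts by roughly $1 - 1/\sqrt{κ} = 0.9$ per step, so NAG ends up several orders of magnitude closer to the minimum after the same number of iterations.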

arXiv:2204.06385 [pdf, other]

doi 10.1103/PhysRevD.105.114026

Next-to-leading order QCD calculation of $B_c$ to charmonium tensor form factors

Authors: Wei Tao, Zhen-Jun Xiao, Ruilin Zhu

Abstract: We present next-to-leading order (NLO) QCD corrections to the $B_c\to η_c$ and $B_c\to J/ψ$ tensor form factors within the nonrelativistic QCD (NRQCD) framework. The full analytical results for the $B_c$ to S-wave charmonium tensor form factors are obtained. We also study the asymptotic behaviour of the tensor form factors in the hierarchy heavy quark limit, i.e. $m_b\to\infty$, $m_c\to\infty$, and $m_c/m_b\to 0$. Compact analytical expressions for the tensor form factors are given in this limit. The relations among different form factors are also analyzed, especially at the large momentum recoil point. Finally, numerical results for the $B_c$ to charmonium tensor form factors over the entire physical region are given.

Submitted 7 June, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: 9 pages, 9 figures; some typos corrected, references added; to be published in PRD

arXiv:2203.14453 [pdf, other]

SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration

Authors: Zhi Chen, Kun Sun, Fan Yang, Wenbing Tao

Abstract: In this paper, we present a second order spatial compatibility (SC^2) measure based method for efficient and robust point cloud registration (PCR), called SC^2-PCR. First, we propose the SC^2 measure to compute the similarity between correspondences. It considers global compatibility instead of local consistency, allowing for more distinctive clustering between inliers and outliers at an early stage. Based on this measure, our registration pipeline employs a global spectral technique to find reliable seeds among the initial correspondences. We then design a two-stage strategy to expand each seed into a consensus set based on the SC^2 measure matrix. Finally, we feed each consensus set to a weighted SVD algorithm to generate a candidate rigid transformation and select the best model as the final result. Our method is guaranteed to find a certain number of outlier-free consensus sets using fewer samplings, making the model estimation more efficient and robust. In addition, the proposed SC^2 measure is general and can be easily plugged into deep learning based frameworks. Extensive experiments are carried out to investigate the performance of our method. Code will be available at https://github.com/ZhiChen902/SC2-PCR.

Submitted 27 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022
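
The abstract above does not spell out the SC^2 formula, so the following is only a sketch of one plausible reading, not the authors' implementation. It assumes a first-order compatibility matrix C, where C[i][j] = 1 when correspondences i and j preserve their pairwise distance within a threshold `tau`, and a second-order score equal to C[i][j] multiplied by the number of correspondences compatible with both i and j. The function names, the threshold, and the toy points are all hypothetical.

```python
import math

def first_order_compat(src, dst, tau):
    # C[i][j] = 1 if the distance between points i and j is (nearly) the same
    # in both clouds, as a rigid transform would require; the diagonal stays 0.
    n = len(src)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = abs(math.dist(src[i], src[j]) - math.dist(dst[i], dst[j]))
                C[i][j] = 1 if diff < tau else 0
    return C

def second_order_compat(C):
    # Assumed second-order score: keep only first-order compatible pairs and
    # weight each by how many other correspondences are compatible with both.
    n = len(C)
    return [[C[i][j] * sum(C[i][k] * C[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# Correspondences 0-3 are inliers (pure translation by (1, 0, 0));
# 4 and 5 are outliers. Outlier 5 happens to preserve its distance to inlier 0.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 2, 2), (3, 0, 0)]
dst = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (1, 0, 1), (9, 9, 9), (3, 2, 1)]

C = first_order_compat(src, dst, tau=0.1)
SC2 = second_order_compat(C)
print("first-order  C[0][5]  =", C[0][5])
print("second-order SC2[0][5] =", SC2[0][5])
print("second-order SC2[0][1] =", SC2[0][1])
```

In this toy case the accidental outlier pair (0, 5) passes the first-order check but has no common compatible neighbours, so its second-order score drops to zero, while inlier pairs keep a positive score. This is one way to read the abstract's claim that the measure separates inliers from outliers more distinctly.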

Showing 1–50 of 150 results for author: Tao, W