Search | arXiv e-print repository

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Authors: Zipeng Fu, Tony Z. Zhao, Chelsea Finn

Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: https://mobile-aloha.github.io

Submitted 4 January, 2024; originally announced January 2024.

Comments: Project website: https://mobile-aloha.github.io (Zipeng Fu and Tony Z. Zhao are project co-leads, Chelsea Finn is the advisor)

arXiv:2401.01097 [pdf, other]

Robust single-particle cryo-EM image denoising and restoration

Authors: **g Zhang, Tengfei Zhao, ShiYu Hu, Xin Zhao

Abstract: Cryo-electron microscopy (cryo-EM) has achieved near-atomic level resolution of biomolecules by reconstructing 2D micrographs. However, the resolution and accuracy of the reconstructed particles are significantly reduced due to the extremely low signal-to-noise ratio (SNR) and complex noise structure of cryo-EM images. In this paper, we introduce a diffusion model with post-processing framework to effectively denoise and restore single particle cryo-EM images. Our method outperforms the state-of-the-art (SOTA) denoising methods by effectively removing structural noise that has not been addressed before. Additionally, more accurate and high-resolution three-dimensional reconstruction structures can be obtained from denoised cryo-EM images.

Submitted 2 January, 2024; originally announced January 2024.

Comments: This paper is accepted to ICASSP 2024

arXiv:2312.16571 [pdf, other]

GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection

Authors: Hefei Mei, Tai** Zhao, Shiyuan Tang, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Fanman Meng, Hongliang Li

Abstract: Few-shot object detection (FSOD) aims to achieve object detection only using a few novel class training data. Most of the existing methods usually adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct way easily results in confusion between the novel class and other similar categories in the decision space. To address the problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution to learn more discriminative novel class samples for FSOD. Firstly, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selecting rule. By transferring the knowledge of IFC from the base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It can emphasize the importance of the high-density area of the similar class (closer decision boundary area) and reduce the weight of the low-density area of the similar class (farther decision boundary area), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method. Our method achieves consistent improvement on the Pascal VOC and MS COCO datasets based on DeFRCN and MFDC baselines.

Submitted 29 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.16426 [pdf, ps, other]

Spectral approximation of $ψ$-fractional differential equation based on mapped Jacobi functions

Authors: Tinggang Zhao, Zhenyu Zhao, Changpin Li, Dongxia Li

Abstract: Fractional calculus with respect to function $ψ$, also named as $ψ$-fractional calculus, generalizes the Hadamard and the Riemann-Liouville fractional calculi, which poses challenges for numerical treatment. In this paper we study spectral-type methods using mapped Jacobi functions (MJFs) as basis functions and obtain efficient algorithms to solve $ψ$-fractional differential equations. In particular, we set up the Petrov-Galerkin spectral method and spectral collocation method for initial and boundary value problems involving $ψ$-fractional derivatives. We develop basic approximation theory for the MJFs and conduct the error estimates of the derived methods. We also establish a recurrence relation to evaluate the collocation differentiation matrix for implementing the spectral collocation algorithm. Numerical examples confirm the theoretical results and demonstrate the effectiveness of the spectral and collocation methods.

Submitted 27 December, 2023; originally announced December 2023.

Comments: This is a full length version of a submission to TWMS

MSC Class: 65F60; 65D32; 65M12; 35K55 ACM Class: G.1.2; G.1.9

arXiv:2312.16246 [pdf, other]

Nighttime Person Re-Identification via Collaborative Enhancement Network with Multi-domain Learning

Authors: Andong Lu, Tianrui Zha, Chenglong Li, ** Tang, Xiaofeng Wang, Bin Luo

Abstract: Prevalent nighttime ReID methods typically combine relighting networks and ReID networks in a sequential manner, which not only restricts the ReID performance by the quality of relighting images, but also neglects the effective collaborative modeling between image relighting and person ReID tasks. To handle these problems, we propose a novel Collaborative Enhancement Network called CENet, which performs the multilevel feature interactions in a parallel framework, for nighttime person ReID. In particular, CENet is a parallel Transformer network, in which the designed parallel structure can avoid the impact of the quality of relighting images on ReID performance. To perform effective collaborative modeling between image relighting and person ReID tasks, we integrate the multilevel feature interactions in CENet. Specifically, we share the Transformer encoder to build the low-level feature interaction, and then perform the feature distillation to transfer the high-level features from image relighting to ReID. In addition, the sizes of existing real-world nighttime person ReID datasets are small, and large-scale synthetic ones exhibit substantial domain gaps with real-world data. To leverage both small-scale real-world and large-scale synthetic training data, we develop a multi-domain learning algorithm, which alternately utilizes both kinds of data to reduce the inter-domain difference in the training of CENet. Extensive experiments on two real nighttime datasets, \textit{Night600} and \textit{RGBNT201$_{rgb}$}, and a synthetic nighttime ReID dataset are conducted to validate the effectiveness of CENet. We will release the code and synthetic dataset.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15219 [pdf, other]

Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery

Authors: Jialu Zhang, Xiaoying Yang, Wentao He, Jianfeng Ren, Qian Zhang, Titian Zhao, Ruibin Bai, Xiangjian He, Jiang Liu

Abstract: Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects are first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category. Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods.

Submitted 23 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.15043 [pdf, other]

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Authors: Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin

Abstract: Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, necessitates the model to capture the interactions between objects, as well as various spatial and attribute information. However, the annotation data of visual grounding task is limited due to its time-consuming and labor-intensive annotation process, resulting in the trained models being constrained from generalizing its capability to a broader domain. To address this challenge, we propose GroundVLP, a simple yet effective zero-shot method that harnesses visual grounding ability from the existing models trained from image-text pairs and pure object detection data, both of which are more conveniently obtainable and offer a broader domain compared to visual grounding annotation data. GroundVLP proposes a fusion mechanism that combines the heatmap from GradCAM and the object proposals of open-vocabulary detectors. We demonstrate that the proposed method significantly outperforms other zero-shot methods on RefCOCO/+/g datasets, surpassing prior zero-shot state-of-the-art by approximately 28% on the test split of RefCOCO and RefCOCO+. Furthermore, GroundVLP performs comparably to or even better than some non-VLP-based supervised models on the Flickr30k entities dataset. Our code is available at https://github.com/om-ai-lab/GroundVLP.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.11109 [pdf, other]

Graph Transformers for Large Graphs

Authors: Vijay Prakash Dwivedi, Yozen Liu, Anh Tuan Luu, Xavier Bresson, Neil Shah, Tong Zhao

Abstract: Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the computational feasibility of the global attention mechanism is possible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. With large-scale graphs, global attention learning is proven impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue such GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop reception field, but achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.03668 [pdf, other]

Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Authors: Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Abstract: Advances in machine learning have made it possible to perform various text and speech processing tasks, such as automatic speech recognition (ASR), in an end-to-end (E2E) manner. E2E approaches utilizing pre-trained models are gaining attention for conserving training data and resources. However, most of their applications in ASR involve only one of either a pre-trained speech or a language model. This paper proposes integrating a pre-trained speech representation model and a large language model (LLM) for E2E ASR. The proposed model enables the optimization of the entire ASR process, including acoustic feature extraction and acoustic and language modeling, by combining pre-trained models with a bridge network and also enables the application of remarkable developments in LLM utilization, such as parameter-efficient domain adaptation and inference optimization. Experimental results demonstrate that the proposed model achieves a performance comparable to that of modern E2E ASR models by utilizing powerful pre-training models with the proposed integrated approach.

Submitted 6 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at https://huggingface.co/rinna/nue-asr

arXiv:2312.03256 [pdf, other]

CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Authors: Hailin Zhang, Zirui Liu, Boxuan Chen, Yikai Zhao, Tong Zhao, Tong Yang, Bin Cui

Abstract: Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses the above requirements. The design philosophy of CAFE is to dynamically allocate more memory resources to important features (called hot features), and allocate less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. For each reported hot feature, we assign it a unique embedding. For the non-hot features, we allow multiple features to share one embedding by using hash embedding technique. Guided by our design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch, and analyze the model convergence against deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% superior testing AUC on Criteo Kaggle dataset and CriteoTB dataset at a compression ratio of 10000x. The source codes of CAFE are available at GitHub.

Submitted 26 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.01616 [pdf, other]

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Authors: Yunfei Fan, Tianyu Zhao, Guidong Wang

Abstract: Accuracy and computational efficiency are the most important metrics to Visual Inertial Navigation System (VINS). The existing VINS algorithms with either high accuracy or low computational complexity, are difficult to provide the high precision localization in resource-constrained devices. To this end, we propose a novel filter-based VINS framework named SchurVINS, which could guarantee both high accuracy by building a complete residual model and low computational complexity with Schur complement. Technically, we first formulate the full residual model where Gradient, Hessian and observation covariance are explicitly modeled. Then Schur complement is employed to decompose the full model into ego-motion residual model and landmark residual model. Finally, Extended Kalman Filter (EKF) update is implemented in these two models with high efficiency. Experiments on EuRoC and TUM-VI datasets show that our method notably outperforms state-of-the-art (SOTA) methods in both accuracy and computational complexity. The experimental code of SchurVINS is available at https://github.com/bytedance/SchurVINS.

Submitted 6 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR2024

arXiv:2312.01263 [pdf]

doi 10.1103/PhysRevLett.131.186302

Gate-Tunable Berry Curvature Dipole Polarizability in Dirac Semimetal Cd3As2

Authors: Tong-Yang Zhao, An-Qi Wang, Xing-Guo Ye, Xing-Yu Liu, Xin Liao, Zhi-Min Liao

Abstract: We reveal the gate-tunable Berry curvature dipole polarizability in Dirac semimetal Cd3As2 nanoplates through measurements of the third-order nonlinear Hall effect. Under an applied electric field, the Berry curvature exhibits an asymmetric distribution, forming a field-induced Berry curvature dipole, resulting in a measurable third-order Hall voltage with a cubic relationship to the longitudinal electric field. Notably, the magnitude and polarity of this third-order nonlinear Hall effect can be effectively modulated by gate voltages. Furthermore, our scaling relation analysis demonstrates that the sign of the Berry curvature dipole polarizability changes when tuning the Fermi level across the Dirac point, in agreement with theoretical calculations. The results highlight the gate control of nonlinear quantum transport in Dirac semimetals, paving the way for promising advancements in topological electronics.

Submitted 2 December, 2023; originally announced December 2023.

Journal ref: Phys. Rev. Lett. 131, 186302 (2023)

arXiv:2312.01175 [pdf]

High Q and high gradient performance of the first medium-temperature baking 1.3 GHz cryomodule

Authors: Jiyuan Zhai, Weimin Pan, Feisi He, Rui Ge, Zhenghui Mi, Peng Sha, Song **, Ruixiong Han, Qunyao Wang, Haiying Lin, Guangwei Wang, Mei Li, Min**g Sang, Liangrui Sun, Rui Ye, Tongxian Zhao, Shaopeng Li, Keyu Zhu, Baiqi Liu, Xiaolong Wang, Xiangchen Yang, Xiaojuan Bian, Xiangzhen Zhang, Huizhou Ma, Xuwen Dai, et al. (14 additional authors not shown)

Abstract: World's first 1.3 GHz cryomodule containing eight 9-cell superconducting radio-frequency (RF) cavities treated by medium-temperature furnace baking (mid-T bake) was developed, assembled and tested at IHEP for the Dalian Advanced Light Source (DALS) and CEPC R&D. The 9-cell cavities in the cryomodule achieved an unprecedented highest average Q0 of 3.8E10 at 16 MV/m and 3.6E10 at 21 MV/m in the horizontal test. The cryomodule can operate stably up to a total CW RF voltage greater than 191 MV, with an average cavity CW accelerating gradient of more than 23 MV/m. The results significantly exceed the specifications of CEPC, DALS and the other high repetition rate free electron laser facilities (LCLS-II, LCLS-II-HE, SHINE, S3FEL). There is evidence that the mid-T bake cavity may not require fast cool-down or long processing time in the cryomodule. This paper reviews the cryomodule performance and discusses some important issues in cryomodule assembly and testing.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 5 pages, 6 figures

arXiv:2312.01073 [pdf]

High-speed image reconstruction for nonlinear structured illumination microscopy

Authors: **gxiang Zhang, Tianyu Zhao, Xiangda Fu, Manming Shu, Jia**g Yan, **xiao Chen, Yansheng Liang, Shaowei Wang, Ming Lei

Abstract: By exploiting the nonlinear responses of the fluorescent probes, the spatial resolution of structured illumination microscopy (SIM) can be further increased. However, due to the complex reconstruction process, the traditional reconstruction method of nonlinear structured illumination microscopy (NL-SIM) is relatively slow, which brings a great challenge to realizing real-time display of super-resolution results. To address these issues, an accelerated NL-SIM reconstruction algorithm was developed by extending a high-speed reconstruction framework, Joint Space and Frequency Reconstruction (JSFR), to NL-SIM. We anticipate that this algorithm will facilitate NL-SIM becoming a routine tool in biomedical laboratories.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.15588 [pdf]

Versatile manipulation of low-refractive-index particles using customized optical building blocks

Authors: Minru He, Yansheng Liang, Xue Yun, Linquan Guo, Tianyu Zhao, Ming Lei

Abstract: Low-refractive-index (LRI) particles play significant roles in physics, drug delivery, biomedical science, and other fields. However, they have not attained sufficient utilization in active manipulation due to the repulsive effect of light. Here, we demonstrate the establishment of optical building blocks (OBBs) to fulfill the demands of versatile manipulation of LRI particles. The OBBs are generated by assembling generalized perfect optical vortices based on the free lens modulation (FLM) method, by which the beam's shape, intensity, and position can be elaborately designed with size independent of topological charge. Using the OBBs with high quality and high efficiency, we realized rotating LRI particles along arbitrary trajectories with controllable speed and parallel manipulation of multiple LRI particles. Importantly, we further achieved the sorting of LRI particles by size with specially structured OBBs. With unprecedented flexibility and quality, OBBs provide tremendous potential in optical trapping, lithography, and biomedicine.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 13 pages, 5 figures, corresponding authors: Yansheng Liang and Ming Lei

arXiv:2311.15585 [pdf, other]

Dawning of a New Era in Gravitational Wave Data Analysis: Unveiling Cosmic Mysteries via Artificial Intelligence -- A Systematic Review

Authors: Tianyu Zhao, Ruijun Shi, Yue Zhou, Zhoujian Cao, Zhixiang Ren

Abstract: Background: Artificial intelligence (AI), with its vast capabilities, has become an integral part of our daily interactions, particularly with the rise of sophisticated models like Large Language Models. These advancements have not only transformed human-machine interactions but have also paved the way for significant breakthroughs in various scientific domains. Aim of review: This review is centered on elucidating the profound impact of AI, especially deep learning, in the field of gravitational wave data analysis (GWDA). We aim to highlight the challenges faced by traditional GWDA methodologies and how AI emerges as a beacon of hope, promising enhanced accuracy, real-time processing, and adaptability. Key scientific concepts of review: Gravitational wave (GW) waveform modeling stands as a cornerstone in the realm of GW research, serving as a sophisticated method to simulate and interpret the intricate patterns and signatures of these cosmic phenomena. This modeling provides a deep understanding of the astrophysical events that produce gravitational waves. Next in line is GW signal detection, a refined technique that meticulously combs through extensive datasets, distinguishing genuine gravitational wave signals from the cacophony of background noise. This detection process is pivotal in ensuring the authenticity of observed events. Complementing this is the GW parameter estimation, a method intricately designed to decode the detected signals, extracting crucial parameters that offer insights into the properties and origins of the waves. Lastly, the integration of AI for GW science has emerged as a transformative force. AI methodologies harness vast computational power and advanced algorithms to enhance the efficiency, accuracy, and adaptability of data analysis in GW research, heralding a new era of innovation and discovery in the field.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.14736 [pdf, other]

Data Diversity Matters for Robust Instruction Tuning

Authors: Alexander Bukharin, Tuo Zhao

Abstract: Recent works have shown that by curating high quality and diverse instruction tuning datasets, we can significantly improve instruction-following capabilities. However, creating such datasets is difficult and most works rely on manual curation or proprietary language models. Automatic data curation is difficult as it is still not clear how we can define diversity for instruction tuning, how diversity and quality depend on one another, and how we can optimize dataset quality and diversity. To resolve these issues, we propose a new algorithm, Quality-Diversity Instruction Tuning (QDIT). QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance. From this study we draw two key insights: (1) there is a natural tradeoff between data diversity and quality and (2) increasing data diversity significantly improves the worst case instruction following performance, therefore improving robustness. We validate the performance of QDIT on several large scale instruction tuning datasets, where we find it can substantially improve worst and average case performance compared to quality-driven data selection.

Submitted 5 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 22 pages, 18 figures

arXiv:2311.14612 [pdf, other]

Phase estimation via multi-photon subtraction inside the SU(1,1) interferometer

Authors: Q. Q. Kang, Z. K. Zhao, Y. K. Xu, T. Zhao, C. J. Liu, L. Y. Hu

Abstract: To improve the phase sensitivity, multi-photon subtraction schemes within the SU(1,1) interferometer are proposed. The input states are the coherent state and the vacuum state, and the detection method is homodyne detection. The effects of multi-photon subtraction on phase sensitivity, quantum Fisher information, and quantum Cramer-Rao bound are analyzed under both ideal and photon-loss situations. It is shown that the internal subtraction operation can improve the phase sensitivity, which improves further as the number of subtracted photons increases. It can also efficiently improve the robustness of the SU(1,1) interferometer against internal photon losses. By comparing arbitrary photon subtraction applied separately to the two modes inside the SU(1,1) interferometer, the performance differences under different conditions are analyzed, including the asymmetric properties of non-Gaussian operations on the phase precision and the quantum Fisher information. Our proposed scheme represents a valuable method for achieving quantum precision measurements.

Submitted 24 November, 2023; originally announced November 2023.

Comments: 13 pages

arXiv:2311.13864 [pdf, other]

Which Matters Most in Making Fund Investment Decisions? A Multi-granularity Graph Disentangled Learning Framework

Authors: Chun**g Gan, Binbin Hu, Bo Huang, Tianyu Zhao, Yingru Lin, Wenliang Zhong, Zhiqiang Zhang, Jun Zhou, Chuan Shi

Abstract: In this paper, we highlight that both conformity and risk preference matter in making fund investment decisions beyond personal interest and seek to jointly characterize these aspects in a disentangled manner. Consequently, we develop a novel Multi-granularity Graph Disentangled Learning framework named MGDL to effectively perform intelligent matching of fund investment products. Benefiting from the well-established fund graph and the attention module, multi-granularity user representations are derived from historical behaviors to separately express personal interest, conformity and risk preference in a fine-grained way. To attain stronger disentangled representations with specific semantics, MGDL explicitly involves two self-supervised signals, i.e., fund type based contrasts and fund popularity. Extensive experiments in offline and online environments verify the effectiveness of MGDL.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Accepted by SIGIR 2023

arXiv:2311.12608 [pdf, other]

Density-Guided Dense Pseudo Label Selection For Semi-supervised Oriented Object Detection

Authors: Tong Zhao, Qiang Fang, Shuohao Shi, Xin Xu

Abstract: Recently, dense pseudo-label, which directly selects pseudo labels from the original output of the teacher model without any complicated post-processing steps, has received considerable attention in semi-supervised object detection (SSOD). However, for the multi-oriented and dense objects that are common in aerial scenes, existing dense pseudo-label selection methods are inefficient because they ignore the significant density difference. Therefore, we propose Density-Guided Dense Pseudo Label Selection (DDPLS) for semi-supervised oriented object detection. In DDPLS, we design a simple but effective adaptive mechanism to guide the selection of dense pseudo labels. Specifically, we propose the Pseudo Density Score (PDS) to estimate the density of potential objects and use this score to select reliable dense pseudo labels. On the DOTA-v1.5 benchmark, the proposed method outperforms previous methods, especially when labeled data are scarce. For example, it achieves 49.78 mAP given only 5% of annotated data, which surpasses the previous state-of-the-art method given 10% of annotated data by 1.15 mAP. Our code is available at https://github.com/Haru-zt/DDPLS.

Submitted 14 May, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 9 pages, 6 figures

arXiv:2311.08747 [pdf, other]

Improved Dense Nested Attention Network Based on Transformer for Infrared Small Target Detection

Authors: Chun Bao, Jie Cao, Yaqian Ning, Tianhua Zhao, Zhijun Li, Zechen Wang, Li Zhang, Qun Hao

Abstract: Infrared small target detection based on deep learning offers unique advantages in separating small targets from complex and dynamic backgrounds. However, the features of infrared small targets gradually weaken as the depth of convolutional neural network (CNN) increases. To address this issue, we propose a novel method for detecting infrared small targets called improved dense nested attention network (IDNANet), which is based on the transformer architecture. We preserve the dense nested structure of dense nested attention network (DNANet) and introduce the Swin-transformer during the feature extraction stage to enhance the continuity of features. Furthermore, we integrate the ACmix attention structure into the dense nested structure to enhance the features of intermediate layers. Additionally, we design a weighted dice binary cross-entropy (WD-BCE) loss function to mitigate the negative impact of foreground-background imbalance in the samples. Moreover, we develop a dataset specifically for infrared small targets, called BIT-SIRST. The dataset comprises a large number of real-world targets and manually annotated labels, as well as synthetic data and corresponding labels. We have evaluated the effectiveness of our method through experiments conducted on public datasets. In comparison to other state-of-the-art methods, our approach performs better in terms of probability of detection ($P_d$), false-alarm rate ($F_a$), and mean intersection over union ($mIoU$). The $mIoU$ reaches 90.89% on the NUDT-SIRST dataset and 79.72% on the SIRST dataset.
The BIT-SIRST dataset and codes are openly available at https://github.com/EdwardBao1006/bit_sirst.

Submitted 17 January, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.07998 [pdf, other]

Fractional Leibniz rule on the torus

Authors: Árpád Bényi, Tadahiro Oh, Tengfei Zhao

Abstract: We discuss the fractional Leibniz rule for periodic functions on the $d$-dimensional torus, including the endpoint cases.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 10 pages

MSC Class: 42B15; 42B25; 46E35

arXiv:2311.06126 [pdf, other]

A centi-pc-scale compact radio core in the nearby galaxy M60

Authors: Xiaofeng Li, Jun Yang, Xiaopeng Cheng, Mai Liao, Xiaoyu Hong, Liming Dou, Tianle Zhao, Zhongying Fan, Fupeng Zhang, Weirong Huang

Abstract: M60, an elliptical galaxy located 16.5 Mpc away, has an active nucleus with a very low luminosity and an extremely low accretion rate. Its central supermassive black hole has a mass of $M_{\rm BH}\sim4.5\times10^{9}\, M_{\odot}$ and a Schwarzschild radius corresponding to $R_{\rm S}\sim5.4\,\mu\mathrm{as}$. To investigate the nature of its innermost radio nucleus, data from the Very Long Baseline Array (VLBA) at 4.4 and 7.6 GHz were reduced. The VLBA images reveal a compact component with total flux densities of $\sim$20 mJy at both frequencies, a size of $\leq$0.27 mas (99.7% confidence level), about 0.022 pc ($50\,R_{\rm S}$) at 7.6 GHz, and a brightness temperature of $\geq6\times10^{9}$ K. This suggests that the observed centi-parsec-scale compact core could be attributed to a nonthermal jet base or an advection-dominated accretion flow (ADAF) with nonthermal electrons. The extremely compact structure also supports the presence of an SMBH in the center. Our results indicate that M60 is a promising target for broad-band VLBI observations at millimeter wavelengths to probe ADAF scenarios and tightly constrain the potential photon ring (about 28 $\mu$as) around its SMBH.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 15 pages, 5 figures, 3 tables, accepted for publication in Astrophysical Journal

arXiv:2311.03751 [pdf, other]

Seismic traveltime simulation for variable velocity models using physics-informed Fourier neural operator

Authors: Chao Song, Tianshuo Zhao, Umair bin Waheed, Cai Liu, Tian You

Abstract: Seismic traveltime is critical information conveyed by seismic waves, widely utilized in various geophysical applications. Conventionally, the simulation of seismic traveltime involves solving the eikonal equation. However, the efficiency of traditional numerical solvers is hindered, as they are typically capable of simulating seismic traveltime for only a single source at a time. Recently, deep learning tools, particularly physics-informed neural networks (PINNs), have proven effective in simulating seismic traveltimes for multiple sources. Nonetheless, PINNs face challenges such as limited generalization capabilities across different models and difficulties in training convergence. To address these issues, we have developed a method for simulating multi-source seismic traveltimes in variable velocity models using a deep-learning technique, known as the physics-informed Fourier neural operator (PIFNO). The PIFNO-based method for seismic traveltime generation takes both velocity and background traveltime as inputs, generating the perturbation traveltime as the output. This method incorporates a factorized eikonal equation as the loss function and relies solely on physical laws, eliminating the need for labeled training data. We demonstrate that our proposed method is not only effective in calculating seismic traveltimes for velocity models used during training but also shows promising prediction capabilities for test velocity models. We validate these features using velocity models from the OpenFWI dataset.

Submitted 8 April, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 13 pages, 12 figures, submitted to IEEE TGRS

arXiv:2311.03468 [pdf, other]

FinA: Fairness of Adverse Effects in Decision-Making of Human-Cyber-Physical-System

Authors: Tianyu Zhao, Salma Elmalaki

Abstract: Ensuring fairness in decision-making systems within Human-Cyber-Physical-Systems (HCPS) is a pressing concern, particularly when diverse individuals, each with varying behaviors and expectations, coexist within the same application space, influenced by a shared set of control actions in the system. The long-term adverse effects of these actions further pose the challenge, as historical experiences and interactions shape individual perceptions of fairness. This paper addresses the challenge of fairness from an equity perspective of adverse effects, taking into account the dynamic nature of human behavior and evolving preferences while recognizing the lasting impact of adverse effects. We formally introduce the concept of Fairness-in-Adverse-Effects (FinA) within the HCPS context. We put forth a comprehensive set of five formulations for FinA, encompassing both the instantaneous and long-term aspects of adverse effects. To empirically validate the effectiveness of our FinA approach, we conducted an evaluation within the domain of smart homes, a pertinent HCPS application. The outcomes of our evaluation demonstrate that the adoption of FinA significantly enhances the overall perception of fairness among individuals, yielding an average improvement of 66.7% when compared to the state-of-the-art method.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02262 [pdf, other]

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Authors: Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

Abstract: In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for the readers to grasp the conveyed information. When interacting with large language models (LLMs), we have a similar need - steering the model to pay closer attention to user-specified information, e.g., an instruction. Existing methods, however, are constrained to process plain text and do not support such a mechanism. This motivates us to introduce PASTA - Post-hoc Attention STeering Approach, a method that allows LLMs to read text with user-specified emphasis marks. To this end, PASTA identifies a small subset of attention heads and applies precise attention reweighting on them, directing the model attention to user-specified parts. Like prompting, PASTA is applied at inference time and does not require changing any model parameters. Experiments demonstrate that PASTA can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs, leading to a significant performance improvement on a variety of tasks, e.g., an average accuracy improvement of 22% for LLAMA-7B. Our code is publicly available at https://github.com/QingruZhang/PASTA .

Submitted 3 November, 2023; originally announced November 2023.

Comments: 16 pages

arXiv:2311.02228 [pdf, other]

Towards Fairness-aware Crowd Management System and Surge Prevention in Smart Cities

Authors: Yixin Zhang, Tianyu Zhao, Salma Elmalaki

Abstract: Instances of casualties resulting from large crowds persist, highlighting the existing limitations of current crowd management practices in Smart Cities. One notable drawback is the insufficient provision for disadvantaged individuals who may require additional time to evacuate due to their slower running speed. Moreover, the existing escape strategies may fall short of ensuring the safety of all individuals during a crowd surge. To address these pressing concerns, this paper proposes two crowd management methodologies. Firstly, we advocate for implementing a fair evacuation strategy following a surge event, which considers the diverse needs of all individuals, ensuring inclusivity and mitigating potential risks. Secondly, we propose a preventative approach involving the adjustment of attraction locations and switching between stage performances in large-crowded events to minimize the occurrence of surges and enhance crowd dispersion. We used high-fidelity crowd management simulators to assess the effectiveness of our proposals. Our findings demonstrate the positive impact of the fair evacuation strategy on safety measures and inclusivity, which increases fairness by 41.8% on average. Furthermore, adjusting attraction locations and stage performances has shown a significant reduction in surges by 34% on average, enhancing overall crowd safety.

Submitted 22 April, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.01403 [pdf, other]

REAL: Resilience and Adaptation using Large Language Models on Autonomous Aerial Robots

Authors: Andrea Tagliabue, Kota Kondo, Tong Zhao, Mason Peterson, Claudius T. Tewari, Jonathan P. How

Abstract: Large Language Models (LLMs) pre-trained on internet-scale datasets have shown impressive capabilities in code understanding, synthesis, and general purpose question-and-answering. Key to their performance is the substantial prior knowledge acquired during training and their ability to reason over extended sequences of symbols, often presented in natural language. In this work, we aim to harness the extensive long-term reasoning, natural language comprehension, and the available prior knowledge of LLMs for increased resilience and adaptation in autonomous mobile robots. We introduce REAL, an approach for REsilience and Adaptation using LLMs. REAL provides a strategy to employ LLMs as a part of the mission planning and control framework of an autonomous robot. The LLM employed by REAL provides (i) a source of prior knowledge to increase resilience for challenging scenarios that the system had not been explicitly designed for; (ii) a way to interpret natural-language and other log/diagnostic information available in the autonomy stack, for mission planning; (iii) a way to adapt the control inputs using minimal user-provided prior knowledge about the dynamics/kinematics of the robot. We integrate REAL in the autonomy stack of a real multirotor, querying an offboard LLM from onboard at 0.1-1.0 Hz as part of the robot's mission planning and control feedback loops.
We demonstrate in real-world experiments the ability of the LLM to reduce the position tracking errors of a multirotor in the presence of (i) errors in the parameters of the controller and (ii) unmodeled dynamics. We also show (iii) decision making to avoid potentially dangerous scenarios (e.g., robot oscillates) that had not been explicitly accounted for in the initial prompt design.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 13 pages, 5 figures, conference workshop

arXiv:2311.00652 [pdf, other]

The physical origin of aneurysm growth, dissection, and rupture

Authors: Tom Y. Zhao, **-Tae Kim, Min Cho, Akhil Narang, John A. Rogers, Neelesh A. Patankar

Abstract: Rupture of aortic aneurysms is by far the most fatal heart disease, with a mortality rate exceeding 80%. There are no reliable clinical protocols to predict growth, dissection, and rupture because the fundamental physics driving aneurysm progression is unknown. Here, via in-vitro experiments, we show that a blood-wall, fluttering instability manifests in synthetic arteries under pulsatile forcing. We establish a phase space to prove that the transition from stable flow to unstable aortic flutter is accurately predicted by a flutter instability parameter derived from first principles. Time resolved strain maps of the evolving system reveal the dynamical characteristics of aortic flutter that drive aneurysm progression. We show that low level instability can trigger permanent aortic growth, even in the absence of material remodeling. Sufficiently large flutter beyond a secondary threshold localizes strain in the walls to the length scale clinically observed in aortic dissection. Lastly, significant physical flutter beyond a tertiary threshold can ultimately induce aneurysm rupture via failure modes reported from necropsy. Resolving the fundamental physics of aneurysm progression directly leads to clinical protocols that forecast growth as well as intercept dissection and rupture by pinpointing their physical origin.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.20172 [pdf, other]

Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer

Authors: Ruijun Shi, Yue Zhou, Tianyu Zhao, Zhoujian Cao, Zhixiang Ren

Abstract: Space-based gravitational wave (GW) detection is one of the most anticipated GW detection projects in the next decade, which promises to detect abundant compact binary systems. At present, deep learning methods have not been widely explored for GW waveform generation and extrapolation. To solve the data processing difficulty and the increasing waveform complexity caused by the detector's response and second-generation time-delay interferometry (TDI 2.0), an interpretable pre-trained large model named CBS-GPT (Compact Binary Systems Waveform Generation with Generative Pre-trained Transformer) is proposed. For compact binary system waveforms, three models were trained to predict the waveforms of massive black hole binaries (MBHB), extreme mass-ratio inspirals (EMRIs), and galactic binaries (GB), achieving prediction accuracies of at most 99%, 91%, and 99%, respectively. The CBS-GPT model exhibits notable generalization and interpretability, with its hidden parameters effectively capturing the intricate information of waveforms, even with the complex instrument response and a wide parameter range. Our research demonstrates the potential of large models in the GW realm, opening up new opportunities and guidance for future research such as complex waveform generation, gap completion, and deep learning model design for GW science.

Submitted 5 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.19927 [pdf, other]

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Authors: Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao

Abstract: ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Published at NeurIPS 2023

arXiv:2310.17087 [pdf, other]

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

Authors: Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao

Abstract: Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including the edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Though significant theoretical progress has been made in understanding these implicit biases, it remains unclear for which objective functions they are more likely to occur. This paper provides an initial step in answering this question and also shows that these implicit biases are in fact various tips of the same iceberg. To establish these results, we develop a global convergence theory under large learning rates, for a family of nonconvex functions without globally Lipschitz continuous gradient, which was typically assumed in existing convergence analysis. Specifically, these phenomena are more likely to occur when the optimization objective function has good regularity. This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors. Another corollary is the first non-asymptotic convergence rate bound for large-learning-rate gradient descent optimization of nonconvex functions.
Although our theory only applies to specific functions so far, the possibility of extrapolating it to neural networks is also experimentally validated, for which different choices of loss, activation functions, and other techniques such as batch normalization can all affect regularity significantly and lead to very different training dynamics.

Submitted 11 December, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.16336 [pdf, other]

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

Authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha

Abstract: Transformer Hawkes process models have been shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.16310 [pdf, other]

Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

Authors: Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha

Abstract: Spatio-temporal point processes (STPPs) are potent mathematical tools for modeling and predicting events with both temporal and spatial features. Despite their versatility, most existing methods for learning STPPs either assume a restricted form of the spatio-temporal distribution, or suffer from inaccurate approximations of the intractable integral in the likelihood training objective. These issues typically arise from the normalization term of the probability density function. Moreover, current techniques fail to provide uncertainty quantification for model predictions, such as confidence intervals for the predicted event's arrival time and confidence regions for the event's location, which is crucial given the considerable randomness of the data. To tackle these challenges, we introduce SMASH: a Score MAtching-based pSeudolikeliHood estimator for learning marked STPPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.14517 [pdf, ps, other]

Global well-posedness of the energy-critical stochastic Hartree nonlinear wave equation

Authors: Guopeng Li, Liying Tao, Tengfei Zhao

Abstract: We consider the Cauchy problem for the stochastic Hartree nonlinear wave equations (SHNLW) with a cubic convolution nonlinearity and an additive stochastic forcing on the Euclidean space. Our goal in this paper is two-fold. (i) We study the defocusing energy-critical SHNLW on $\mathbb{R}^d$, for $d \geq 5$, and prove that they are globally well-posed with deterministic initial data in the energy space. (ii) Next, we consider the well-posedness of the defocusing energy-critical SHNLW with randomized initial data below the energy space. In particular, when $d=5$, we prove it is almost surely globally well-posed. As a byproduct, by removing the stochastic forcing our result covers the study of the (deterministic) Hartree nonlinear wave equation (HNLW) with randomized initial data below the energy space. The main ingredients in the globalization argument involve the probabilistic perturbation approach by Bényi-Oh-Pocovnicu (2015) and Pocovnicu (2017), time integration by parts trick of Oh-Pocovnicu (2016), and an estimate of the Hartree potential energy.

Submitted 28 February, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

Comments: We consider the stochastic case. 33 pages

MSC Class: 35L71; 35R60; 60H15

arXiv:2310.13473 [pdf, other]

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Authors: Mingwei Zhu, Leigang Sha, Yu Shu, Kangjia Zhao, Tiancheng Zhao, Jianwei Yin

Abstract: Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three important domains: abstract pattern reasoning, human activity prediction, and physical interaction prediction. We further develop three evaluation methods powered by large language models to robustly quantify a model's performance in predicting and reasoning about the future based on multi-visual context. Empirical experiments confirm the soundness of the proposed benchmark and evaluation methods via rigorous testing and reveal pros and cons of current popular MLLMs in the task of predictive reasoning. Lastly, our proposed benchmark provides a standardized evaluation framework for MLLMs and can facilitate the development of more advanced models that can reason and predict over complex long sequences of multimodal input.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.12442 [pdf, other]

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Authors: Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao

Abstract: Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks. These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence. However, the (full) attention mechanism incurs high computational cost, quadratic in the sequence length, which is not affordable in tasks with long sequences, e.g., inputs with 8k tokens. Although sparse attention can be used to improve computational efficiency, as suggested in existing work, it has limited modeling capacity and often fails to capture complicated dependencies in long sequences. To tackle this challenge, we propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. Specifically, MASFormer is equipped with full attention to capture long-range dependencies, but only at a small number of layers. For the remaining layers, MASFormer only employs sparse attention to capture short-range dependencies. Our experiments on natural language modeling and generation tasks show that a decoder-only MASFormer model of 1.3B parameters can achieve competitive performance to vanilla transformers with full attention while significantly reducing computational cost (up to 75%). Additionally, we investigate the effectiveness of continual training with long sequence data and how sequence length impacts downstream generation performance, which may be of independent interest.

Submitted 18 October, 2023; originally announced October 2023.

Comments: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Findings)

arXiv:2310.10810 [pdf, other]

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Authors: Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, Tuo Zhao

Abstract: Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we show that we can gain robustness by controlling a policy's Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Based on these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. The ERNIE framework provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. However, ERNIE's adversarial regularization may introduce some training instability. To reduce this instability, we reformulate adversarial regularization as a Stackelberg game. We demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments. In addition, we extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest. Our code is available at https://github.com/abukharin3/ERNIE.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 33 pages, 10 figures

arXiv:2310.09521 [pdf]

Effective electrical manipulation of topological antiferromagnet by orbital Hall effect

Authors: Zhenyi Zheng, Tao Zeng, Tieyang Zhao, Shu Shi, Lizhu Ren, Tongtong Zhang, Lanxin Jia, Youdi Gu, Rui Xiao, Hengan Zhou, Qihan Zhang, Jiaqi Lu, Guilei Wang, Chao Zhao, Huihui Li, Beng Kang Tay, Jingsheng Chen

Abstract: Electrical control of the non-trivial topology in Weyl antiferromagnets is of great interest for developing next-generation spintronic devices. Recent works suggest that the spin Hall effect can switch the topological antiferromagnetic order. However, the switching efficiency remains relatively low. Here, we demonstrate effective manipulation of antiferromagnetic order in the Weyl semimetal Mn3Sn by the orbital Hall effect originating from metal Mn or oxide CuOx. While Mn3Sn is proven to be able to convert orbit current to spin current by itself, we find that inserting a heavy metal layer like Pt with proper thickness can effectively reduce the critical switching current density by one order of magnitude. In addition, we show that the memristor-like switching behavior of Mn3Sn can mimic the potentiation and depression processes of a synapse with high linearity, which is beneficial for constructing artificial neural networks with high accuracy. Our work paves an alternative way to manipulate topological antiferromagnetic order and may inspire more high-performance antiferromagnetic functional devices.

Submitted 14 October, 2023; originally announced October 2023.

Comments: 13 pages, 4 figures

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.08659 [pdf, other]

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

Authors: Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Abstract: Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and the quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. The code is available on https://github.com/yxli2123/LoftQ.

Submitted 28 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.08016 [pdf, other]

What can we learn from the experiment of electrostatic conveyor belt for excitons?

Authors: T. T. Zhao, Rui Li, C. S. Liu

Abstract: Motivated by the experiment of electrostatic conveyor belt for indirect excitons [A. G. Winbow, et al., Phys. Rev. Lett. 106, 196806 (2011)], we study the exciton patterns for understanding the exciton dynamics. By analyzing the exciton diffusion, we find that the patterns mainly come from the photoluminescence of two kinds of excitons. The patterns near the laser spot come from the hot excitons, which can be regarded as classical particles. However, the patterns far from the laser spot come from the cooled excitons or coherent excitons. Taking into account the finite lifetime of Bosonic excitons and the interactions between them, we build a time-dependent nonlinear Schrödinger equation including the non-Hermitian dissipation to describe the coherent exciton dynamics. The real-time and imaginary-time evolutions are used alternately to solve the Schrödinger equation in order to simulate the exciton diffusion accompanied by the exciton cooling in the moving lattices. By calculating the escape probability, we obtain the transport distances of the coherent excitons in the conveyor, which are consistent with the experimental data. The cooling speed of excitons is found to be important in the coherent exciton transport. Moreover, the plateau in the average transport distance cannot be explained by the dynamical localization-delocalization transition induced by the disorders.

Submitted 17 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 12 pages, 9 figures

arXiv:2310.07326 [pdf]

Empirical Analysis of the Impact of Legal Tender Digital Currency on Monetary Policy -Based on China's Data

Authors: Ruimin Song, TIntian Zhao, Chunhui Zhou

Abstract: This paper takes the development of China's Central bank digital currencies as a perspective, theoretically analyses the impact mechanism of the issuance and circulation of Central bank digital currencies on China's monetary policy and various variables of the money multiplier; at the same time, it selects the quarterly data from 2010 to 2022, and examines the impact of the Central bank digital currencies on the money supply multiplier through the establishment of the VECM model. The research results show that: the issuance of China's Central bank digital currencies will have an impact on the effectiveness of monetary policy and intermediary indicators; and have a certain positive impact on the narrow money multiplier and broad money multiplier. Based on theoretical analyses and empirical tests, this paper proposes that China should explore a more effective monetary policy in the context of Central bank digital currencies in the future on the premise of steadily promoting the development of Central bank digital currencies.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04612 [pdf, other]

A Topological Perspective on Demystifying GNN-Based Link Prediction Performance

Authors: Yu Wang, Tong Zhao, Yuying Zhao, Yunchao Liu, Xueqi Cheng, Neil Shah, Tyler Derr

Abstract: Graph Neural Networks (GNNs) have shown great promise in learning node embeddings for link prediction (LP). While numerous studies aim to improve the overall LP performance of GNNs, none have explored its varying performance across different nodes and its underlying reasons. To this end, we aim to demystify which nodes will perform better from the perspective of their local topology. Despite the widespread belief that low-degree nodes exhibit poorer LP performance, our empirical findings provide nuances to this viewpoint and prompt us to propose a better metric, Topological Concentration (TC), based on the intersection of the local subgraph of each node with the ones of its neighbors. We empirically demonstrate that TC has a higher correlation with LP performance than other node-level topological metrics like degree and subgraph density, offering a better way to identify low-performing nodes than using cold-start. With TC, we discover a novel topological distribution shift issue in which newly joined neighbors of a node tend to become less interactive with that node's existing neighbors, compromising the generalizability of node embeddings for LP at testing time. To make the computation of TC scalable, we further propose Approximated Topological Concentration (ATC) and theoretically/empirically justify its efficacy in approximating TC and reducing the computation complexity.
Given the positive correlation between node TC and its LP performance, we explore the potential of boosting LP performance via enhancing TC by re-weighting edges in the message-passing and discuss its effectiveness with limitations. Our code is publicly available at https://github.com/YuWVandy/Topo_LP_GNN.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.04550 [pdf, other]

Module-wise Adaptive Distillation for Multimodality Foundation Models

Authors: Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou

Abstract: Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distilling each module and choose the module with a greater contribution to distill more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified-Thompson sampling algorithm named OPTIMA to address the nonstationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model (Yu et al., 2022) as the teacher model.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02262 [pdf, other]

RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

Authors: Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Yintao Wei

Abstract: This paper addresses the growing demands for safety and comfort in intelligent robot systems, particularly autonomous vehicles, where road conditions play a pivotal role in overall driving performance. For example, reconstructing road surfaces helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems. We introduce the Road Surface Reconstruction Dataset (RSRD), a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions. It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps, with accurate post-processing pipelines to ensure its quality. Based on RSRD, we further build a comprehensive benchmark for recovering road profiles through depth estimation and stereo matching. Preliminary evaluations with various state-of-the-art methods reveal the effectiveness of our dataset and the challenge of the task, underscoring substantial opportunities of RSRD as a valuable resource for advancing techniques, e.g., multi-view stereo towards safe autonomous driving. The dataset and demo videos are available at https://thu-rsxd.com/rsrd/

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.00800 [pdf, other]

GraphPatcher: Mitigating Degree Bias for Graph Neural Networks via Test-time Augmentation

Authors: Mingxuan Ju, Tong Zhao, Wenhao Yu, Neil Shah, Yanfang Ye

Abstract: Recent studies have shown that graph neural networks (GNNs) exhibit strong biases towards the node degree: they usually perform satisfactorily on high-degree nodes with rich neighbor information but struggle with low-degree nodes. Existing works tackle this problem by deriving either designated GNN architectures or training strategies specifically for low-degree nodes. Though effective, these approaches unintentionally create an artificial out-of-distribution scenario, where models mainly or even only observe low-degree nodes during the training, leading to a downgraded performance for high-degree nodes that GNNs originally perform well at. In light of this, we propose a test-time augmentation framework, namely GraphPatcher, to enhance test-time generalization of any GNNs on low-degree nodes. Specifically, GraphPatcher iteratively generates virtual nodes to patch artificially created low-degree nodes via corruptions, aiming at progressively reconstructing target GNN's predictions over a sequence of increasingly corrupted nodes. Through this scheme, GraphPatcher not only learns how to enhance low-degree nodes (when the neighborhoods are heavily corrupted) but also preserves the original superior performance of GNNs on high-degree nodes (when lightly corrupted). Additionally, GraphPatcher is model-agnostic and can also mitigate the degree bias for either self-supervised or supervised GNNs.
Comprehensive experiments are conducted over seven benchmark datasets and GraphPatcher consistently enhances common GNNs' overall performance by up to 3.6% and low-degree performance by up to 6.5%, significantly outperforming state-of-the-art baselines. The source code is publicly available at https://github.com/jumxglhf/GraphPatcher.

Submitted 1 October, 2023; originally announced October 2023.

Comments: NeurIPS'23

arXiv:2310.00793 [pdf, other]

Revisiting Link Prediction: A Data Perspective

Authors: Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang

Abstract: Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.

Submitted 6 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: 36 pages, 12 figures

arXiv:2310.00489 [pdf, other]

doi 10.1145/3616855.3635827

Interpretable Imitation Learning with Dynamic Causal Relations

Authors: Tianxiang Zhao, Wenchao Yu, Suhang Wang, Lu Wang, Xiang Zhang, Yuncong Chen, Yanchi Liu, Wei Cheng, Haifeng Chen

Abstract: Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret control policies learned by the agent. Difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain causal relations among states and action variables behind its decisions, exposing policies learned by it.
Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning the dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.

Submitted 30 January, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: Accepted by WSDM 2024 as an oral paper

arXiv:2309.13915 [pdf, other]

doi 10.48550/arXiv.2309.13915

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds

Authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

Abstract: Policy gradient methods equipped with deep neural networks have achieved great success in solving high-dimensional reinforcement learning (RL) problems. However, current analyses cannot explain why they are resistant to the curse of dimensionality. In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs). Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$. We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited. As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of the environment. Compared to previous work, our result shows that NPMD can leverage the low-dimensional structure of the state space to escape the curse of dimensionality, explaining the efficacy of deep policy gradient algorithms.

Submitted 14 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.
