Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

Authors: Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen

Abstract: Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents. As various long-text techniques and model architectures emerge, the precise and detailed evaluation of models' long-text capabilities has become increasingly important. Existing long-text evaluation benchmarks, such as L-Eval and LongBench, construct long-text test sets based on open-source datasets, focusing mainly on QA and summarization tasks. These datasets include test samples of varying lengths (from 2k to 32k+) entangled together, making it challenging to assess model capabilities across different length ranges. Moreover, they do not cover the ultralong settings (100k+ tokens) that the latest LLMs claim to achieve. In this paper, we introduce Ada-LEval, a length-adaptable benchmark for evaluating the long-context understanding of LLMs. Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities. These benchmarks support intricate manipulation of the length of test cases, and can easily produce text samples up to 128k tokens. We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval. The evaluation results demonstrate the limitations of current LLMs, especially in ultra-long-context settings. Our code is available at https://github.com/open-compass/Ada-LEval.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: NAACL 2024

arXiv:2404.06221 [pdf, other]

Polarization and quantum entanglement effects in $B^\pm_c\to J/ψ+π^\pm +π^0$ process

Authors: Kaiwen Chen, Yiqi Geng, Yichao **, Zhicheng Yan, Ruilin Zhu

Abstract: Motivated by the very recent observation of the $B^+_c\to J/ψ+π^+ +π^0$ decay using proton-proton collision data by the LHCb collaboration, we study the four-body angular distributions and the quantum entanglement effects in the $B^+_c\to J/ψ+π^+ +π^0$ associated with $J/ψ\to μ^++μ^-$. The helicity angular distributions are given in the QCD effective theory and the von Neumann entropy is obtained in $B^\pm_c\to J/ψ(\to μ^+μ^-)+ρ^\pm(\to π^\pm π^0)$ decay process.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 5 pages and 5 figures
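
The abstract above uses the von Neumann entropy as its entanglement measure. As a rough illustration only (not the authors' calculation), the entropy $S = -\sum_i λ_i \log_2 λ_i$ can be computed from the eigenvalues $λ_i$ of a reduced density matrix. The function name and the example spectra below are hypothetical.

```python
import math

def von_neumann_entropy(eigenvalues):
    """Entropy in bits of a density matrix, given its eigenvalues.

    Assumes the eigenvalues are non-negative and sum to 1.
    Zero eigenvalues contribute nothing (the limit of x*log(x) is 0).
    """
    total = sum(eigenvalues)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("eigenvalues must sum to 1")
    return sum(-lam * math.log2(lam) for lam in eigenvalues if lam > 0.0)

# Hypothetical spectra: a maximally entangled two-level pair and a product state.
maximally_entangled = [0.5, 0.5]
product_state = [1.0, 0.0]
```

A maximally mixed two-level reduced state gives one bit of entropy, while a pure (unentangled) reduced state gives zero.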

arXiv:2404.05242 [pdf, other]

Collision-Free Trajectory Optimization in Cluttered Environments with Sums-of-Squares Programming

Authors: Yulin Li, Chunxin Zheng, Kai Chen, Yusen Xie, Xindong Tang, Michael Yu Wang, Jun Ma

Abstract: In this work, we propose a trajectory optimization approach for robot navigation in cluttered 3D environments. We represent the robot's geometry as a semialgebraic set defined by polynomial inequalities such that robots with general shapes can be suitably characterized. To address the robot navigation task in obstacle-dense environments, we exploit the free space directly to construct a sequence of free regions, and allocate each waypoint on the trajectory to a specific region. Then, we incorporate a uniform scaling factor for each free region, and formulate a Sums-of-Squares (SOS) optimization problem that renders the containment relationship between the robot and the free space computationally tractable. The SOS optimization problem is further reformulated to a semidefinite program (SDP), and the collision-free constraints are shown to be equivalent to limiting the scaling factor along the entire trajectory. In this context, the robot at a specific configuration is tailored to stay within the free region. Next, to solve the trajectory optimization problem with the proposed safety constraints (which are implicitly dependent on the robot configurations), we derive the analytical solution to the gradient of the minimum scaling factor with respect to the robot configuration. As a result, this seamlessly facilitates the use of gradient-based methods in efficient solving of the trajectory optimization problem. Through a series of simulations and real-world experiments, the proposed trajectory optimization approach is validated in various challenging scenarios, and the results demonstrate its effectiveness in generating collision-free trajectories in dense and intricate environments populated with obstacles.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04956 [pdf, other]

Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

Authors: Zi** Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu

Abstract: Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose Gaussian Shading, a diffusion model watermarking technique that is both performance-lossless and training-free, while serving the dual purpose of copyright protection and tracing of offending content. Our watermark embedding is free of model parameter modifications and thus is plug-and-play. We map the watermark to latent representations following a standard Gaussian distribution, which is indistinguishable from latent representations obtained from the non-watermarked diffusion model. Therefore we can achieve watermark embedding with lossless performance, for which we also provide theoretical proof. Furthermore, since the watermark is intricately linked with image semantics, it exhibits resilience to lossy processing and erasure attempts. The watermark can be extracted by Denoising Diffusion Implicit Models (DDIM) inversion and inverse sampling. We evaluate Gaussian Shading on multiple versions of Stable Diffusion, and the results demonstrate that Gaussian Shading not only is performance-lossless but also outperforms existing methods in terms of robustness.

Submitted 6 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: 17 pages, 11 figures, accepted by CVPR 2024

arXiv:2404.04935 [pdf, other]

Anomaly Detection in Electrocardiograms: Advancing Clinical Diagnosis Through Self-Supervised Learning

Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

Abstract: The electrocardiogram (ECG) is an essential tool for diagnosing heart disease, with computer-aided systems improving diagnostic accuracy and reducing healthcare costs. Despite advancements, existing systems often miss rare cardiac anomalies that could be precursors to serious, life-threatening issues or alterations in the cardiac macro/microstructure. We address this gap by focusing on self-supervised anomaly detection (AD), training exclusively on normal ECGs to recognize deviations indicating anomalies. We introduce a novel self-supervised learning framework for ECG AD, utilizing a vast dataset of normal ECGs to autonomously detect and localize cardiac anomalies. It proposes a novel masking and restoration technique alongside a multi-scale cross-attention module, enhancing the model's ability to integrate global and local signal features. The framework emphasizes accurate localization of anomalies within ECG signals, ensuring the method's clinical relevance and reliability. To reduce the impact of individual variability, the approach further incorporates crucial patient-specific information from ECG reports, such as age and gender, thus enabling accurate identification of a broad spectrum of cardiac anomalies, including rare ones. Utilizing an extensive dataset of 478,803 ECG graphic reports from real-world clinical practice, our method has demonstrated exceptional effectiveness in AD across all tested conditions, regardless of their frequency of occurrence, significantly outperforming existing models. It achieved superior performance metrics, including an AUROC of 91.2%, an F1 score of 83.7%, a sensitivity rate of 84.2%, a specificity of 83.0%, and a precision of 75.6% with a fixed recall rate of 90%. It has also demonstrated robust localization capabilities, with an AUROC of 76.5% and a Dice coefficient of 65.3% for anomaly localization.

Submitted 7 April, 2024; originally announced April 2024.
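
For readers unfamiliar with the detection and localization metrics quoted in the abstract above, the following is a minimal sketch of their standard definitions. The function names and the confusion-matrix counts are made up for illustration and are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

def dice_coefficient(predicted, reference):
    """Dice overlap between two sets of anomalous sample indices."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = detection_metrics(tp=90, fp=30, tn=170, fn=10)
```

With these illustrative counts, sensitivity is 90/100, specificity is 170/200, and precision is 90/120; the F1 score is the harmonic mean of precision and sensitivity.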

arXiv:2404.04619 [pdf, other]

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

Authors: Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

Abstract: With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models (MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks more delicately. However, existing works: 1) operate independently by agents, each containing multiple LLMs, from perception to action, resulting in gaps between complex tasks and execution; 2) train MLMs on static data, struggling with dynamics in open-ended scenarios; 3) input prior knowledge directly as prompts, suppressing application flexibility. We propose STEVE-2, a hierarchical knowledge distillation framework for open-ended embodied tasks, characterized by 1) a hierarchical system for multi-granular task division, 2) a mirrored distillation method for parallel simulation data, and 3) an extra expert model for bringing additional knowledge into parallel simulation. After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM. Extensive evaluations on navigation and creation tasks highlight the superior performance of STEVE-2 in open-ended tasks, with $1.4\times$ to $7.3\times$ gains in performance.

Submitted 6 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.08282

arXiv:2404.04599 [pdf, ps, other]

Local Test for Unitarily Invariant Properties of Bipartite Quantum States

Authors: Kean Chen, Qisheng Wang, Zhicheng Zhang

Abstract: We study the power of local test for bipartite quantum states. Our central result is that, for properties of bipartite pure states, unitary invariance on one part implies an optimal (over all global testers) local tester acting only on the other part. This suggests a canonical local tester for entanglement spectra (i.e., Schmidt coefficients), and reveals that purified samples offer no advantage in property testing of mixed states. As applications, we show new sample lower bounds, e.g.:
- The first general lower bound $Ω(r/ε^2)$ for testing whether the Schmidt rank of a bipartite state is at most $r$ or $ε$-far, settling an open question raised in Montanaro and de Wolf (ToC 2016).
- A lower bound $Ω((\sqrt{n}+\sqrt{r})\cdot\sqrt{r}/ε^2)$ for testing whether an $n$-partite state is a matrix product state of bond dimension $r$ or $ε$-far, improving the prior lower bound $Ω(\sqrt{n}/ε^2)$ by Soleimanifar and Wright (SODA 2022) and $Ω(\sqrt{r})$ by Aaronson et al. (ITCS 2024). Further, when perfect completeness is required, we provide a matching lower bound $Ω(r^2/ε^2)$ with respect to $r$ and $ε$.
- A matching lower bound $Ω(d/ε^2)$ for testing whether a $d$-dimensional bipartite state is maximally entangled or $ε$-far, showing that the algorithm of O'Donnell and Wright (STOC 2015) is optimal for this task.
Beyond sample complexity, we also contribute new query lower bounds:
- A query lower bound $\tilde{Ω}(\sqrt{d/Δ})$ for the $d$-dimensional entanglement entropy problem with gap $Δ$, improving the prior best $Ω(\sqrt[4]{d})$ by She and Yuen (ITCS 2023) and $\tilde{Ω}(1/\sqrt{Δ})$ by Wang and Zhang (2023) and Weggemans (2024).
Further, our central result can be extended when the tested state is mixed: one-way LOCC is sufficient to realize the optimal tester.

Submitted 29 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: 51 pages. Compared to [v1], we (i) extended testers with parameterized completeness and soundness, (ii) added new lower bounds for testing the bond dimension of matrix product states (MPS), and (iii) improved the lower bounds for testing Schmidt rank

arXiv:2404.04155 [pdf, other]

MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector

Authors: Junbo Li, Keyan Chen, Gengju Tian, Lu Li, Zhenwei Shi

Abstract: The segmentation and interpretation of the Martian surface play a pivotal role in Mars exploration, providing essential data for the trajectory planning and obstacle avoidance of rovers. However, the complex topography, similar surface features, and the lack of extensive annotated data pose significant challenges to the high-precision semantic segmentation of the Martian surface. To address these challenges, we propose a novel encoder-decoder based Mars segmentation network, termed MarsSeg. Specifically, we employ an encoder-decoder structure with a minimized number of down-sampling layers to preserve local details. To facilitate a high-level semantic understanding across the shallow multi-level feature maps, we introduce a feature enhancement connection layer situated between the encoder and decoder. This layer incorporates Mini Atrous Spatial Pyramid Pooling (Mini-ASPP), Polarized Self-Attention (PSA), and Strip Pyramid Pooling Module (SPPM). The Mini-ASPP and PSA are specifically designed for shallow feature enhancement, thereby enabling the expression of local details and small objects. Conversely, the SPPM is employed for deep feature enhancement, facilitating the extraction of high-level semantic category-related information. Experimental results derived from the Mars-Seg and AI4Mars datasets substantiate that the proposed MarsSeg outperforms other state-of-the-art methods in segmentation performance, validating the efficacy of each proposed component.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.04036 [pdf, other]

doi 10.1145/3649902.3653519

Which Experimental Design is Better Suited for VQA Tasks? Eye Tracking Study on Cognitive Load, Performance, and Gaze Allocations

Authors: Sita A. Vriend, Sandeep Vidyapu, Amer Rama, Kun-Ting Chen, Daniel Weiskopf

Abstract: We conducted an eye-tracking user study with 13 participants to investigate the influence of stimulus-question ordering and question modality on participants using visual question-answering (VQA) tasks. We examined cognitive load, task performance, and gaze allocations across five distinct experimental designs, aiming to identify setups that minimize the cognitive burden on participants. The collected performance and gaze data were analyzed using quantitative and qualitative methods. Our results indicate a significant impact of stimulus-question ordering on cognitive load and task performance, as well as a noteworthy effect of question modality on task performance. These findings offer insights for the experimental design of controlled user studies in visualization research.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted at ETVIS 2024

arXiv:2404.04016 [pdf, ps, other]

Flavor-spin symmetry of the $P^N_ψ/H_{Ω_{ccc}}^N$ and $P^Λ_{ψs}/H^Λ_{Ω_{ccc}s}$ molecular states

Authors: Kan Chen, Bo Wang

Abstract: Based on a contact Lagrangian that incorporates the SU(3) flavor and SU(2) spin symmetries, we discuss the symmetry properties of the interactions among the heavy flavor meson-baryon $P_ψ^N$, $P_{ψs}^Λ$ (with quark components [$n\bar{c}$][$nnc$], [$s\bar{c}$][$nnc$], or [$n\bar{c}$][$nsc$]) systems and di-baryon $H_{Ω_{ccc}}^N$, $H^Λ_{Ω_{ccc}s}$ (with quark components [$nnc$][$ncc$], [$nnc$][$scc$] or [$nsc$][$ncc$]) systems ($n=u$, $d$). Because the light quark components of the $P_ψ^N$ ($P_{ψs}^Λ$) and $H_{Ω_{ccc}}^N$ ($H_{Ω_{ccc}s}^Λ$) systems have identical flavors, the interactions generated from the exchanges of light mesons in the $P_ψ^N$ ($P^Λ_{ψs}$) systems should be very similar to those of the $H_{Ω_{ccc}}^N$ ($H^Λ_{Ω_{ccc}s}$) systems. We perform the single-channel and multi-channel calculations on the $P_ψ^N/P^Λ_{ψs}/H_{Ω_{ccc}}^N/H^Λ_{Ω_{ccc}s}$ systems and introduce the SU(3) breaking effect to identify the different mass spectra among the $P_ψ^N$ ($H_{Ω_{ccc}}^N$) and $P^Λ_{ψs}$ ($H^Λ_{Ω_{ccc}s}$) systems. We suggest two kinds of evidence for the existence of the flavor-spin symmetry among the heavy flavor $P_ψ^N/H_{Ω_{ccc}}^N/P^Λ_{ψs}/H^Λ_{Ω_{ccc}s}$ molecule community, i.e., the mass arrangements of the $P_ψ^N/H_{Ω_{ccc}}^N/P^Λ_{ψs}/H^Λ_{Ω_{ccc}s}$ mass spectra and the binding energies of the heavy flavor meson-baryon (di-baryon) systems attributed to the same contact potentials.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 17 pages, 10 tables, 4 figures

arXiv:2404.03659 [pdf, other]

Federated Unlearning for Human Activity Recognition

Authors: Kongyang Chen, Dong** zhang, Ya** Chai, Weibin Zhang, Shaowei Wang, Jiaxing Shen

Abstract: The rapid evolution of Internet of Things (IoT) technology has spurred the widespread adoption of Human Activity Recognition (HAR) in various daily life domains. Federated Learning (FL) is frequently utilized to build a global HAR model by aggregating user contributions without transmitting raw individual data. Despite substantial progress in user privacy protection with FL, challenges persist. Regulations like the General Data Protection Regulation (GDPR) empower users to request data removal, raising a new query in FL: How can a HAR client request data removal without compromising other clients' privacy? In response, we propose a lightweight machine unlearning method for refining the FL HAR model by selectively removing a portion of a client's training data. Our method employs a third-party dataset unrelated to model training. Using KL divergence as a loss function for fine-tuning, we aim to align the predicted probability distribution on forgotten data with the third-party dataset. Additionally, we introduce a membership inference evaluation method to assess unlearning effectiveness. Experimental results across diverse datasets show our method achieves unlearning accuracy comparable to retraining methods, with speedups ranging from hundreds to thousands of times.

Submitted 17 January, 2024; originally announced April 2024.
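
The abstract above describes fine-tuning with a KL-divergence loss that pulls the model's predictions on forgotten data toward its predictions on an unrelated third-party dataset. A minimal sketch of that loss term is given below; the direction of the divergence, the pairing of samples, and all names and logit values are assumptions, since the abstract does not specify them.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical logits: the model is still confident on a sample it should forget,
# but close to uniform on an unrelated third-party sample.
forgotten_probs = softmax([4.0, 0.5, 0.2])
third_party_probs = softmax([0.4, 0.3, 0.5])

# Assumed loss direction: treat the third-party prediction as the target.
unlearning_loss = kl_divergence(third_party_probs, forgotten_probs)
```

Minimizing such a term would push the model to treat forgotten samples like data it has never seen, which is the intuition the abstract gives; the actual training loop and aggregation across clients are outside this sketch.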

arXiv:2404.02041 [pdf, other]

SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation

Authors: Vinkle Srivastav, Keqi Chen, Nicolas Padoy

Abstract: We present a new self-supervised approach, SelfPose3d, for estimating 3d poses of multiple persons from multiple camera views. Unlike current state-of-the-art fully-supervised methods, our approach does not require any 2d or 3d ground-truth poses and uses only the multi-view input images from a calibrated camera setup and 2d pseudo poses generated from an off-the-shelf 2d human pose estimator. We propose two self-supervised learning objectives: self-supervised person localization in 3d space and self-supervised 3d pose estimation. We achieve self-supervised 3d person localization by training the model on synthetically generated 3d points, serving as 3d person root positions, and on the projected root-heatmaps in all the views. We then model the 3d poses of all the localized persons with a bottleneck representation, map them onto all views obtaining 2d joints, and render them using 2d Gaussian heatmaps in an end-to-end differentiable manner. Afterwards, we use the corresponding 2d joints and heatmaps from the pseudo 2d poses for learning. To alleviate the intrinsic inaccuracy of the pseudo labels, we propose an adaptive supervision attention mechanism to guide the self-supervision. Our experiments and analysis on three public benchmark datasets, including Panoptic, Shelf, and Campus, show the effectiveness of our approach, which is comparable to fully-supervised methods. Code: https://github.com/CAMMA-public/SelfPose3D. Video demo: https://youtu.be/GAqhmUIr2E8.

Submitted 8 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted for CVPR 2024. Code: https://github.com/CAMMA-public/SelfPose3D. Video demo: https://youtu.be/GAqhmUIr2E8

arXiv:2404.01977 [pdf, other]

Least Squares Inference for Data with Network Dependency

Authors: **g Lei, Kehui Chen, Haeun Moon

Abstract: We address the inference problem concerning regression coefficients in a classical linear regression model using least squares estimates. The analysis is conducted under circumstances where network dependency exists across units in the sample. Neglecting the dependency among observations may lead to biased estimation of the asymptotic variance and often inflates the Type I error in coefficient inference. In this paper, we first establish a central limit theorem for the ordinary least squares estimate, with a verifiable dependence condition alongside corresponding neighborhood growth conditions. Subsequently, we propose a consistent estimator for the asymptotic variance of the estimated coefficients, which employs a data-driven method to balance the bias-variance trade-off. We find that the optimal tuning depends on the linear hypothesis under consideration and must be chosen adaptively. The presented theory and methods are illustrated and supported by numerical experiments and a data example.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 27 pages, 1 figure

arXiv:2404.01138 [pdf, other]

Protocols and Trade-Offs of Quantum State Purification

Authors: Hongshun Yao, Yu-Ao Chen, Erdong Huang, Kaichu Chen, Xin Wang

Abstract: Quantum state purification plays a pivotal role in quantum communication and quantum computation, aiming to recover the purified state from multiple copies of an unknown noisy state. This work introduces a general state purification framework designed to achieve the highest fidelity with a specified probability and characterize the associated trade-offs. In particular, for i.i.d. quantum states under depolarizing noise, we propose an explicit purification protocol capable of achieving maximal fidelity with a target probability. Furthermore, we present quantum circuits for implementing the optimal purification protocols via the block encoding technique and propose recursive protocols for stream purification. Finally, we demonstrate the advantages of our protocols in terms of efficiency and flexibility in purifying noisy quantum states under various quantum noise models of interest, showcasing the effectiveness and versatility of our approach.

Submitted 18 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 20 pages including appendix, v2 updated the main results

arXiv:2404.00906 [pdf, other]

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Authors: Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He

Abstract: Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLM) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with VLM and then construct scene graphs from these sequences. By doing so, we harness the strong capabilities of VLM for open-vocabulary SGG and seamlessly integrate explicit relational modeling for enhancing the VL tasks. Experimental results demonstrate that our design not only achieves superior performance with an open vocabulary but also enhances downstream vision-language task performance through explicit relation modeling knowledge.

Submitted 24 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.00834 [pdf, other]

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Authors: Guoqiang Liang, Kanghao Chen, Hangyu Li, Yunfan Lu, Lin Wang

Abstract: Event cameras have recently received much attention for low-light image enhancement (LIE) thanks to their distinct advantages, such as high dynamic range. However, current research is prohibitively restricted by the lack of large-scale, real-world, and spatial-temporally aligned event-image datasets. To this end, we propose a real-world (indoor and outdoor) dataset comprising over 30K pairs of images and events under both low and normal illumination conditions. To achieve this, we utilize a robotic arm that traces a consistent non-linear trajectory to curate the dataset with spatial alignment precision under 0.03mm. We then introduce a matching alignment strategy, rendering 90% of our dataset with errors less than 0.01s. Based on the dataset, we propose a novel event-guided LIE approach, called EvLight, towards robust performance in real-world low-light scenes. Specifically, we first design the multi-scale holistic fusion branch to extract holistic structural and textural information from both events and images. To ensure robustness against variations in the regional illumination and noise, we then introduce a Signal-to-Noise-Ratio (SNR)-guided regional feature selection to selectively fuse features of images from regions with high SNR and enhance those with low SNR by extracting regional structure information from events. Extensive experiments on our dataset and the synthetic SDSD dataset demonstrate that EvLight significantly surpasses the frame-based methods. Code and datasets are available at https://vlislab22.github.io/eg-lowlight/.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.00242 [pdf, other]

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Authors: Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

Abstract: Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference. Unlike traditional sequence-based decoding, tree-structured decoding better accommodates modern task requirements, including self-consistency, few-shot prompting, multi-step reasoning, and multi-model/head coordination. However, existing sequence-based inference systems are ill-suited for tree-structured decoding, resulting in redundancy in computation, memory footprints, and memory access, thereby undermining inference efficiency. To address this challenge, DeFT maintains memory-efficient attention calculation with low memory footprints through two key stages: (1) QKV Preparation: We propose a KV-Guided Grouping Strategy with Tree Split to intelligently group QKV, optimizing GPU resource utilization while minimizing memory reads/writes for KV cache between GPU global memory and on-chip shared memory; (2) Attention Calculation: We compute partial attention of each QKV group in a fused kernel and employ a Tree-topology-aware Global Reduction strategy to obtain final attention. By reducing 73-99% KV cache IO and nearly 100% IO for partial results during attention calculation (e.g., Softmax), DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads: namely, few-shot prompting, multi-step reasoning, and speculative decoding, over state-of-the-art attention algorithms.

Submitted 29 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Update DeFT-v2. DeFT-v1 was accepted by ICLR'24 AGI Workshop ( https://openreview.net/forum?id=HqfLHoX8bR ). Code will be released soon

arXiv:2403.19654 [pdf, other]

RSMamba: Remote Sensing Image Classification with State Space Model

Authors: Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, Zhenwei Shi

Abstract: Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of Convolutional Neural Networks (CNNs) and Transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this paper, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates an efficient, hardware-aware design known as the Mamba. It integrates the advantages of both a global receptive field and linear modeling complexity. To overcome the limitation of the vanilla Mamba, which can only model causal sequences and is not adaptable to two-dimensional image data, we propose a dynamic multi-path activation mechanism to augment Mamba's capacity to model non-causal data. Notably, RSMamba maintains the inherent modeling mechanism of the vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets. This indicates that RSMamba holds significant potential to function as the backbone of future visual foundation models. The code will be available at https://github.com/KyanChen/RSMamba.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19646 [pdf, other]

Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis

Authors: Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi

Abstract: Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, necessitating precise and comprehensive interpretation methodologies. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, leading to the emergence of remote sensing image change interpretation (RSICI) as a significant research focus. Current RSICI technology encompasses change detection and change captioning, each with its limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent, which can follow user instructions to achieve comprehensive change interpretation and insightful analysis according to user instructions, such as change detection and change captioning, change object counting, change cause analysis, etc. The Change-Agent integrates a multi-level change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. The MCI model contains two branches of pixel-level change detection and semantic-level change captioning, in which multiple BI-temporal Iterative Interaction (BI3) layers utilize Local Perception Enhancement (LPE) and the Global Difference Fusion Attention (GDFA) modules to enhance the model's discriminative feature representation capabilities. To support the training of the MCI model, we build the LEVIR-MCI dataset with a large number of change masks and captions of changes. Extensive experiments demonstrate the effectiveness of the proposed MCI model and highlight the promising potential of our Change-Agent in facilitating comprehensive and intelligent interpretation of surface changes. To facilitate future research, we will make our dataset and codebase of the MCI model and Change-Agent publicly available at https://github.com/Chen-Yang-Liu/Change-Agent

Submitted 1 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18840 [pdf, other]

Feynman Diagrams as Computational Graphs

Authors: Pengcheng Hou, Tao Wang, Daniel Cerkoney, Xiansheng Cai, Zhiyi Li, Youjin Deng, Lei Wang, Kun Chen

Abstract: We propose a computational graph representation of high-order Feynman diagrams in Quantum Field Theory (QFT), applicable to any combination of spatial, temporal, momentum, and frequency domains. Utilizing the Dyson-Schwinger and parquet equations, our approach effectively organizes these diagrams into a fractal structure of tensor operations, significantly reducing computational redundancy. This approach not only streamlines the evaluation of complex diagrams but also facilitates an efficient implementation of the field-theoretic renormalization scheme, crucial for enhancing perturbative QFT calculations. Key to this advancement is the integration of Taylor-mode automatic differentiation, a key technique employed in machine learning packages to compute higher-order derivatives efficiently on computational graphs. To operationalize these concepts, we develop a Feynman diagram compiler that optimizes diagrams for various computational platforms, utilizing machine learning frameworks. Demonstrating this methodology's effectiveness, we apply it to the three-dimensional uniform electron gas problem, achieving unprecedented accuracy in calculating the quasiparticle effective mass at metal density. Our work demonstrates the synergy between QFT and machine learning, establishing a new avenue for applying AI techniques to complex quantum many-body problems.

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2403.18776 [pdf, other]

doi 10.1364/OE.510670

Breaking the Limitations with Sparse Inputs by Variational Frameworks (BLIss) in Terahertz Super-Resolution 3D Reconstruction

Authors: Yiyao Zhang, Ke Chen, Shang-Hua Yang

Abstract: Data acquisition, image processing, and image quality are the long-lasting issues for terahertz (THz) 3D reconstructed imaging. Existing methods are primarily designed for 2D scenarios, given the challenges associated with obtaining super-resolution (SR) data and the absence of an efficient SR 3D reconstruction framework in conventional computed tomography (CT). Here, we demonstrate BLIss, a new approach for THz SR 3D reconstruction with sparse 2D data input. BLIss seamlessly integrates conventional CT techniques and variational framework with the core of the adapted Euler-Elastica-based model. The quantitative 3D image evaluation metrics, including the standard deviation of Gaussian, mean curvatures, and the multi-scale structural similarity index measure (MS-SSIM), validate the superior smoothness and fidelity achieved with our variational framework approach compared with conventional THz CT modal. Beyond its contributions to advancing THz SR 3D reconstruction, BLIss demonstrates potential applicability in other imaging modalities, such as X-ray and MRI. This suggests extensive impacts on the broader field of imaging applications.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 15 pages, 7 figures. Supplemental Document: https://doi.org/10.6084/m9.figshare.24455206

Journal ref: Optics Express (OE) 2024

arXiv:2403.18344 [pdf, other]

LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models

Authors: Mingxing Peng, Xusen Guo, Xianda Chen, Meixin Zhu, Kehua Chen, Hao, Yang, Xuesong Wang, Yinhai Wang

Abstract: To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address these challenges by proposing LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models (LLMs). Essentially, we reformulate the lane change prediction task as a language modeling problem, processing heterogeneous driving scenario information in natural language as prompts for input into the LLM and employing a supervised fine-tuning technique to tailor the LLM specifically for our lane change prediction task. This allows us to utilize the LLM's powerful common sense reasoning abilities to understand complex interactive information, thereby improving the accuracy of long-term predictions. Furthermore, we incorporate explanatory requirements into the prompts in the inference stage. Therefore, our LC-LLM model not only can predict lane change intentions and trajectories but also provides explanations for its predictions, enhancing the interpretability. Extensive experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of our LC-LLM in lane change prediction task. To the best of our knowledge, this is the first attempt to utilize LLMs for predicting lane change behavior. Our study shows that LLMs can encode comprehensive interaction information for driving behavior understanding.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18250 [pdf, other]

doi 10.1109/TMTT.2024.3381845

Linear Hybrid Asymmetrical Load-Modulated Balanced Amplifier with Multi-Band Reconfigurability and Antenna-VSWR Resilience

Authors: Jiachen Guo, Yuchen Cao, Kenle Chen

Abstract: This paper presents the first-ever highly linear and load-insensitive three-way load-modulation power amplifier (PA) based on reconfigurable hybrid asymmetrical load modulated balanced amplifier (H-ALMBA). Through proper amplitude and phase controls, the carrier, control amplifier (CA), and two peaking balanced amplifiers (BA1 and BA2) can form a linear high-order load modulation over wide bandwidth. Moreover, it is theoretically unveiled that the load modulation behavior of H-ALMBA can be insensitive to load mismatch by leveraging bias reconfiguration and the intrinsic load-insensitivity of balanced topology. Specifically, the PA's linearity and efficiency profiles can be maintained against arbitrary load mismatch through $Z_\mathrm{L}$-dependent reconfiguration of CA supply voltage ($V_\mathrm{DD,CA}$) and turning-on sequence of BA1 and BA2. Based on the proposed theory, an RF-input linear H-ALMBA is developed with GaN transistors and wideband quadrature hybrids. Over the design bandwidth from 1.7-2.9 GHz, an efficiency of 56.8%-72.9% at peak power and 49.8%-61.2% at 10-dB PBO are measured together with linear AMAM and AMPM responses. In modulated evaluation with 4G LTE signal, an EVM of 3.1%, ACPR of -39 dB, and average efficiency of up to 52% are measured. Moreover, the reconfigurable H-ALMBA experimentally maintains an excellent average efficiency and linearity against arbitrary load mismatch at 2:1 VSWR, and this mismatch-resilient operation can be achieved at any in-band frequencies. The overall measured performance favorably outperforms the state-of-the-art.

Submitted 27 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2403.17524 [pdf, other]

Provably Secure Disambiguating Neural Linguistic Steganography

Authors: Yuang Qi, Kejiang Chen, Kai Zeng, Weiming Zhang, Nenghai Yu

Abstract: Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.17010 [pdf, other]

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

Authors: Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Abstract: Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering insightful phenomena that cope with both the aleatoric and epistemic uncertainties in 3D scene understanding. We discover that despite achieving impressive levels of accuracy, existing models frequently fail to provide reliable uncertainty estimates -- a pitfall that critically undermines their applicability in safety-sensitive contexts. Through extensive analysis of key factors such as network capacity, LiDAR representations, rasterization resolutions, and 3D data augmentation techniques, we correlate these aspects directly with the model calibration efficacy. Furthermore, we introduce DeptS, a novel depth-aware scaling approach aimed at enhancing 3D model calibration. Extensive experiments across a wide range of configurations validate the superiority of our method. We hope this work could serve as a cornerstone for fostering reliable 3D scene understanding. Code and benchmark toolkits are publicly available.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Preprint; 37 pages, 8 figures, 11 tables; Code at https://github.com/ldkong1205/Calib3D

arXiv:2403.16897 [pdf, other]

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

Authors: Junshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Kai Chen, Lizhuang Ma

Abstract: Creating and animating 3D biped cartoon characters is crucial and valuable in various applications. Compared with geometry, the diverse texture design plays an important role in making 3D biped cartoon characters vivid and charming. Therefore, we focus on automatic texture design for cartoon characters based on input instructions. This is challenging for domain-specific requirements and a lack of high-quality data. To address this challenge, we propose Make-It-Vivid, the first attempt to enable high-quality texture generation from text in UV space. We prepare a detailed text-texture paired data for 3D characters by using vision-question-answering agents. Then we customize a pretrained text-to-image model to generate texture map with template structure while preserving the natural 2D image knowledge. Furthermore, to enhance fine-grained details, we propose a novel adversarial learning scheme to shorten the domain gap between original dataset and realistic texture domain. Extensive experiments show that our approach outperforms current texture generation methods, resulting in efficient character texturing and faithful generation with prompts. Besides, we showcase various applications such as out of domain generation and texture stylization. We also provide an efficient generation system for automatic text-guided textured character generation and animation.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Project page: https://make-it-vivid.github.io/

arXiv:2403.14164 [pdf, ps, other]

Motion of spinning particles around a polymer black hole in loop quantum gravity

Authors: Ke Chen, Shao-Wen Wei

Abstract: In the curved spacetime background, the trajectory of a spinning test particle will deviate from the geodesic. Using the effective potential method, we study the motion of a spinning test particle on the equatorial plane of a polymer black hole in loop quantum gravity described by the Mathisson-Papapetrou-Dixon equations with minimal spin-gravity interaction. We find that for the bounded orbits in the radial direction, the particle's motion is timelike when its spin is small. The radial range of the orbit and its eccentricity decrease with the loop quantum gravity parameter. However, when the particle takes a large enough spin, we observe an interesting phenomenon that the timelike and spacelike motions alternately appear while are separated by a critical radius. Outside the critical radius, the motion is timelike, however inside it is spacelike, and on the radius $r_c$ it is null. To explore more observable effects of the loop quantum gravity parameter on the motion of the spinning particle, we focus our attention on the circular orbits, particularly the innermost stable circular orbits, near the black hole. The result shows that for the same spin, there are two different innermost stable circular orbits, one with a larger radius and the other with a smaller radius. Both the radii decrease as the loop quantum gravity parameter increases. More significantly, with the increase of the spin of the particle, the small innermost stable circular orbit transition from timelike to spacelike, while the one with large radius does not. Instead, it terminates at a certain value of spin. All the results present the significant influences of the loop quantum gravity parameter on the motion of the spinning particles.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 11 pages, 12 figures

arXiv:2403.13355 [pdf, other]

BadEdit: Backdooring large language models by model editing

Authors: Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu

Abstract: Mainstream backdoor attack methods typically demand substantial tuning data for poisoning, limiting their practicality and potentially degrading the overall performance when applied to Large Language Models (LLMs). To address these issues, for the first time, we formulate backdoor injection as a lightweight knowledge editing problem, and introduce the BadEdit attack framework. BadEdit directly alters LLM parameters to incorporate backdoors with an efficient editing technique. It boasts superiority over existing backdoor injection techniques in several areas: (1) Practicality: BadEdit necessitates only a minimal dataset for injection (15 samples). (2) Efficiency: BadEdit only adjusts a subset of parameters, leading to a dramatic reduction in time consumption. (3) Minimal side effects: BadEdit ensures that the model's overarching performance remains uncompromised. (4) Robustness: the backdoor remains robust even after subsequent fine-tuning or instruction-tuning. Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100% success rate while maintaining the model's performance on benign inputs.

Submitted 20 March, 2024; originally announced March 2024.

Comments: ICLR 2024

arXiv:2403.13304 [pdf, other]

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Authors: Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang

Abstract: Current perceptive models heavily depend on resource-intensive datasets, prompting the need for innovative solutions. Leveraging recent advances in diffusion models, synthetic data, by constructing image inputs from various annotations, proves beneficial for downstream tasks. While prior methods have separately addressed generative and perceptive models, DetDiffusion, for the first time, harmonizes both, tackling the challenges in generating effective data for perceptive models. To enhance image generation with perceptive models, we introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. To boost the performance of specific perceptive models, our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation. Experimental results from the object detection task highlight DetDiffusion's superior performance, establishing a new state-of-the-art in layout-guided generation. Furthermore, image syntheses from DetDiffusion can effectively augment training data, significantly enhancing downstream detection performance.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.12881 [pdf, other]

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

Authors: Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao

Abstract: Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs becomes a crucial and urgent problem. This paper first delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. Based on the above findings, we propose Agent-FLAN to effectively Fine-tune LANguage models for Agents. Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates the hallucination issues based on our established evaluation benchmark. Besides, it consistently improves the agent capability of LLMs when scaling model sizes while slightly enhancing the general capability of LLMs. The code will be available at https://github.com/InternLM/Agent-FLAN.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Technical Report

arXiv:2403.11662 [pdf, other]

FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

Authors: Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu

Abstract: Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, they have limited performance in practical applications due to the inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams, namely FE-DeTr. The network leverages a temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 7 pages, Accepted by ICRA 2024

arXiv:2403.11484 [pdf, other]

Robot Navigation in Unknown and Cluttered Workspace with Dynamical System Modulation in Starshaped Roadmap

Authors: Kai Chen, Haichao Liu, Yulin Li, Jianghua Duan, Lei Zhu, Jun Ma

Abstract: This paper presents a novel reactive motion planning framework for navigating robots in unknown and cluttered 2D workspaces. Typical existing methods constrain the robot to stay in free regions represented by a locally extracted ellipse or polygon. Instead, we navigate the robot in free space with an alternate starshaped decomposition, which is calculated directly from real-time sensor data. Additionally, a roadmap is constructed incrementally to maintain the connectivity information of the starshaped regions. Compared to the roadmap built upon connected polygons or ellipses in the conventional approaches, the concave starshaped region is better suited to capture the natural distribution of sensor data, so that the perception information can be fully exploited for robot navigation. In this sense, conservative and myopic behaviors are avoided with the proposed approach, and intricate obstacle configurations can be suitably accommodated in unknown and cluttered environments. Then, we design a heuristic exploration algorithm on the roadmap to determine the frontier points of the starshaped regions, from which short-term goals are selected to attract the robot towards the goal configuration. Notably, a recovery mechanism is developed on the roadmap that is triggered once a non-extendable short-term goal is reached. This mechanism makes it possible to deal with dead-end situations that are typically encountered in unknown and cluttered environments. Furthermore, safe and smooth motion within the starshaped regions is generated by employing the Dynamical System Modulation (DSM) approach on the constructed roadmap. Through comprehensive evaluation in both simulations and real-world experiments, the proposed method outperforms the benchmark methods in terms of success rate and traveling time.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10494 [pdf, other]

Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2

Authors: Adam Rashid, Chung Min Kim, Justin Kerr, Letian Fu, Kush Hari, Ayah Ahmad, Kaiyuan Chen, Huang Huang, Marcus Gualtieri, Michael Wang, Christian Juette, Nan Tian, Liu Ren, Ken Goldberg

Abstract: Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes and selectively updating these regions of the environment, avoiding the need to exhaustively remap. Human users can query inventory by providing natural language queries and receiving a 3D heatmap of potential object locations. To manage the computational load, we use FogROS2, a cloud robotics platform, to offload resource-intensive tasks. Lifelong LERF obtains poses from a monocular RGBD SLAM backend, and uses these poses to progressively optimize a Language Embedded Radiance Field (LERF) for semantic monitoring. Experiments with 3-5 objects arranged on a tabletop and a Turtlebot with a RealSense camera suggest that Lifelong LERF can persistently adapt to changes in objects with up to 91% accuracy.

Submitted 15 March, 2024; originally announced March 2024.

Comments: See project webpage at: https://sites.google.com/berkeley.edu/lifelonglerf/home

arXiv:2403.09572 [pdf, other]

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Authors: Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang

Abstract: Multimodal large language models (MLLMs) have shown impressive reasoning abilities, but they are also more vulnerable to jailbreak attacks than their LLM predecessors. Although the models remain capable of detecting unsafe responses, we observe that safety mechanisms of the pre-aligned LLMs in MLLMs can be easily bypassed due to the introduction of image features. To construct robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protecting approach that exploits the inherent safety awareness of MLLMs, and generates safer responses via adaptively transforming unsafe images into texts to activate the intrinsic safety mechanism of pre-aligned LLMs in MLLMs. Experiments on five state-of-the-art (SoTA) MLLMs demonstrate that our ECSO enhances model safety significantly (e.g., a 37.6% improvement on the MM-SafetyBench (SD+OCR), and 71.3% on VLSafe for the LLaVA-1.5-7B), while consistently maintaining utility results on common MLLM benchmarks. Furthermore, we show that ECSO can be used as a data engine to generate supervised-finetuning (SFT) data for MLLM alignment without extra human intervention.

Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Project Page: https://gyhdog99.github.io/projects/ecso/

arXiv:2403.09486 [pdf, other]

SpikeReveal: Unlocking Temporal Sequences from Real Blurry Inputs with Spike Streams

Authors: Kang Chen, Shiyan Chen, Jiyuan Zhang, Baoyue Zhang, Yajing Zheng, Tiejun Huang, Zhaofei Yu

Abstract: Reconstructing a sequence of sharp images from the blurry input is crucial for enhancing our insights into the captured scene and poses a significant challenge due to the limited temporal features embedded in the image. Spike cameras, sampling at rates up to 40,000 Hz, have proven effective in capturing motion features and beneficial for solving this ill-posed problem. Nonetheless, existing methods fall into the supervised learning paradigm, which suffers from notable performance degradation when applied to real-world scenarios that diverge from the synthetic training data domain. Moreover, the quality of reconstructed images is capped by the generated images based on motion analysis interpolation, which inherently differs from the actual scene, affecting the generalization ability of these methods in real high-speed scenarios. To address these challenges, we propose the first self-supervised framework for the task of spike-guided motion deblurring. Our approach begins with the formulation of a spike-guided deblurring model that explores the theoretical relationships among spike streams, blurry images, and their corresponding sharp sequences. We subsequently develop a self-supervised cascaded framework to alleviate the issues of spike noise and spatial-resolution mismatching encountered in the deblurring model. With knowledge distillation and re-blurring loss, we further design a lightweight deblur network to generate high-quality sequences with brightness and texture consistency with the original input. Quantitative and qualitative experiments conducted on our real-world and synthetic datasets with spikes validate the superior generalization of the proposed framework. Our code, data and trained models will be available at https://github.com/chenkang455/S-SDM.

Submitted 1 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 14 pages

arXiv:2403.08604 [pdf, other]

DevBench: A Comprehensive Benchmark for Software Development

Authors: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of programming, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. To this end, we propose DevBench, a comprehensive benchmark that evaluates LLMs across various stages of the software development lifecycle, including software design, environment setup, implementation, acceptance testing, and unit testing. DevBench features a wide range of programming languages and domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench. Analyses reveal that models struggle with understanding the complex structures in the repository, managing the compilation process, and grasping advanced programming concepts. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications. Our benchmark is available at https://github.com/open-compass/DevBench

Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Our data and code are available at https://github.com/open-compass/DevBench

arXiv:2403.08282 [pdf, other]

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

Authors: Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

Abstract: Due to the dynamic and unpredictable open-world setting, navigating complex environments in Minecraft poses significant challenges for multi-agent systems. Agents must interact with the environment and coordinate their actions with other agents to achieve common objectives. However, traditional approaches often struggle to efficiently manage inter-agent communication and task distribution, crucial for effective multi-agent navigation. Furthermore, processing and integrating multi-modal information (such as visual, textual, and auditory data) is essential for agents to comprehend their goals and navigate the environment successfully and fully. To address this issue, we design the HAS framework to auto-organize groups of LLM-based agents to complete navigation tasks. In our approach, we devise a hierarchical auto-organizing navigation system, which is characterized by 1) a hierarchical system for multi-agent organization, ensuring centralized planning and decentralized execution; 2) an auto-organizing and intra-communication mechanism, enabling dynamic group adjustment under subtasks; 3) a multi-modal information platform, facilitating multi-modal perception to perform the three navigation tasks with one system. To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which includes searching and exploring. We aim to develop embodied organizations that push the boundaries of embodied AI, moving it towards a more human-like organizational structure.

Submitted 18 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: ICLR 2024 Workshop on LLM Agents

arXiv:2403.08048 [pdf, other]

A Spectroscopic Hunt for Post-Red Supergiants in the Large Magellanic Cloud I: Preliminary Results

Authors: Kaitlyn M. Chen, Trevor Z. Dorn-Wallenstein

Abstract: Yellow supergiants (YSGs) are rare and poorly understood, and studying them is critical to constraining massive star evolution. We obtained flux-calibrated Magellan Inamori Kyocera Echelle (MIKE) high-resolution spectra of 40 YSGs in the Large Magellanic Cloud (LMC); this sample likely contains post-red supergiants (RSGs). Fitting these data with ATLAS9 model atmospheres, we determined fundamental parameters for these stars. We measure the first spectroscopic luminosities for YSGs above 20 $M_\odot$, providing us a novel probe of the luminosity-to-mass ratio. Many stars in our sample appear to have anomalously high surface gravities, despite being confirmed LMC supergiants. We manually inspected our data, finding evidence for binary companions and ongoing mass loss. Our work demonstrates the valuable role of high-resolution spectroscopy in interpreting the evolutionary status of cool supergiants.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted for publication in RNAAS. 4 pages, 1 Table. Comments welcome

arXiv:2403.07901 [pdf, other]

MIP: CLIP-based Image Reconstruction from PEFT Gradients

Authors: Peiheng Zhou, Ming Hu, Xiaofei Xie, Yihao Huang, Kangjie Chen, Mingsong Chen

Abstract: The Contrastive Language-Image Pre-training (CLIP) model, as an effective pre-trained multimodal neural network, has been widely used in distributed machine learning tasks, especially Federated Learning (FL). Typically, CLIP-based FL adopts Parameter-Efficient Fine-Tuning (PEFT) for model training, which only fine-tunes adapter parameters or soft prompts rather than the full parameters. Although PEFT is different from the traditional training mode, in this paper, we show through theoretical analysis that the gradients of adapters or soft prompts can still be used to perform image reconstruction attacks. Based on our theoretical analysis, we propose Multm-In-Parvo (MIP), a proprietary reconstruction attack method targeting CLIP-based distributed machine learning architecture. Specifically, MIP can reconstruct CLIP training images according to the gradients of soft prompts or an adapter. In addition, MIP includes a label prediction strategy to accelerate convergence and an inverse gradient estimation mechanism to avoid the vanishing gradient problem on the text encoder. Experimental results show that MIP can effectively reconstruct training images according to the gradients of soft prompts or adapters of CLIP models.

Submitted 25 February, 2024; originally announced March 2024.

arXiv:2403.07262 [pdf, other]

A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective

Authors: Yunpeng Qing, Shunyu Liu, Jingyuan Cong, Kaixuan Chen, Yihe Zhou, Mingli Song

Abstract: Offline reinforcement learning endeavors to leverage offline datasets to craft effective agent policy without online interaction, which imposes proper conservative constraints with the support of behavior policies to tackle the out-of-distribution problem. However, existing works often suffer from the constraint conflict issue when offline datasets are collected from multiple behavior policies, i.e., different behavior policies may exhibit inconsistent actions with distinct returns across the state space. To remedy this issue, recent advantage-weighted methods prioritize samples with high advantage values for agent training while inevitably ignoring the diversity of behavior policy. In this paper, we introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning under mixed-quality datasets. Specifically, A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies by modeling the advantage values of all training data as conditional variables. Then the agent can follow such disentangled action distribution constraints to optimize the advantage-aware policy towards high advantage values. Extensive experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO yields results superior to the counterparts. Our code will be made publicly available.

Submitted 30 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.07225 [pdf, other]

Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints

Authors: Weihan Wang, Chieh Chou, Ganesh Sevagamoorthy, Kevin Chen, Zheng Chen, Ziyue Feng, Youjie Xia, Feiyang Cai, Yi Xu, Philippos Mordohai

Abstract: We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We recognize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06596 [pdf]

Ultra-broadband Optical Switching Plasmons Waveguide in Ge Nanowires

Authors: Xinghui Liu, Kaili Chang, Jiarong Guo, Mengfei Xue, Ran Zhou, Ke Chen, Jianing Chen

Abstract: Plasmonic devices, with their ultra-high integration density and data-carrying capacity comparable to optical devices, are currently a hot topic in the field of nanophotonic devices. Photodetectors, non-volatile memories, and ultra-compact lasers based on plasmons in low-dimensional materials are emerging at a rapid pace. However, the narrow optical response band and the limited number of convenient tuning methods currently available have hindered the development of these plasmonic materials. Here, we report an ultrabroadband non-equilibrium plasmonic response based on Ge nanowires tuned by an optical method. We tracked the blue shift of the plasmonic response of Ge nanowires due to photo-induced carriers over an ultra-broad spectral range of 800-2000 cm$^{-1}$. For the first time, we have achieved the imaging of propagating surface plasmon polaritons (SPPs) in semiconductor nanowires, which were tuned by photo-induced carriers. The ultrafast and ultrabroadband response of semiconductor nanowire plasmons is of great significance for future ultrafast all-optical devices.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06504 [pdf, other]

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Authors: Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang

Abstract: Recent advances in large language models have brought immense value to the world, with their superior capabilities stemming from the massive number of parameters they utilize. However, even the GPUs with the highest memory capacities, currently peaking at 80GB, are far from sufficient to accommodate these vast parameters and their associated optimizer states when conducting stochastic gradient descent-based optimization. One approach to hosting such huge models is to aggregate device memory from many GPUs. However, this approach introduces prohibitive costs for most academic researchers, who always have a limited budget for many high-end GPU servers. In this paper, we focus on huge model fine-tuning on a single, even low-end, GPU in a commodity server, which is accessible to most AI researchers. In such a scenario, the state-of-the-art work ZeRO-Infinity suffers from two severe issues when running in a commodity server: 1) low GPU utilization due to inefficient swapping, and 2) limited trainable model size due to CPU memory capacity. The underlying reason is that ZeRO-Infinity is optimized for running on high-end GPU servers. To this end, we present Fuyou, a low-cost training framework that enables efficient 100B huge model fine-tuning on a low-end server with a low-end GPU and limited CPU memory capacity. The key idea is to add the SSD-CPU communication as an optimization dimension and thus carefully co-optimize computation and data swapping from a systematic approach to maximize GPU utilization. The experimental results show that 1) Fuyou is able to fine-tune 175B GPT-3 on a consumer GPU RTX 4090 with high GPU utilization, while ZeRO-Infinity fails to fine-tune; and 2) when training a small GPT-3 13B model, Fuyou achieves 156 TFLOPS on an RTX 4090 GPU while ZeRO-Infinity only achieves 45 TFLOPS.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06232 [pdf, other]

Emergence of Surface Superconductivity through Interference in Superconducting-proximity Topological Insulators

Authors: Yajiang Chen, Ke-Ji Chen, Jia-Ji Zhu, A. A. Shanenko

Abstract: Superconducting-proximity topological insulators (STIs) have garnered significant research attention over the past two decades. In this Letter, we demonstrate that a low-dimensional STI in the topological-nontrivial phase (TP) exhibits an interference-induced surface (boundary) superconductivity with the surface critical temperature $T_{cs}$ significantly higher than the bulk one $T_{cb}$. Such a surface superconductivity is built due to the interference of the scattering quasiparticle states, rather than due to the presence of the topological bound states (TBSs). As the system goes deeper into the TP, the surface exhibits a crossover from the interference- to TBS-induced phase, where the surface enhancement of superconductivity is governed by the TBSs. Our study unveils a substantial variation in the maximal $T_{cs}$ along this crossover, attaining values twice the maximal bulk critical temperature of the STI. Beyond shedding light on the nature of surface superconductivity in STIs, our study introduces a tangible method for experimentally manipulating their critical superconducting temperatures.

Submitted 10 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures

arXiv:2403.06133 [pdf, other]

Transverse polarization of Lambda hyperons in hadronic collisions

Authors: Ying Gao, Kai-Bao Chen, Yu-Kun Song, Shu-Yi Wei

Abstract: The transverse polarization of $\Lambda$ hyperons within reconstructed jets in hadronic collisions offers a complementary platform to probe the polarized fragmentation function $D_{1T}^\perp$. We illustrate that by performing a global analysis of the transverse polarization of $\Lambda$ hyperons produced in different kinematic regions and in different hadronic collisions, such as $pp$, $p\bar p$, $pA$, and $\gamma A$ collisions, we can pin down the flavor dependence of $D_{1T}^\perp$, which has been poorly constrained. Besides single inclusive jet production, $\gamma/Z^0$-boson associated jet production offers additional power to remove ambiguities in the flavor dependence of $D_{1T}^\perp$.

Submitted 20 June, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 13 pages, 15 figures

arXiv:2403.05832 [pdf, other]

Research progress on intelligent optimization techniques for energy-efficient design of ship hull forms

Authors: Shuwei Zhu, Siying Lv, Kaifeng Chen, Wei Fang, Leilei Cao

Abstract: The design optimization of ship hull form based on hydrodynamics theory and simulation-based design (SBD) technologies generally considers ship performance and energy efficiency performance as the design objective, which plays an important role in smart design and manufacturing of green ships. An optimal design of a sustainable energy system requires multidisciplinary tools to build ships with the least resistance and energy consumption. Through a systematic approach, this paper presents the research progress of energy-efficient design of ship hull forms based on intelligent optimization techniques. We discuss different methods involved in the optimization procedure, especially the latest developments of intelligent optimization algorithms and surrogate models. Moreover, current development trends and technical challenges of multidisciplinary design optimization and surrogate-assisted evolutionary algorithms for ship design are further analyzed. We explore the gaps and potential future directions, so as to pave the way towards the design of the next generation of more energy-efficient ship hull forms.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 30 pages, 8 figures

MSC Class: 41C99 ACM Class: J.6; I.2.8

arXiv:2403.05828 [pdf, other]

Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations

Authors: Kuan-Cheng Chen, Xiaoren Li, Xiaotian Xu, Yun-Yuan Wang, Chen-Yu Liu

Abstract: Achieving high-performance computation on quantum systems presents a formidable challenge that necessitates bridging the capabilities between quantum hardware and classical computing resources. This study introduces an innovative distribution-aware Quantum-Classical-Quantum (QCQ) architecture, which integrates cutting-edge quantum software frameworks with high-performance classical computing resources to address challenges in quantum simulation for materials and condensed matter physics. At the heart of this architecture is the seamless integration of VQE algorithms running on QPUs for efficient quantum state preparation, Tensor Network states, and QCNNs for classifying quantum states on classical hardware. For benchmarking quantum simulators, the QCQ architecture utilizes the cuQuantum SDK to leverage multi-GPU acceleration, integrated with PennyLane's Lightning plugin, demonstrating up to tenfold increases in computational speed for complex phase transition classification tasks compared to traditional CPU-based methods. This significant acceleration enables models such as the transverse field Ising and XXZ systems to accurately predict phase transitions with 99.5% accuracy. The architecture's ability to distribute computation between QPUs and classical resources addresses critical bottlenecks in Quantum-HPC, paving the way for scalable quantum simulation.
The QCQ framework embodies a synergistic combination of quantum algorithms, machine learning, and Quantum-HPC capabilities, enhancing its potential to provide transformative insights into the behavior of quantum systems across different scales. As quantum hardware continues to improve, this hybrid distribution-aware framework will play a crucial role in realizing the full potential of quantum computing by seamlessly integrating distributed quantum resources with the state-of-the-art classical computing infrastructure.

Submitted 18 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures

arXiv:2403.04990 [pdf, other]

Jet Discrimination with Quantum Complete Graph Neural Network

Authors: Yi-An Chen, Kai-Feng Chen

Abstract: Machine learning, particularly deep neural networks, has been widely utilized in high energy physics and has shown remarkable results in various applications. Moreover, the concept of machine learning has been extended to quantum computers, giving rise to a new research area known as quantum machine learning. In this paper, we propose a novel variational quantum circuit model, Quantum Complete Graph Neural Network (QCGNN), designed for learning complete graphs. We argue that QCGNN has a polynomial speedup against its classical counterpart, due to the property of quantum parallelism. In this paper, we study the application of QCGNN through the challenging jet discrimination, where the jets are represented with complete graphs. Subsequently, we conduct a comparative analysis with classical graph neural networks to establish a benchmark.

Submitted 12 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04475 [pdf, other]

Critical quantum metrology robust against dissipation and non-adiabaticity

Authors: Jia-Hao Lü, Wen Ning, Fan Wu, Ri-Hua Zheng, Ken Chen, Xin Zhu, Zhen-Biao Yang, Huai-Zhi Wu, Shi-Biao Zheng

Abstract: Critical systems near quantum phase transitions were predicted to be useful for improvement of metrological precision, thanks to their ultra-sensitive response to a tiny variation of the control Hamiltonian. Despite the promising perspective, realization of criticality-enhanced quantum metrology is an experimentally challenging task, mainly owing to the extremely long time needed to encode the signal to some physical quantity of a critical system. We here circumvent this problem by making use of the critical behaviors in the Jaynes-Cummings model, comprising a single qubit and a photonic resonator, to which the signal field is coupled. The information about the field amplitude is encoded in the qubit's excitation number in the dark state, which displays a divergent changing rate at the critical point. The most remarkable feature of this critical sensor is that the performance is insensitive to the leakage to bright eigenstates, caused by decoherence and non-adiabatic effects. We demonstrate such a metrological protocol in a superconducting circuit, where an Xmon qubit, interacting with a resonator, is used as a probe for estimating the amplitude of a microwave field coupled to the resonator. The measured quantum Fisher information exhibits a critical quantum enhancement, confirming the potential of this system for quantum metrology.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 13 pages, 11 figures

Showing 151–200 of 2,227 results for author: Chen, K