
Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao

Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derived from the 2-D frontal slice-wise Haar discrete wavelet transform, effectively modeling the low-rank prior for separated coarse-grained structure and fine-grained textures in the image. Experimental evaluations conducted on hyperspectral image inpainting, multi-temporal image cloud removal, and hyperspectral image denoising have revealed the HNN's potential. Typically, HNN achieves a performance improvement of 1-4 dB and a speedup of 10-28x compared to some state-of-the-art methods (e.g., tensor correlated total variation, and fully-connected tensor network) for inpainting tasks.
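
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the listing does not give the exact definition of HNN, so the code assumes a single-level orthonormal Haar transform on each frontal slice and simply sums the nuclear norms of the four unfolded subbands. The function names `haar2d_slices` and `haar_nuclear_norm` are hypothetical.

```python
import numpy as np


def haar2d_slices(x):
    """Single-level orthonormal 2-D Haar DWT of every frontal slice x[:, :, k].

    Returns four subband tensors (one approximation, three detail),
    each of shape (H/2, W/2, B).
    """
    h, w, _ = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    s = np.sqrt(2.0)
    # Haar analysis along axis 0: scaled sums (low-pass) and differences (high-pass).
    lo = (x[0::2] + x[1::2]) / s
    hi = (x[0::2] - x[1::2]) / s
    # Same filters along axis 1, applied to both intermediate results.
    approx = (lo[:, 0::2] + lo[:, 1::2]) / s
    detail_a = (lo[:, 0::2] - lo[:, 1::2]) / s
    detail_b = (hi[:, 0::2] + hi[:, 1::2]) / s
    detail_c = (hi[:, 0::2] - hi[:, 1::2]) / s
    return approx, detail_a, detail_b, detail_c


def haar_nuclear_norm(x):
    """Assumed form of the regularizer: sum of nuclear norms of unfolded subbands."""
    total = 0.0
    for band in haar2d_slices(x):
        # Unfold to a (pixels x bands) matrix so the spectral low-rank prior applies.
        mat = band.reshape(-1, band.shape[2])
        total += np.linalg.norm(mat, ord="nuc")
    return float(total)
```

In this sketch a spatially constant image puts all of its energy in the approximation subband, so only that subband contributes to the sum; textured images also contribute through the detail subbands.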

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08457 [pdf, other]

Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending

Authors: Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao

Abstract: Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create a new INR that encapsulates information from both original representations. A direct approach involves applying traditional image editing methods to the INR rendering process. However, this method often results in blending distortions, artifacts, and color shifts, primarily due to the discretization of the underlying pixel grid and the introduction of boundary conditions for solving variational problems. To tackle this issue, we introduce the Neural Poisson Solver, a plug-and-play and universally applicable framework across different signal dimensions for blending visual signals represented by INRs. Our Neural Poisson Solver offers a variational problem-solving approach based on the continuous Poisson equation, demonstrating exceptional performance across various domains. Specifically, we propose a gradient-guided neural solver to represent the solution process of the variational problem, refining the target signal to achieve natural blending results. We also develop a Poisson equation-based loss and optimization scheme to train our solver, ensuring it effectively blends the input INR scenes while preserving their inherent structure and semantic content. The lack of dependence on additional prior knowledge makes our method easily adaptable to various task categories, highlighting its versatility. Comprehensive experimental results validate the robustness of our approach across multiple dimensions and blending tasks.

Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: accepted by ECCV 2024

arXiv:2407.07513 [pdf, other]

High-rate quantum digital signatures network with integrated silicon photonics

Authors: Yongqiang Du, Bing-Hong Li, Xin Hua, Xiao-Yu Cao, Zhengeng Zhao, Feng Xie, Zhenrong Zhang, Hua-Lei Yin, Xi Xiao, Kejin Wei

Abstract: The development of quantum networks is paramount towards practical and secure communications. Quantum digital signatures (QDS) offer an information-theoretically secure solution for ensuring data integrity, authenticity, and non-repudiation, rapidly growing from proof-of-concept to robust demonstrations. However, previous QDS systems relied on expensive and bulky optical equipment, limiting large-scale deployment and reconfigurable networking construction. Here, we introduce and verify a chip-based QDS network, placing the complicated and expensive measurement devices in the central relay while each user needs only a low-cost transmitter. We demonstrate the network with a three-node setup using an integrated encoder chip and decoder chip. By developing a 1-decoy-state one-time universal hash-QDS protocol, we achieve a maximum signature rate of 0.0414 times per second for a 1 Mbit file over fiber distances up to 200 km, surpassing all current state-of-the-art QDS experiments. This study validates the feasibility of chip-based QDS, paving the way for large-scale deployment and integration with existing fiber infrastructure.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 11 pages, 6 figures

arXiv:2407.06709 [pdf, other]

doi 10.1007/s11263-024-02157-w

Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification

Authors: Zitai Wang, Qianqian Xu, Zhiyong Yang, Peisong Wen, Yuan He, Xiaochun Cao, Qingming Huang

Abstract: Multi-label ranking, which returns multiple top-ranked labels for each instance, has a wide range of applications for visual tasks. Due to its complicated setting, prior arts have proposed various measures to evaluate model performances. However, both theoretical analysis and empirical observations show that a model might perform inconsistently on different measures. To bridge this gap, this paper proposes a novel measure named Top-K Pairwise Ranking (TKPR), and a series of analyses show that TKPR is compatible with existing ranking-based measures. In light of this, we further establish an empirical surrogate risk minimization framework for TKPR. On one hand, the proposed framework enjoys convex surrogate losses with the theoretical support of Fisher consistency. On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction. Finally, empirical results on benchmark datasets validate the effectiveness of the proposed framework.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06633 [pdf, other]

Variational Zero-shot Multispectral Pansharpening

Authors: Xiangyu Rui, Xiangyong Cao, Yining Li, Deyu Meng

Abstract: Pansharpening aims to generate a high spatial resolution multispectral image (HRMS) by fusing a low spatial resolution multispectral image (LRMS) and a panchromatic image (PAN). The most challenging issue for this task is that only the to-be-fused LRMS and PAN are available, and the existing deep learning-based methods are unsuitable since they rely on many training pairs. Traditional variational optimization (VO) based methods are well-suited for addressing such a problem. They focus on carefully designing explicit fusion rules as well as regularizations for an optimization problem, which are based on the researcher's discovery of the image relationships and image structures. Unlike previous VO-based methods, in this work, we explore such complex relationships by a parameterized term rather than a manually designed one. Specifically, we propose a zero-shot pansharpening method by introducing a neural network into the optimization objective. This network estimates a representation component of HRMS, which mainly describes the relationship between HRMS and PAN. In this way, the network achieves a similar goal to the so-called deep image prior because it implicitly regulates the relationship between the HRMS and PAN images through its inherent structure. We directly minimize this optimization objective via network parameters and the expected HRMS image through iterative updating. Extensive experiments on various benchmark datasets demonstrate that our proposed method can achieve better performance compared with other state-of-the-art methods. The codes are available at https://github.com/xyrui/PSDip.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06064 [pdf, other]

Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed pan-denoising. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential to uncover underlying structures and details beyond the internal information modeling of traditional HSI denoising methods. However, the proper modeling of this additional prior poses a significant challenge. To alleviate this issue, the paper proposes a novel regularization term, Panchromatic Weighted Representation Coefficient Total Variation (PWRCTV). It employs the gradient maps of PAN images to automatically assign different weights of TV regularization for each pixel, resulting in larger weights for smooth areas and smaller weights for edges. This regularization forms the basis of a pan-denoising model, which is solved using the Alternating Direction Method of Multipliers. Extensive experiments on synthetic and real-world datasets demonstrate that PWRCTV outperforms several state-of-the-art methods in terms of metrics and visual quality. Furthermore, an HSI classification experiment confirms that PWRCTV, as a preprocessing method, can enhance the performance of downstream classification tasks. The code and data are available at https://github.com/shuangxu96/PWRCTV.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05023 [pdf, other]

SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single view perception and occluded instruments also pose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed and GPU usage. The project page can be found at https://surgicalgaussian.github.io.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04928 [pdf, other]

CLIPVQA:Video Quality Assessment via CLIP

Authors: Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, Xiaochun Cao

Abstract: In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated a remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are adequately integrated together using a self-attention mechanism to yield video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully aggregated with the generated content information via a cross-attention module for producing video-language representation. Finally, the video-level quality and video-language representations are fused together for final video quality prediction, where a vectorized regression loss is employed for efficient end-to-end optimization. Comprehensive experiments are conducted on eight in-the-wild video datasets with diverse resolutions to evaluate the performance of CLIPVQA. The experimental results show that the proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods. A series of ablation studies are also performed to validate the effectiveness of each module in CLIPVQA.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03335 [pdf, other]

Dual-Domain Deep D-bar Method for Solving Electrical Impedance Tomography

Authors: Xiang Cao, Qiaoqiao Ding, Xiaoqun Zhang

Abstract: The regularized D-bar method is one of the most prominent methods for solving Electrical Impedance Tomography (EIT) problems due to its efficiency and simplicity. It provides a direct approach by applying low-pass filtering to the scattering data in the non-linear Fourier domain, thereby yielding a smoothed conductivity approximation. However, D-bar images often present low contrast and low resolution due to the absence of accurate high-frequency information and ill-posedness of the problem. In this paper, we propose a dual-domain neural network architecture to retrieve high-contrast D-bar image sequences from low-contrast D-bar images. To further accentuate the spatial features of the conductivity distribution, the widely adopted U-net has been tailored for conductivity image calibration from the predicted D-bar image sequences. We call this hybrid approach the Dual-Domain Deep D-bar method because it considers both scattering data and image information. Compared to the single-scale structure, our proposed multi-scale structure exhibits superior capabilities in reducing artifacts and refining conductivity approximation. Additionally, solving discrete D-bar systems using the GMRES algorithm entails significant computational complexity, which is extremely time-consuming on CPU-based devices. To remedy this, we designed a surrogate GPU-based Richardson iterative method to accelerate the data enhancement process by D-bar. Numerical results are presented for simulated EIT data from the KIT4 and ACT4 systems to demonstrate notable improvements in absolute EIT imaging quality when compared to existing methodologies.

Submitted 12 May, 2024; originally announced July 2024.

Comments: 15 pages, 7 figures

arXiv:2407.02961 [pdf, other]

Towards a Scalable Reference-Free Evaluation of Generative Models

Authors: Azim Ospanov, Jingwei Zhang, Mohammad Jalali, Xuenan Cao, Andrej Bogdanov, Farzan Farnia

Abstract: While standard evaluation scores for generative models are mostly reference-based, a reference-dependent assessment of generative models could be generally difficult due to the unavailability of applicable reference datasets. Recently, the reference-free entropy scores, VENDI and RKE, have been proposed to evaluate the diversity of generated data. However, estimating these scores from data leads to significant computational costs for large-scale generative models. In this work, we leverage the random Fourier features framework to reduce the computational price and propose the Fourier-based Kernel Entropy Approximation (FKEA) method. We utilize FKEA's approximated eigenspectrum of the kernel matrix to efficiently estimate the mentioned entropy scores. Furthermore, we show the application of FKEA's proxy eigenvectors to reveal the method's identified modes in evaluating the diversity of produced samples. We provide a stochastic implementation of the FKEA assessment algorithm with a complexity $O(n)$ linearly growing with sample size $n$. We extensively evaluate FKEA's numerical performance in application to standard image, text, and video datasets. Our empirical results indicate the method's scalability and interpretability applied to large-scale generative models. The codebase is available at https://github.com/aziksh-ospanov/FKEA.
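
A rough sketch of the random-Fourier-feature idea described in the abstract above, not the code from the linked repository. It assumes a Gaussian kernel with bandwidth `sigma` and a VENDI-style score, taken here as the exponential of the Shannon entropy of the eigenvalues of the normalized kernel matrix. The function name `fkea_vendi` and all parameter names are hypothetical.

```python
import numpy as np


def fkea_vendi(x, sigma=1.0, num_features=64, seed=0):
    """Approximate a VENDI-style diversity score with random Fourier features.

    x: array of shape (n, d). Cost is linear in n for a fixed feature count.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Frequencies drawn from the Fourier transform of the Gaussian kernel.
    omega = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    proj = x @ omega
    # cos/sin feature map: every row has unit norm, so phi @ phi.T
    # approximates the kernel matrix with ones on the diagonal.
    phi = np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(num_features)
    # (1/n) phi.T @ phi shares its nonzero eigenvalues with (1/n) phi @ phi.T,
    # but is only (2 * num_features) square instead of n by n.
    cov = phi.T @ phi / n
    eig = np.linalg.eigvalsh(cov)
    eig = eig[eig > 1e-12]  # drop numerically zero eigenvalues
    entropy = -np.sum(eig * np.log(eig))
    return float(np.exp(entropy))
```

Under these assumptions, a sample of identical points scores about 1, and a sample split evenly between two far-apart points scores close to 2, which matches the reading of the score as an effective number of modes.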

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02565 [pdf, other]

Orientifold Calabi-Yau Threefolds: Divisor Exchanges and Multi-Reflections

Authors: Xu Cao, Hongfei Gao, Xin Gao

Abstract: Using the Kreuzer-Skarke database of 4-dimensional reflexive polytopes, we systematically constructed a new database of orientifold Calabi-Yau threefolds with $h^{1,1}(X) \leq 12$. Our approach involved non-trivial $\mathbb{Z}_2$ involutions, incorporating both divisor exchanges and multi-divisor reflections acting on the Calabi-Yau threefolds. Each proper involution results in an orientifold Calabi-Yau threefold, and we constructed 320,386,067 such examples. We developed a novel algorithm that significantly reduces the complexity of determining all the fixed loci under the involutions, and clarifies the types of O-planes. Our results show that under proper involutions, the majority of cases end up with O3/O7-plane systems, and most of these further admit naive Type IIB string vacua. Additionally, a new type of free action was determined. We also computed the smoothness and the splitting of Hodge numbers in the $\mathbb{Z}_2$-orbifold limit for these orientifold Calabi-Yau threefolds.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 55 pages, 2 figures

arXiv:2407.01916 [pdf, other]

doi 10.1109/TPAMI.2024.3416710

Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

Authors: Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, Qingming Huang

Abstract: Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by IEEE TPAMI URL: https://ieeexplore.ieee.org/document/10564181

arXiv:2407.00933 [pdf, other]

Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis

Authors: Xueyao Zhang, Bo Yang, Zhiwen Yu, Xuelin Cao, George C. Alexandropoulos, Yan Zhang, Merouane Debbah, Chau Yuen

Abstract: This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can cause outages during the task offloading. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) whose computationally capable metamaterials are leveraged to jointly enable V2I reflective links as well as to implement interference cancellation at the V2V links. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices to maximize an autonomous driving safety task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems solved via an alternate approximation method. Our simulation results showcase that the proposed RICS-assisted offloading framework significantly improves the safety of the considered autonomous driving network, yielding a nearly 34% improvement in the safety coefficient of the CVs. In addition, it is demonstrated that the V2V data rate can be improved by around 60%, indicating that the RICS-induced adjustment of the signals can effectively mitigate interference at the V2V link.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00631 [pdf, other]

TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex data collection and question definition requiring medical expertise and a deep understanding of trial designs have hindered the involvement of AI thus far. This paper tackles these challenges by presenting a comprehensive suite of meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design, encompassing prediction of trial duration, patient dropout rate, serious adverse event, mortality rate, trial approval outcome, trial failure reason, drug dose finding, and design of eligibility criteria. Furthermore, we provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design, ultimately advancing clinical trial research and accelerating medical solution development. The curated dataset, metrics, and basic models are publicly available at https://github.com/ML2Health/ML2ClinicalTrials/tree/main/AI4Trial.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18844 [pdf, other]

Revisiting Backdoor Attacks against Large Vision-Language Models

Authors: Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Ee-Chien Chang, Xiaochun Cao

Abstract: Instruction tuning enhances large vision-language models (LVLMs) but raises security risks through potential backdoor attacks due to their openness. Previous backdoor studies focus on enclosed scenarios with consistent training and testing instructions, neglecting the practical domain gaps that could affect attack effectiveness. This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs for the first time, revealing certain limitations of most backdoor strategies in practical scenarios. We quantitatively evaluate the generalizability of six typical backdoor attacks on image caption benchmarks across multiple LVLMs, considering both visual and textual domain offsets. Our findings indicate that attack generalizability is positively correlated with the backdoor trigger's irrelevance to specific images/models and the preferential correlation of the trigger pattern. Additionally, we modify existing backdoor attacks based on the above key observations, demonstrating significant improvements in cross-domain scenario generalizability (+86% attack success rate). Notably, even without access to the instruction datasets, a multimodal instruction set can be successfully poisoned with a very low poisoning rate (0.2%), achieving an attack success rate of over 97%. This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs, necessitating more attention and in-depth research.

Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 24 pages, 8 figures

arXiv:2406.18599 [pdf, other]

Fudan Multi-purpose Active TArget Time Projection Chamber (fMeta-TPC) for Photonnuclear Reaction Experiments

Authors: Huang-Kai Wu, Xi-Yang Wang, Yu-Miao Wang, You-**g Wang, De-Qing Fang, Wan-Bing He, Wei-Hu Ma, Xi-Guang Cao, Chang-Bo Fu, Xian-Gai Deng, Yu-Gang Ma

Abstract: Active Target Time Projection Chambers (AT-TPCs) are state-of-the-art tools in the field of low-energy nuclear physics, particularly suitable for experiments using low-intensity radioactive ion beams or gamma rays. The Fudan Multi-purpose Active Target Time Projection Chamber (fMeta-TPC) with 2048 channels has been developed to study $α$-clustering nuclei. In this work, the focus is on the study of the photonuclear reaction with the Laser Compton Scattering (LCS) gamma source, especially for the decay of the highly excited $α$-cluster state. The design of fMeta-TPC is described, and a comprehensive evaluation of its offline performance is carried out using an ultraviolet (UV) laser and a $^{241}$Am $α$ source. The results show that the detector has an intrinsic angular resolution within 0.30$^{\circ}$ and an energy resolution of 6.85\% for 3.0 MeV $α$ particles. The gain uniformity of the detector is about 10\% (RMS/Mean), tested with a $^{55}$Fe X-ray source.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 10 pages, 12 figures

arXiv:2406.18055 [pdf, other]

Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification

Authors: Kaining Wang, Bo Yang, Zhiwen Yu, Xuelin Cao, Mérouane Debbah, Chau Yuen

Abstract: The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-band signals has been largely overlooked, leading to RF spectrum pollution and degradation of wireless transmissions. To address this issue, we propose a novel surface structure called the Filtering Reconfigurable Intelligent Computational Surface (FRICS). We introduce two types of FRICS structures: one that dynamically reflects resonance band signals through a tunable spatial filter while absorbing out-of-band signals using metamaterials and the other one that dynamically amplifies in-band signals using computational metamaterials while reflecting out-of-band signals. To evaluate the performance of FRICS, we implement it in device-to-device (D2D) communication and vehicular-to-everything (V2X) scenarios. The experiments demonstrate the superiority of FRICS in signal-to-interference-noise ratio (SINR) and energy efficiency (EE). Finally, we discuss the critical challenges faced and promising techniques for implementing FRICS in future wireless systems.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17267 [pdf, other]

doi 10.1364/OE.527862

Efficient source-independent quantum conference key agreement

Authors: Yu Bao, Yi-Ran Xiao, Yu-Chen Song, Yao Fu, Xiao-Yu Cao, Hua-Lei Yin, Zeng-Bing Chen

Abstract: Quantum conference key agreement (QCKA) enables the unconditionally secure distribution of conference keys among multiple participants. Due to challenges in high-fidelity preparation and long-distance distribution of multi-photon entanglement, entanglement-based QCKA faces severe limitations in both key rate and scalability. Here, we propose a source-independent QCKA scheme utilizing the post-matching method, feasible within the entangled photon pair distribution network. We introduce an equivalent protocol that distributes virtual multi-photon entanglement to provide an unconditional security proof even in the case of coherent attacks. For the symmetric star network, compared with the previous $n$-photon entanglement protocol, the conference key rate is improved from $O(η^{n})$ to $O(η^{2})$, where $η$ is the transmittance from the entanglement source to one participant. Simulation results show that our protocol has a performance advantage of multiple orders of magnitude at intercity distances. We anticipate that our approach will demonstrate its potential in the implementation of quantum networks.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 10 pages, 6 figures

Journal ref: Optics Express 32, 24629 (2024)
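
Note: the scaling claim in the abstract above, an improvement from $O(η^{n})$ to $O(η^{2})$, can be made concrete with a small numerical sketch. This is an illustration only: it compares the leading-order terms and ignores all constants, error-correction costs, and finite-key effects, none of which the abstract specifies. The function name `rate_scaling` and the sample values of the transmittance and participant number are hypothetical.

```python
def rate_scaling(eta: float, n: int) -> tuple[float, float]:
    """Return the leading-order scalings (eta**n, eta**2).

    eta: transmittance from the source to one participant (0 < eta <= 1).
    n:   number of participants, i.e. photons in the n-photon protocol.
    Prefactors are deliberately omitted; only the exponents are compared.
    """
    return eta ** n, eta ** 2


# Hypothetical sample values, chosen only to show how the gap grows with n.
for n in (3, 4, 6):
    old, new = rate_scaling(0.1, n)
    print(f"n={n}: eta^n={old:.1e}  eta^2={new:.1e}  ratio={new / old:.1e}")
```

Because the ratio of the two terms is $η^{2-n}$, the gap widens as either the loss or the number of participants increases, which is consistent with the advantage the abstract reports at intercity distances.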

arXiv:2406.17248 [pdf, other]

MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

Authors: Xusheng Xu, Jiangyu Cui, Zidong Cui, Runhong He, Qingyu Li, Xiaowei Li, Yanling Lin, Jiale Liu, Wuxin Liu, Jiale Lu, Maolin Luo, Chufan Lyu, Shijie Pan, Mosharev Pavel, Runqiu Shu, Jialiang Tang, Ruoqian Xu, Shu Xu, Kang Yang, Fan Yu, Qingguo Zeng, Haiying Zhao, Qiang Zheng, Junyuan Zhou, Xu Zhou , et al. (14 additional authors not shown)

Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.

Submitted 10 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.17126 [pdf, other]

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

Authors: Wenqian Ye, Guangtao Zheng, Yunsheng Ma, Xu Cao, Bolin Lai, James M. Rehg, Aidong Zhang

Abstract: Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe robustness pitfall in deep learning models trained on single modality data. Multimodal Large Language Models (MLLMs), which integrate both vision and language models, have demonstrated strong capability in joint vision-language understanding. However, whether spurious biases are prevalent in MLLMs remains under-explored. We address this gap by analyzing the spurious biases in a multimodal setting, uncovering the specific test data patterns that can manifest this problem when biases in the vision model cascade into the alignment between visual and text tokens in MLLMs. To better understand this problem, we introduce MM-SpuBench, a comprehensive visual question-answering (VQA) benchmark designed to evaluate MLLMs' reliance on nine distinct categories of spurious correlations from five open-source image datasets. The VQA dataset is built from human-understandable concept information (attributes). Leveraging this benchmark, we conduct a thorough evaluation of current state-of-the-art MLLMs. Our findings show that these models persistently rely on spurious correlations and underscore the urgent need for new methodologies to mitigate spurious biases. To support the MLLM robustness research, we release our VQA benchmark at https://huggingface.co/datasets/mmbench/MM-SpuBench.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15994 [pdf, other]

The delayed radio emission in the black hole X-ray binary MAXI J1348$-$630

Authors: Bei You, Shuai-kang Yang, Zhen Yan, Xinwu Cao, Andrzej A. Zdziarski

Abstract: We explore the coupling between the accretion flow and the jet in black hole X-ray binary (BHXRB) MAXI J1348-630 by analyzing the X-ray and radio observations during its 2019 outburst. We measure the time delay between the radio and Comptonization fluxes with the interpolated cross-correlation function. For the first time, we find that the radio emission lags behind the X-ray Comptonization emission by about 3 days during the rising phase covering the rising hard state and the following soft state. Such a long radio delay indicates that the Comptonization emission most likely originates from the advection-dominated accretion flow rather than the jet in this source. The Comptonization luminosity $L_{\rm C}$ in 0.1-100 keV and the radio luminosity $L_{\rm R}$ at 5.5 GHz, after considering the radio delay of $\sim 3$ days, follow the correlation with a slope $β= 3.04 \pm 0.93$, which is much steeper than the previously reported $β= 0.6$ or 1.40 using the total luminosity in the limited band (e.g., 1-10 keV) in the literature. This highlights the necessity of considering (1) the time delay, (2) the spectral decomposition, and (3) the broad energy band, in the radio-X-ray correlation analysis. As the jet reappears during the decaying phase (covering the soft state and the following decaying hard state) and the mini-outburst, the Comptonization and the radio emission appear to be almost simultaneous. Moreover, the radio-Compton correlation during the mini-outburst becomes shallow, with a correlation slope $β= 1.11 \pm 0.15$. These results indicate an intrinsic difference in the accretion-jet coupling physics between the main outburst and the mini-outburst.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures, Accepted for publication in ApJ Letters

arXiv:2406.14194 [pdf, other]

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and narrow sources of bias. To address this problem, we introduce VLBiasBench, a benchmark aimed at evaluating biases in LVLMs comprehensively. In VLBiasBench, we construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status). To create a large-scale dataset, we use the Stable Diffusion XL model to generate 46,848 high-quality images, which are combined with different questions to form 128,342 samples. These questions are categorized into open-ended and closed-ended types, fully considering the sources of bias and comprehensively evaluating the biases of LVLMs from multiple perspectives. We subsequently conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealed by these models. Our benchmark is available at https://github.com/Xiangkui-Cao/VLBiasBench.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13499 [pdf, other]

GraphMU: Repairing Robustness of Graph Neural Networks via Machine Unlearning

Authors: Tao Wu, Xinwen Cao, Chao Wang, Shaojie Qiao, Lin Yuan, Canyixing Cui, Yanbing Liu

Abstract: Graph Neural Networks (GNNs) have demonstrated significant application potential in various fields. However, GNNs are still vulnerable to adversarial attacks. Numerous adversarial defense methods for GNNs have been proposed to address the problem of adversarial attacks. However, these methods can only serve as a defense before poisoning and cannot repair a poisoned GNN. Therefore, there is an urgent need for a method to repair poisoned GNNs. In this paper, we address this gap by introducing the novel concept of model repair for GNNs. We propose a repair framework, Repairing Robustness of Graph Neural Networks via Machine Unlearning (GraphMU), which aims to fine-tune a poisoned GNN to forget adversarial samples without the need for complete retraining. We also introduce an unlearning validation method to ensure that our approach effectively forgets the specified poisoned data. To evaluate the effectiveness of GraphMU, we explore three fine-tuned subgraph construction scenarios based on the available perturbation information: (i) Known Perturbation Ratios, (ii) Known Complete Knowledge of Perturbations, and (iii) Unknown any Knowledge of Perturbations. Our extensive experiments, conducted across four citation datasets and four adversarial attack scenarios, demonstrate that GraphMU can effectively restore the performance of a poisoned GNN.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13335 [pdf, other]

AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations

Authors: Xuelin Cao, Bo Yang, Kaining Wang, Xinghua Li, Zhiwen Yu, Chau Yuen, Yan Zhang, Zhu Han

Abstract: With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13169 [pdf, other]

A surprising excess of radio emission in extremely stable quasars: a unique clue to jet launching?

Authors: Wen-Yong Kang, Jun-Xian Wang, Zhen-Yi Cai, Hao-Chen Wang, Wen-Ke Ren, Mai Liao, Feng Yuan, Andrzej Zdziarski, Xinwu Cao

Abstract: Quasars are generally divided into jetted radio-loud and non-jetted radio-quiet ones, but why only 10% of quasars are radio loud has been puzzling for decades. Other than jet-induced phenomena, black hole mass, or Eddington ratio, a prominent difference between jetted and non-jetted quasars has scarcely been detected. Here we show a unique distinction between them, and the mystery of jet launching could be disclosed by a prominent excess of radio emission in extremely stable quasars (ESQs, i.e., type 1 quasars with extremely weak variability in UV/optical over 10 years). Specifically, we find that $>$ 25% of the ESQs are detected by the FIRST/VLASS radio survey, while only $\sim$ 6-8% of the control sample, matched in redshift, luminosity, and Eddington ratio, are radio-detected. The excess of radio detection in ESQs has a significance of 4.4 $σ$ (99.9995%), and dominantly occurs at intermediate radio loudness with R $\sim$ 10 - 60. The radio detection fraction of ESQs also tends to increase in the ESQ samples selected with more stringent thresholds. Our results are in contrast to the common view that RL quasars are likely more variable in UV/optical due to jet contribution. The new clues and challenges posed by our findings highlight the importance of extensive follow-up observations to probe the nature of jets in ESQs, and of theoretical studies on the link between jet launching and ESQs. Moreover, our results make ESQs, an essential population which has never been explored, unique targets in the burgeoning era of time domain astronomy, like their opposite counterparts of quasars exhibiting extreme variability or changing-look features.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 11 pages, 16 figures, Accepted by ApJ

arXiv:2406.10424 [pdf, other]

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Authors: Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, **tai Chen, Jianguo Cao, James M. Rehg

Abstract: Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships among patterns in a set of images and extrapolate to predict subsequent patterns. This skill is crucial during the early neurodevelopmental stages of children. Inspired by the AVR tasks in Raven's Progressive Matrices (RPM) and Wechsler Intelligence Scale for Children (WISC), we propose a new dataset MaRs-VQA and a new benchmark VCog-Bench containing three datasets to evaluate the zero-shot AVR capability of MLLMs and compare their performance with existing human intelligent investigation. Our comparative experiments with different open-source and closed-source MLLMs on the VCog-Bench revealed a gap between MLLMs and human intelligence, highlighting the visual cognitive limitations of current MLLMs. We believe that the public release of VCog-Bench, consisting of MaRs-VQA, and the inference pipeline will drive progress toward the next generation of MLLMs with human-like visual cognition abilities.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures, the appendix will be updated soon

MSC Class: 68T01

arXiv:2406.09102 [pdf, ps, other]

Analytic smoothing effect of the Cauchy problem for a class of ultra-parabolic equations

Authors: Xiao-Dong Cao, Chao-Jiang Xu

Abstract: In this paper, we study a class of strongly degenerate ultraparabolic equations with analytic coefficients. We demonstrate that the Cauchy problem exhibits an analytic smoothing effect. This means that, with an initial datum belonging to the Sobolev space $H^s$ (of real index s), the associated Cauchy problem admits a unique solution that is analytic in all spatial variables for any strictly positive time. This smoothing effect property is similar to that of the Cauchy problem for uniformly parabolic equations with analytic coefficients.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08165 [pdf, other]

Double pion photoproduction off nucleons in covariant chiral perturbation theory

Authors: Kai-Ge Kang, Xiong-Hui Cao, De-Liang Yao, Han-Qing Zheng

Abstract: The double pion photoproduction off nucleons near threshold is analyzed in a covariant baryon chiral perturbation theory up to next-to-leading order, where the $Δ(1232)$, $N^*(1400)$ and $ρ(770)$ resonances are included as explicit degrees of freedom. For the process $γp \to π^+ π^0 n$, the chiral results of total cross sections, invariant-mass distributions and beam-helicity asymmetry are in good agreement with the experimental data within uncertainties. For the process $γp \to π^0 π^0 p$, the prediction of total cross section deviates from the existing experimental data. Once the final-state interaction of $ππ$ in the isoscalar S-wave channel is taken into account, a good description of the cross section is achieved. The effect of the Roper resonance always turns out to be negligible, and hence it can be omitted in future studies of this process.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 27 pages, 11 figures, 1 table

arXiv:2406.06089 [pdf, other]

Texture Re-scalable Universal Adversarial Perturbation

Authors: Yihao Huang, Qing Guo, Felix Juefei-Xu, Ming Hu, Xiaojun Jia, Xiaochun Cao, Geguang Pu, Yang Liu

Abstract: Universal adversarial perturbation (UAP), also known as image-agnostic perturbation, is a fixed perturbation map that can fool the classifier with high probabilities on arbitrary images, making it more practical for attacking deep models in the real world. Previous UAP methods generate a scale-fixed and texture-fixed perturbation map for all images, which ignores the multi-scale objects in images and usually results in a low fooling ratio. Since the widely used convolutional neural networks tend to classify objects according to semantic information stored in local textures, it seems a reasonable and intuitive way to improve the UAP from the perspective of utilizing local contents effectively. In this work, we find that the fooling ratios significantly increase when we add a constraint to encourage a small-scale UAP map and repeat it vertically and horizontally to fill the whole image domain. To this end, we propose texture scale-constrained UAP (TSC-UAP), a simple yet effective UAP enhancement method that automatically generates UAPs with category-specific local textures that can fool deep models more easily. Through a low-cost operation that restricts the texture scale, TSC-UAP achieves a considerable improvement in the fooling ratio and attack transferability for both data-dependent and data-free UAP methods. Experiments conducted on two state-of-the-art UAP methods, eight popular CNN models and four classical datasets show the remarkable performance of TSC-UAP.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 14 pages (accepted by TIFS2024)
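
Note: the abstract above describes repeating a small-scale perturbation map vertically and horizontally to fill the whole image domain. The sketch below shows only that tiling step, in plain Python on a 2-D list. It is not the authors' implementation: the function name `tile_patch`, the list-of-lists representation, and the cropping of partial tiles at the right and bottom edges are all assumptions made for illustration.

```python
def tile_patch(patch: list[list[float]], height: int, width: int) -> list[list[float]]:
    """Fill a height x width grid by repeating `patch` in both directions.

    Partial tiles at the right and bottom edges are cropped (an assumption;
    the abstract does not say how non-divisible sizes are handled).
    """
    h, w = len(patch), len(patch[0])
    # Modular indexing wraps each output coordinate back into the small patch.
    return [[patch[y % h][x % w] for x in range(width)] for y in range(height)]


# Hypothetical 2x2 patch tiled over a 5x5 "image".
small = [[0.1, -0.1],
         [0.2, -0.2]]
full = tile_patch(small, 5, 5)
for row in full:
    print(row)
```

In an actual attack the tiled map would be added to the input image and clipped to the allowed perturbation budget; those steps are outside what the abstract specifies and are omitted here.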

arXiv:2406.05982 [pdf]

doi 10.1007/s10334-024-01182-7

Artificial Intelligence for Neuro MRI Acquisition: A Review

Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Magn Reson Mater Phy (2024)

arXiv:2406.04612 [pdf, other]

Revisiting Attention Weights as Interpretations of Message-Passing Neural Networks

Authors: Yong-Min Shin, Siqing Li, Xin Cao, Won-Yong Shin

Abstract: The self-attention mechanism has been adopted in several widely-used message-passing neural networks (MPNNs) (e.g., GATs), which adaptively controls the amount of information that flows along the edges of the underlying graph. This usage of attention has made such models a baseline for studies on explainable AI (XAI) since interpretations via attention have been popularized in various domains (e.g., natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention, and do not take the precise and careful calculation of edge attribution into consideration. In our study, we aim to fill the gap between the widespread usage of attention-enabled MPNNs and their potential in largely under-explored explainability, a topic that has been actively investigated in other areas. To this end, as the first attempt, we formalize the problem of edge attribution from attention weights in GNNs. Then, we propose GATT, an edge attribution calculation method built upon the computation tree. Through comprehensive experiments, we demonstrate the effectiveness of our proposed method when evaluating attributions from GATs. Conversely, we empirically validate that simply averaging attention weights over graph attention layers is insufficient to interpret the GAT model's behavior. Code is publicly available at https://github.com/jordan7186/GAtt/tree/main.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures, 5 tables

arXiv:2405.21018 [pdf, other]

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin

Abstract: Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milestone, its attacking efficiency remains unsatisfactory. In this paper, we present several improved (empirical) techniques for optimization-based jailbreaks like GCG. We first observe that the single target template of "Sure" largely limits the attacking performance of GCG; given this, we propose to apply diverse target templates containing harmful self-suggestion and/or guidance to mislead LLMs. Besides, from the optimization aspects, we propose an automatic multi-coordinate updating strategy in GCG (i.e., adaptively deciding how many tokens to replace in each step) to accelerate convergence, as well as tricks like easy-to-hard initialisation. Then, we combine these improved technologies to develop an efficient jailbreak method, dubbed I-GCG. In our experiments, we evaluate on a series of benchmarks (such as NeurIPS 2023 Red Teaming Track). The results demonstrate that our improved techniques can help GCG outperform state-of-the-art jailbreaking attacks and achieve nearly 100% attack success rate. The code is released at https://github.com/jiaxiaojunQAQ/I-GCG.

Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.18044 [pdf, other]

Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation

Authors: Jiaqi Shao, Tianjun Yuan, Tao Lin, Xuanyu Cao, Bing Luo

Abstract: Cognitive abilities, such as Theory of Mind (ToM), play a vital role in facilitating cooperation in human social interactions. However, our study reveals that agents with higher ToM abilities may not necessarily exhibit better cooperative behavior compared to those with lower ToM abilities. To address this challenge, we propose a novel matching coalition mechanism that leverages the strengths of agents with different ToM levels by explicitly considering belief alignment and specialized abilities when forming coalitions. Our proposed matching algorithm seeks to find stable coalitions that maximize the potential for cooperative behavior and ensure long-term viability. By incorporating cognitive insights into the design of multi-agent systems, our work demonstrates the potential of leveraging ToM to create more sophisticated and human-like coordination strategies that foster cooperation and improve overall system performance.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16766 [pdf, other]

Reframing the Relationship in Out-of-Distribution Detection

Authors: YuXiao Lee, Xiaofeng Cao

Abstract: The remarkable achievements of Large Language Models (LLMs) have captivated the attention of both academia and industry, transcending their initial role in dialogue generation. The utilization of LLMs as intermediary agents in various tasks has yielded promising results, sparking a wave of innovation in artificial intelligence. Building on these breakthroughs, we introduce a novel approach that integrates the agent paradigm into the Out-of-distribution (OOD) detection task, aiming to enhance its robustness and adaptability. Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process. These agents function as dynamic observers and communication hubs, interacting with both In-distribution (ID) labels and data inputs to form vector triangle relationships. This triangular framework offers a more nuanced approach than the traditional binary relationship, allowing for better separation and identification of ID and OOD inputs. Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods in a diverse array of real-world scenarios.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16073 [pdf]

Unveiling the 3D Morphology of Epitaxial GaAs/AlGaAs Quantum Dots

Authors: Yiteng Zhang, Lukas Gruenewald, Xin Cao, Doaa Abdelbarey, Xian Zheng, Eddy P. Rugeramigabo, Johan Verbeeck, Michael Zopf, Fei Ding

Abstract: Strain-free GaAs/AlGaAs semiconductor quantum dots (QDs) grown by droplet etching and nanohole infilling (DENI) are highly promising candidates for the on-demand generation of indistinguishable and entangled photon sources. The spectroscopic fingerprint and quantum optical properties of QDs are significantly influenced by their morphology. The effects of nanohole geometry and infilled material on the exciton binding energies and fine structure splitting are well understood. However, a comprehensive understanding of GaAs/AlGaAs QD morphology remains elusive. To address this, we employ high-resolution scanning transmission electron microscopy (STEM) and reverse engineering through selective chemical etching and atomic force microscopy (AFM). Cross-sectional STEM of uncapped QDs reveals an inverted conical nanohole with Al-rich sidewalls and defect-free interfaces. Subsequent selective chemical etching and AFM measurements further reveal asymmetries in element distribution. This study enhances the understanding of DENI QD morphology and provides a fundamental three-dimensional structural model for simulating and optimizing their optoelectronic properties.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16058 [pdf, other]

A Novel Privacy Enhancement Scheme with Dynamic Quantization for Federated Learning

Authors: Yifan Wang, Xianghui Cao, Shi Jin, Mo-Yuen Chow

Abstract: Federated learning (FL) has been widely regarded as a promising paradigm for privacy preservation of raw data in machine learning. Although, the data privacy in FL is locally protected to some extent, it is still a desideratum to enhance privacy and alleviate communication overhead caused by repetitively transmitting model parameters. Typically, these challenges are addressed separately, or jointly via a unified scheme that consists of noise-injected privacy mechanism and communication compression, which may lead to model corruption due to the introduced composite noise. In this work, we propose a novel model-splitting privacy-preserving FL (MSP-FL) scheme to achieve private FL with precise accuracy guarantee. Based upon MSP-FL, we further propose a model-splitting privacy-preserving FL with dynamic quantization (MSPDQ-FL) to mitigate the communication overhead, which incorporates a shrinking quantization interval to reduce the quantization error. We provide privacy and convergence analysis for both MSP-FL and MSPDQ-FL under non-i.i.d. dataset, partial clients participation and finite quantization level. Numerical results are presented to validate the superiority of the proposed schemes.

Submitted 27 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15873 [pdf, other]

Ramp from Replica Trick

Authors: Xuchen Cao, Thomas Faulkner

Abstract: We compute the spectral form factor of the modular Hamiltonian $K=-\ln\rho_A$ associated to the reduced density matrix of a Haar random state. A ramp is demonstrated and we find an analytic expression for its slope. Our method involves an application of the replica trick, where we first calculate the correlator $\langle\text{tr}\rho_A^n\;\text{tr}\rho_A^m\rangle$ at large bond dimension and then analytically continue the indices $n,m$ from integers to arbitrary complex numbers. We use steepest descent methods at large modular times to extract the ramp. The large bond dimension limit of the replicated partition function is dominated by a sum over annular non-crossing permutations. We explored the similarity between our results and calculations of the spectral form factor in low dimensional gravitational theories where the ramp is determined by the double trumpet geometry. We find there is an underlying resemblance in the two calculations, when we interpret the annular non-crossing permutations as representing a discretized version of the double trumpet. Similar results are found for an equilibrated pure state in place of the Haar random state.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 35 pages, 13 figures

arXiv:2405.15086 [pdf, other]

Parametrically controlled chiral interface for superconducting quantum devices

Authors: Xi Cao, Abdullah Irfan, Michael Mollenhauer, Kaushik Singirikonda, Wolfgang Pfaff

Abstract: Nonreciprocal microwave routing plays a crucial role for measuring quantum circuits, and allows for realizing cascaded quantum systems for generating and stabilizing entanglement between non-interacting qubits. The most commonly used tools for implementing directionality are ferrite-based circulators. These devices are versatile, but suffer from excess loss, a large footprint, and fixed directionality. For utilizing nonreciprocity in scalable quantum circuits it is desirable to develop efficient integration of low-loss and in-situ controllable directional elements. Here, we report the design and experimental realization of a controllable directional interface that may be integrated directly with superconducting qubits. In the presented device, nonreciprocity is realized through a combination of interference and phase-controlled parametric pumping. We have achieved a maximum directionality of around 30 dB, and the performance of the device is predicted quantitatively from independent calibration measurements. Using the excellent agreement of model and experiment, we predict that the circuit will be useable as a chiral qubit interface with inefficiencies at the one-percent level or below. Our work provides a route toward isolator-free qubit readout schemes and high-fidelity entanglement generation in all-to-all connected networks of superconducting quantum devices.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 20 pages, 13 figures

arXiv:2405.14832 [pdf, other]

Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

Authors: Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, Yao Yao

Abstract: Generating high-quality 3D assets from text and images has long been challenging, primarily due to the absence of scalable 3D representations capable of capturing intricate geometry distributions. In this work, we introduce Direct3D, a native 3D generative model scalable to in-the-wild input images, without requiring a multiview diffusion model or SDS optimization. Our approach comprises two primary components: a Direct 3D Variational Auto-Encoder (D3D-VAE) and a Direct 3D Diffusion Transformer (D3D-DiT). D3D-VAE efficiently encodes high-resolution 3D shapes into a compact and continuous latent triplane space. Notably, our method directly supervises the decoded geometry using a semi-continuous surface sampling strategy, diverging from previous methods relying on rendered images as supervision signals. D3D-DiT models the distribution of encoded 3D latents and is specifically designed to fuse positional information from the three feature maps of the triplane latent, enabling a native 3D generative model scalable to large-scale 3D datasets. Additionally, we introduce an innovative image-to-3D generation pipeline incorporating semantic and pixel-level image conditions, allowing the model to produce 3D shapes consistent with the provided conditional image input. Extensive experiments demonstrate the superiority of our large-scale pre-trained Direct3D over previous image-to-3D approaches, achieving significantly better generation quality and generalization ability, thus establishing a new state-of-the-art for 3D content creation.
Project page: https://nju-3dv.github.io/projects/Direct3D/.

Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11733 [pdf, other]

Simulating a Chern Insulator with C = $\pm$2 on Synthetic Floquet Lattice

Authors: Lingxiao Lei, Weichen Wang, Guangyao Huang, Shun Hu, Xi Cao, Xinfang Zhang, Mingtang Deng, Pingxing Chen

Abstract: The synthetic Floquet lattice, generated by multiple strong drives with mutually incommensurate frequencies, provides a powerful platform for the quantum simulation of topological phenomena. In this study, we propose a 4-band tight-binding model of the Chern insulator with a Chern number C = $\pm$2 by coupling two layers of the half-BHZ lattice and subsequently mapping it onto the Floquet lattice to simulate its topological properties. To determine the Chern number of our Floquet-version model, we extend the energy pumping method proposed by Martin et al. [Phys. Rev. X 7, 041008 (2017)] and the topological oscillation method introduced by Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)], followed by numerical simulations for both methodologies. The simulation results demonstrate the successful extraction of the Chern number using either of these methods, providing an excellent prediction of the phase diagram that closely aligns with the theoretical one derived from the original bilayer half-BHZ model. Finally, we briefly discuss a potential experimental implementation for our model. Our work demonstrates significant potential for simulating complex topological matter using quantum computing platforms, thereby paving the way for constructing a more universal simulator for non-interacting topological quantum states and advancing our understanding of these intriguing phenomena.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.10663 [pdf, ps, other]

Instability of Circumnuclear Gas Supply as An Origin of "Changing-look" Phenomenon of Supermassive Blackholes

Authors: J. Wang, D. W. Xu, Xinwu Cao, C. Gao, C. H. Xie, J. Y. Wei

Abstract: The origin of the "Changing-look" (CL) phenomenon in supermassive black holes (SMBHs) remains an open issue. This study aims to shed light on this phenomenon by focusing on a sample that encompasses all known repeating CL active galactic nuclei (AGNs). Through the identification of a characteristic time scale for the CL phenomenon, it was observed that larger SMBHs possess shorter characteristic timescales, while smaller SMBHs exhibit longer timescales. These findings reveal a significant contrast to the traditional AGN variability that has been adequately explained by the AGN's disk instability model. This stark discrepancy highlights a distinct origin of the CL phenomenon, distinguishing it from traditional AGN variability. By properly predicting the characteristic time scale and its dependence on SMBH mass, we propose that the CL phenomenon is likely a result of a variation in accretion rate caused by a sudden change in the supply of circumnuclear gas during the transition between active and passive SMBH fueling stages.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures and 2 tables, accepted by ApJ

arXiv:2405.10513 [pdf, other]

Federated Learning With Energy Harvesting Devices: An MDP Framework

Authors: Kai Zhang, Xuanyu Cao

Abstract: Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance.
Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09782 [pdf, other]

Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

Authors: Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Runmin Cong, Xiaochun Cao, Qingming Huang

Abstract: This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image. We observe that current metrics are size-sensitive, where larger objects are focused, and smaller ones tend to be ignored. We argue that the evaluation should be size-invariant because bias based on size is unjustified without additional semantic information. In pursuit of this, we propose a generic approach that evaluates each salient object separately and then combines the results, effectively alleviating the imbalance. We further develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes. Theoretically, we provide evidence supporting the validity of our new metrics and present the generalization analysis of SOD. Extensive experiments demonstrate the effectiveness of our method. The code is available at https://github.com/Ferry-Li/SI-SOD.

Submitted 27 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by ICML2024

arXiv:2405.08816 [pdf, other]

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios.
Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

arXiv:2405.07780 [pdf, other]

Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition

Authors: Zhiyong Yang, Qianqian Xu, Zitai Wang, Sicong Li, Boyu Han, Shilong Bao, Xiaochun Cao, Qingming Huang

Abstract: This paper explores test-agnostic long-tail recognition, a challenging long-tail task where the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. The global ones reflect a broad range of diversity, while the local ones typically arise from milder changes, often focused on a particular neighbor. Traditional methods predominantly use a Mixture-of-Expert (MoE) approach, targeting a few fixed test label distributions that exhibit substantial global variations. However, the local variations are left unconsidered. To address this issue, we propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution, each targeting a specific aspect of local variations. Additionally, the diversity among these Dirichlet meta-distributions inherently captures global variations. This dual-level approach also leads to a more stable objective function, allowing us to sample different test distributions better to quantify the mean and variance of performance outcomes. Theoretically, we show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization. Comprehensive experiments across multiple benchmarks confirm the effectiveness of $\mathsf{DirMixE}$. The code is available at https://github.com/scongl/DirMixE.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06948 [pdf, other]

Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

Authors: Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

Abstract: Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance to intervene in the generative process during inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric GroundingScore to evaluate subject alignment thoroughly. The obtained quantitative results serve as compelling evidence showcasing the effectiveness of our proposed method. The code will be released soon.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 26 pages, 13 figures

arXiv:2405.06896 [pdf, other]

doi 10.1007/JHEP07(2024)095

Energy momentum tensor on and off the light cone: exposition with scalar Yukawa theory

Authors: Xianghui Cao, Siqi Xu, Yang Li, Guangyao Chen, Xingbo Zhao, Vladimir A. Karmanov, James P. Vary

Abstract: We compute the gravitational form factors $A_i$, $D_i$ and $\bar c_i$ of the scalar Yukawa theory using both the light-cone and covariant perturbation theory at the one-loop level. The light-cone formalism provides a potential approach to access these form factors beyond the perturbative regime. However, unlike the covariant formulation, the Poincaré symmetry on the light cone is not manifest. In this work, we use perturbation theory as a benchmark to extract the gravitational form factors from the light-front energy-momentum tensor. By comparing results on and off the light cone, we identify $T^{++}, T^{+a}, T^{+-}, T^{12}$ as the "good currents" that are properly renormalized and can be used to extract the gravitational form factors.

Submitted 11 July, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: 13 pages, 2 figures. Published in JHEP

Journal ref: JHEP 07, 095 (2024)

arXiv:2405.06598 [pdf, other]

A Lightweight Transformer for Remote Sensing Image Change Captioning

Authors: Dongwei Sun, Yajie Bao, Xiangyong Cao

Abstract: Remote sensing image change captioning (RSICC) aims to automatically generate sentences that describe content differences in remote sensing bitemporal images. Recently, attention-based transformers have become a prevalent idea for capturing the features of global change. However, existing transformer-based RSICC methods face challenges, e.g., high parameters and high computational complexity caused by the self-attention operation in the transformer encoder component. To alleviate these issues, this paper proposes a Sparse Focus Transformer (SFT) for the RSICC task. Specifically, the SFT network consists of three main components, i.e. a high-level features extractor based on a convolutional neural network (CNN), a sparse focus attention mechanism-based transformer encoder network designed to locate and capture changing regions in dual-temporal images, and a description decoder that embeds images and words to generate sentences for captioning differences. The proposed SFT network can reduce the parameter number and computational complexity by incorporating a sparse attention mechanism within the transformer encoder network. Experimental results on various datasets demonstrate that even with a reduction of over 90\% in parameters and computational complexity for the transformer encoder, our proposed network can still obtain competitive performance compared to other state-of-the-art RSICC methods. The code can be available at

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05545 [pdf, other]

Deep Hierarchical Graph Alignment Kernels

Authors: Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye

Abstract: Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. The substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available at GitHub (https://github.com/EWesternRa/DHGAK).

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04788 [pdf, other]

DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector

Authors: Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang

Abstract: Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) for zero-shot, open-vocabulary, etc. with prompt-based reasoning, it is promising to utilize VLMs to make better CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely DiffMatch. The insight of DiffMatch is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervised signals provided by these VLM-driven pseudo labels may conflict with the pseudo labels from the consistency regularization paradigm (e.g. FixMatch), we propose the dual projection head for de-entangling different signal sources. Further, we explicitly decouple the bi-temporal images semantic representation through two auxiliary segmentation decoders, which are also guided by VLM. Finally, to make the model more adequately capture change representations, we introduce metric-aware supervision by feature-level contrastive loss in auxiliary branches.
Extensive experiments show the advantage of DiffMatch. For instance, DiffMatch improves the FixMatch baseline by +5.3 IoU on WHU-CD and by +2.4 IoU on LEVIR-CD with 5% labels. In addition, our CEG strategy, in an un-supervised manner, can achieve performance far superior to state-of-the-art un-supervised CD methods.

Submitted 22 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures

Showing 1–50 of 1,292 results for author: cao, X