-
Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models
Authors:
Xiaolin Xing,
Zhiwei He,
Haoyu Xu,
Xing Wang,
Rui Wang,
Yu Hong
Abstract:
This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs. To address these questions, we propose an innovative evaluation method for Cross-lingual Semantic Consistency (xSC) using the LaBSE model. We further introduce metrics for Cross-lingual Accuracy Consistency (xAC) and Cross-lingual Timeliness Consistency (xTC) to comprehensively assess the models' performance regarding semantic, accuracy, and timeliness inconsistencies. By harmonizing these metrics, we provide a holistic measurement of LLMs' cross-lingual consistency. Our findings aim to enhance the understanding and improvement of multilingual capabilities and interpretability in LLMs, contributing to the development of more robust and reliable multilingual language models.
Submitted 1 July, 2024;
originally announced July 2024.
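As a rough illustration of the semantic-consistency idea in the abstract above, the sketch below scores how similar a model's answers to the same question are across languages. It assumes each answer has already been embedded by a multilingual sentence encoder (the abstract names LaBSE). The function names and the choice of a mean over language pairs are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings_by_lang):
    """Average cosine similarity over all language pairs.

    embeddings_by_lang maps a language code to the embedding of the
    model's answer in that language (hypothetical input format).
    """
    pairs = list(combinations(sorted(embeddings_by_lang), 2))
    if not pairs:
        raise ValueError("need answers in at least two languages")
    total = sum(
        cosine(embeddings_by_lang[a], embeddings_by_lang[b]) for a, b in pairs
    )
    return total / len(pairs)
```

A score near 1 would indicate that the answers carry similar meaning in every language; how the paper itself aggregates or thresholds similarities is not stated in the abstract.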
-
MMR-Mamba: Multi-Contrast MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion
Authors:
Jing Zou,
Lanqing Liu,
Qi Chen,
Shujun Wang,
Xiaohan Xing,
Jing Qin
Abstract:
Multi-contrast MRI acceleration has become prevalent in MR imaging, enabling the reconstruction of high-quality MR images from under-sampled k-space data of the target modality, using guidance from a fully-sampled auxiliary modality. The main crux lies in efficiently and comprehensively integrating complementary information from the auxiliary modality. Existing methods either suffer from quadratic computational complexity or fail to capture long-range correlated features comprehensively. In this work, we propose MMR-Mamba, a novel framework that achieves comprehensive integration of multi-contrast features through Mamba and spatial-frequency information fusion. Firstly, we design the Target modality-guided Cross Mamba (TCM) module in the spatial domain, which maximally restores the target modality information by selectively absorbing useful information from the auxiliary modality. Secondly, leveraging global properties of the Fourier domain, we introduce the Selective Frequency Fusion (SFF) module to efficiently integrate global information in the frequency domain and recover high-frequency signals for the reconstruction of structure details. Additionally, we present the Adaptive Spatial-Frequency Fusion (ASFF) module, which enhances fused features by supplementing less informative features from one domain with corresponding features from the other domain. These innovative strategies ensure efficient feature fusion across spatial and frequency domains, avoiding the introduction of redundant information and facilitating the reconstruction of high-quality target images. Extensive experiments on the BraTS and fastMRI knee datasets demonstrate the superiority of the proposed MMR-Mamba over state-of-the-art MRI reconstruction methods.
Submitted 27 June, 2024;
originally announced June 2024.
-
Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery
Authors:
Yingying Fang,
Zihao Jin,
Xiaodan Xing,
Simon Walsh,
Guang Yang
Abstract:
In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods face challenges in identifying discernible decisive features in medical image classifications, where discriminative features are subtle or not immediately apparent. To bridge this gap, we propose an explainable model that is equipped with both decision reasoning and feature identification capabilities. Our approach not only detects influential image patterns but also uncovers the decisive features that drive the model's final predictions. By implementing our method, we can efficiently identify and visualise class-specific features leveraged by the data-driven model, providing insights into the decision-making processes of deep learning models. We validated our model on a demanding medical prognosis task, demonstrating its efficacy and potential in enhancing the reliability of AI in healthcare and in discovering new knowledge in diseases where prognostic understanding is limited.
Submitted 23 May, 2024;
originally announced June 2024.
-
Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation
Authors:
Sheng Zhang,
Yang Nan,
Yingying Fang,
Shiyi Wang,
Xiaodan Xing,
Zhifan Gao,
Guang Yang
Abstract:
Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs, e.g., bronchioles and arterioles, are easily lost during the recycled down/up-sample procedure, causing severe discontinuity issues. Motivated by these problems, this paper introduces an effective lung organ segmentation method called the Fuzzy Attention-based Border Rendering (FABR) network. Since fuzzy logic can handle the uncertainty in feature extraction, fusing deep networks with fuzzy sets should be a viable route to better performance. Meanwhile, unlike prior top-tier methods that operate on all regular dense points, FABR depicts lung organ regions as cube-trees, focusing only on recycle-sampled border vulnerable points, and renders the severely discontinuous, false-negative/positive organ regions with a novel Global-Local Cube-tree Fusion (GLCF) module. Experimental results on four challenging airway and artery datasets demonstrate that our method achieves favorable performance.
Submitted 1 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Competing excitation quenching and charge exchange in ultracold Li-Ba$^+$ collisions
Authors:
Xiaodong Xing,
Pascal Weckesser,
Fabian Thielemann,
Tibor Jónás,
Romain Vexiau,
Nadia Bouloufa-Maafa,
Eliane Luc-Koenig,
Kirk W. Madison,
Andrea Orbán,
Ting Xie,
Tobias Schaetz,
Olivier Dulieu
Abstract:
Hybrid atom-ion systems are a rich and powerful platform for studying chemical reactions, as they feature both excellent control over the electronic state preparation and readout as well as a versatile tunability over the scattering energy, ranging from the few-partial wave regime to the quantum regime. In this work, we make use of these excellent control knobs, and present a joint experimental and theoretical study of the collisions of a single $^{138}$Ba$^+$ ion prepared in the $5d\,^2D_{3/2,5/2}$ metastable states with a ground state $^6$Li gas near quantum degeneracy. We show that in contrast to previously reported atom-ion mixtures, several non-radiative processes, including charge exchange, excitation exchange and quenching, compete with each other due to the inherent complexity of the ion-atom molecular structure. We present a full quantum model based on high-level electronic structure calculations involving spin-orbit couplings. Results are in excellent agreement with observations, highlighting the strong coupling between the internal angular momenta and the mechanical rotation of the colliding pair, which is relevant in any other hybrid system composed of an alkali-metal atom and an alkaline-earth ion.
Submitted 23 June, 2024;
originally announced June 2024.
-
MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results
Authors:
Xin Jin,
Chunle Guo,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangchen Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Ruoqi Li,
Chang Liu,
Ziyi Wang,
Yao Du,
Jingjing Yang,
Long Bao,
Heng Sun,
Xiangyu Kong,
Xiaoxia Xing,
Jinlong Wu,
Yuanyang Xue,
Hyunhee Park,
Sejun Song,
Changho Kim,
Jingfan Tan
, et al. (17 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.
Submitted 11 June, 2024;
originally announced June 2024.
-
SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Authors:
Xingrun Xing,
Zheng Zhang,
Ziyi Ni,
Shitao Xiao,
Yiming Ju,
Siqi Fan,
Yequan Wang,
Jiajun Zhang,
Guoqi Li
Abstract:
Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) have the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, making it valuable to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, posing technical challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition nature of SNNs. In a single time step, the spike is enhanced by direction and amplitude information; in spike frequency, a strategy to control the spike firing rate is designed. We plug this elastic bi-spiking mechanism into language modeling and name the result SpikeLM. This is the first time general language tasks have been handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly narrows the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.
Submitted 5 June, 2024;
originally announced June 2024.
-
CAMP: Compiler and Allocator-based Heap Memory Protection
Authors:
Zhenpeng Lin,
Zheng Yu,
Ziyi Guo,
Simone Campanoni,
Peter Dinda,
Xinyu Xing
Abstract:
The heap is a critical and widely used component of many applications. Due to its dynamic nature, combined with the complexity of heap management algorithms, it is also a frequent target for security exploits. To enhance the heap's security, various heap protection techniques have been introduced, but they either introduce significant runtime overhead or have limited protection.
We present CAMP, a new sanitizer for detecting and capturing heap memory corruption. CAMP leverages a compiler and a customized memory allocator. The compiler adds boundary-checking and escape-tracking instructions to the target program, while the memory allocator tracks memory ranges, coordinates with the instrumentation, and neutralizes dangling pointers. With the novel error detection scheme, CAMP enables various compiler optimization strategies and thus eliminates redundant and unnecessary check instrumentation. This design minimizes runtime overhead without sacrificing security guarantees. Our evaluation and comparison of CAMP with existing tools, using both real-world applications and SPEC CPU benchmarks, show that it provides even better heap corruption detection capability with lower runtime overhead.
Submitted 4 June, 2024;
originally announced June 2024.
-
Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Authors:
Ziyi Guo,
Dang K Le,
Zhenpeng Lin,
Kyle Zeng,
Ruoyu Wang,
Tiffany Bao,
Yan Shoshitaishvili,
Adam Doupé,
Xinyu Xing
Abstract:
Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and benefiting the community.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Stochastic Thermodynamics of Micromagnetics with Spin Torque
Authors:
Mingnan Ding,
Jun Wu,
Xiangjun Xing
Abstract:
In this work, we study the stochastic dynamics of micro-magnetics interacting with a spin-current torque. We extend the previously constructed stochastic Landau-Lifshitz equation to the case with spin-current torque, and verify the conditions of detailed balance. Then we construct various thermodynamic quantities such as work and heat, and prove the second law of thermodynamics. Due to the existence of spin-torque and the asymmetry of the kinetic matrix, a novel effect of entropy pumping shows up. As a consequence, the system may behave as a heat engine which constantly transforms heat into magnetic work. Finally, we derive a fluctuation theorem for the joint probability density function of the pumped entropy and the total work, and verify it using numerical simulations.
Submitted 4 June, 2024;
originally announced June 2024.
-
ShadowBound: Efficient Heap Memory Protection Through Advanced Metadata Management and Customized Compiler Optimization
Authors:
Zheng Yu,
Ganxiang Yang,
Xinyu Xing
Abstract:
In software development, the prevalence of unsafe languages such as C and C++ introduces potential vulnerabilities, especially within the heap, a pivotal component for dynamic memory allocation. Despite its significance, heap management complexities have made heap corruption pervasive, posing severe threats to system security. While prior solutions aiming for temporal and spatial memory safety exhibit overheads deemed impractical, we present ShadowBound, a unique heap memory protection design. At its core, ShadowBound is an efficient out-of-bounds defense that can work with various use-after-free defenses (e.g. MarkUs, FFMalloc, PUMM) without compatibility constraints. We harness a shadow memory-based metadata management mechanism to store heap chunk boundaries and apply customized compiler optimizations tailored for boundary checking. We implemented ShadowBound atop the LLVM framework and integrated three state-of-the-art use-after-free defenses. Our evaluations show that ShadowBound provides robust heap protection with minimal time and memory overhead, suggesting its effectiveness and efficiency in safeguarding real-world programs against prevalent heap vulnerabilities.
Submitted 4 June, 2024;
originally announced June 2024.
-
Decoupled Alignment for Robust Plug-and-Play Adaptation
Authors:
Haozheng Luo,
Jiahao Yu,
Wenxin Zhang,
Jialong Li,
Jerry Yao-Chieh Hu,
Xinyu Xing,
Han Liu
Abstract:
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodologically, we employ delta debugging to identify the critical components of knowledge necessary for effective distillation. On the harmful question dataset, our method significantly enhances the average defense success rate by approximately 14.41%, reaching as high as 51.39%, in 17 unaligned pre-trained LLMs, without compromising performance.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Authors:
Jiahao Yu,
Haozheng Luo,
Jerry Yao-Chieh Hu,
Wenbo Guo,
Han Liu,
Xinyu Xing
Abstract:
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or complicated algorithms to craft jailbreaking prompts. In this paper, we introduce BOOST, a simple attack that leverages only the eos tokens. We demonstrate that rather than constructing complicated jailbreaking prompts, the attacker can simply append a few eos tokens to the end of a harmful question. It will bypass the safety alignment of LLMs and lead to successful jailbreaking attacks. We further apply BOOST to four representative jailbreak methods and show that the attack success rates of these methods can be significantly enhanced by simply adding eos tokens to the prompt. To understand this simple but novel phenomenon, we conduct empirical analyses. Our analysis reveals that adding eos tokens makes the target LLM believe the input is much less harmful, and that eos tokens have low attention values and do not affect the LLM's understanding of the harmful questions, leading the model to actually respond to the questions. Our findings uncover how fragile an LLM is against jailbreak attacks, motivating the development of strong safety alignment approaches.
Submitted 4 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI
Authors:
Xiaodan Xing,
Fadong Shi,
Jiahao Huang,
Yinzhe Wu,
Yang Nan,
Sheng Zhang,
Yingying Fang,
Mike Roberts,
Carola-Bibiane Schönlieb,
Javier Del Ser,
Guang Yang
Abstract:
Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes.
Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend portends a future where generative AI systems may increasingly rely blindly on consuming self-generated data, raising concerns about model performance and ethical issues. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects?
There is a significant gap in the scientific literature regarding the impact of synthetic data use in generative AI, particularly in terms of the fusion of multimodal information. To address this research gap, this review investigates the consequences of integrating synthetic data blindly on training generative AI on both image and text modalities and explores strategies to mitigate these effects. The goal is to offer a comprehensive view of synthetic data's role, advocating for a balanced approach to its use and exploring practices that promote the sustainable development of generative AI technologies in the era of large models.
Submitted 15 May, 2024;
originally announced May 2024.
-
RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
Authors:
Zelei Cheng,
Xian Wu,
Jiahao Yu,
Sabrina Yang,
Gang Wang,
Xinyu Xing
Abstract:
Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.
Submitted 5 June, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
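The mixed initial state distribution described in the RICE abstract above can be pictured with a small sampling sketch. It assumes the critical states have already been identified by some explanation method. The function name, the `mix_prob` parameter, and the uniform choice within each pool are hypothetical illustrations, not the paper's actual procedure.

```python
import random


def sample_initial_state(default_states, critical_states, mix_prob, rng):
    """Draw an episode's start state from a two-component mixture.

    With probability mix_prob the state comes from critical_states
    (assumed to be supplied by an explanation method); otherwise it
    comes from the environment's default initial states.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    if not default_states:
        raise ValueError("default_states must not be empty")
    # Fall back to the default pool when no critical states are known yet.
    if critical_states and rng.random() < mix_prob:
        return rng.choice(critical_states)
    return rng.choice(default_states)
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible. Setting `mix_prob` to 0 recovers ordinary training from the default initial states.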
-
VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes
Authors:
Juncheng Hu,
Ximing Xing,
Zhengqi Zhang,
Jing Zhang,
Qian Yu
Abstract:
We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovatively, we conceptualize the stylization process as the rearrangement of vectorized strokes extracted from the reference image. VectorPainter employs an optimization-based pipeline. It begins by extracting vectorized strokes from the reference image, which are then used to initialize the synthesis process. To ensure fidelity to the reference style, a novel style preservation loss is introduced. Extensive experiments have been conducted to demonstrate that our method is capable of aligning with the text description while remaining faithful to the reference image.
Submitted 5 May, 2024;
originally announced May 2024.
-
Stochastic thermodynamics of Brownian motion in a flowing fluid
Authors:
Jun Wu,
Mingnan Ding,
Xiangjun Xing
Abstract:
We study stochastic thermodynamics of over-damped Brownian motion in a flowing fluid. Unlike some previous works, we treat the effects of the flow field as a non-conservative driving force acting on the Brownian particle. This allows us to apply the theoretical formalism developed in a recent work for general non-conservative Langevin dynamics. We define heat and work both at the trajectory level and at the ensemble level, and prove the second law of thermodynamics explicitly. The entropy production (EP) is decomposed into a housekeeping part and an excess part, both of which are non-negative at the ensemble level. Fluctuation theorems are derived for the housekeeping work, the excess work, and the total work, which are further verified using numerical simulations. A comparison between our theory and an earlier theory by Speck et al. is also carried out.
Submitted 21 April, 2024;
originally announced April 2024.
-
Stochastic Thermodynamics of Micromagnetics
Authors:
Mingnan Ding,
Jun Wu,
Xiangjun Xing
Abstract:
In this work, we study the stochastic thermodynamics of micro-magnetic systems. We first formulate the stochastic dynamics of micro-magnetic systems by incorporating noise into the Landau-Lifshitz (LL) equation, which describes the irreversible and deterministic dynamics of magnetic moments. The resulting stochastic Landau-Lifshitz (sLL) equation obeys detailed balance, which guarantees that, with the external field fixed, the system converges to thermodynamic equilibrium with vanishing entropy production and with non-vanishing probability current. We then discuss various thermodynamic variables both at the trajectory level and at the ensemble level, and further establish both the first and the second laws of thermodynamics. Finally, we establish fluctuation theorems and verify them using numerical simulations.
Submitted 21 April, 2024;
originally announced April 2024.
-
Picturing the Gap Between the Performance and US-DOE's Hydrogen Storage Target: A Data-Driven Model for MgH2 Dehydrogenation
Authors:
Chaoqun Li,
Weijie Yang,
Hao Liu,
Xinyuan Liu,
Xiu**g Xing,
Zhengyang Gao,
Shuai Dong,
Hao Li
Abstract:
Developing solid-state hydrogen storage materials is as pressing as ever, which requires a comprehensive understanding of the dehydrogenation chemistry of a solid-state hydride. Transition state search and kinetics calculations are essential to understanding and designing high-performance solid-state hydrogen storage materials by filling in the knowledge gap that current experimental techniques cannot measure. However, the ab initio analysis of these processes is computationally expensive and time-consuming. Descriptors that accurately predict the energy barrier are urgently needed to accelerate the prediction of hydrogen storage material properties and identify the opportunities and challenges in this field. Herein, we develop a data-driven model to describe and predict the dehydrogenation barriers of a typical solid-state hydrogen storage material, magnesium hydride (MgH2), based on the combination of the crystal orbital Hamilton population of the Mg-H bond and the distance between atomic hydrogen. By deriving the distance energy ratio, this model elucidates the key chemistry of the reaction kinetics. All the parameters in this model can be directly calculated with significantly less computational cost than conventional transition state search, so that the dehydrogenation performance of hydrogen storage materials can be predicted efficiently. Finally, we found that this model leads to excellent agreement with typical experimental measurements reported to date and provides clear design guidelines on how to propel the performance of MgH2 closer to the target set by the United States Department of Energy (US-DOE).
Submitted 29 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Ultralarge polarization in ferroelectric hafnia-based thin films
Authors:
Han Wu,
Kun Lin,
Qinghua Zhang,
Qian Yu,
Xiaoqian Fu,
Qiang Li,
Meera Cheviri,
Oswaldo Dieguez,
Shuai Xu,
Lin Gu,
Yili Cao,
Jiaou Wang,
Zhen Wang,
Yu Chen,
Huanhua Wang,
**xia Deng,
Jun Miao,
Xianran Xing
Abstract:
Hafnia-based ferroelectrics have become a valuable class of electronic functional materials at the nanoscale, showing great potential for next-generation memory and logic devices. However, more robust ferroelectric properties and a better understanding of the polarization mechanisms are currently needed both in technology and science. Herein, we report the properties of oxygen-deficient Hf0.5Zr0.5O2 films with ultralarge remanent polarization (Pr) of 387 μC cm$^{-2}$ at room temperature (1 kHz). Structure characterizations identify a new ferroelectric monoclinic Pc phase in these Hf0.5Zr0.5O2 films. The in-situ STEM measurements evidence polar displacements of the oxygen atoms, which move up and down in the Pc structure under applied DC bias fields, showing a huge displacement (1.6 Å). DFT calculations optimized the Pc structure and also predicted a large polarization. The coexistence of the ferroelectric monoclinic (Pc) and orthorhombic (Pca21) phases is responsible for these superior ferroelectric properties. These findings are promising for hafnia-based ferroelectric applications in integrated ferroelectric devices, energy harvesting, and actuators.
Submitted 18 March, 2024;
originally announced March 2024.
-
Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration
Authors:
Weiying Xue,
Qi Liu,
Qiwei Xiong,
Yuxiao Wang,
Zhenao Wei,
Xiaofen Xing,
Xiangmin Xu
Abstract:
Human-object interaction (HOI) detection aims to locate human-object pairs and identify their interaction categories in images. Most existing methods primarily focus on supervised learning, which relies on extensive manual HOI annotations. In this paper, we propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates the knowledge of a visual-language model to improve zero-shot HOI detection. Specifically, the verb feature learning module is designed based on visual semantics, employing the verb extraction decoder to convert corresponding verb queries into interaction-specific category representations. We develop an effective additive self-attention mechanism to generate more comprehensive visual representations. Moreover, the innovative interaction representation decoder effectively extracts informative regions by integrating spatial and visual feature information through a cross-attention mechanism. To deal with zero-shot learning in low-data settings, we leverage a priori knowledge from the CLIP text encoder to initialize the linear classifier for enhanced interaction understanding. Extensive experiments conducted on the mainstream HICO-DET and V-COCO datasets demonstrate that our model outperforms the previous methods in various zero-shot and fully supervised settings.
Submitted 11 March, 2024;
originally announced March 2024.
-
FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model
Authors:
Xiangyu Li,
Xinjie Shen,
Yawen Zeng,
Xiaofen Xing,
** Xu
Abstract:
The task of stock earnings forecasting has received considerable attention due to demand from investors in real-world scenarios. However, compared with financial institutions, it is not easy for ordinary investors to mine factors and analyze news. On the other hand, although large language models in the financial field can serve users in the form of dialogue robots, they still require users to have financial knowledge to ask reasonable questions. To improve the user experience, we aim to build an automatic system, FinReport, for ordinary investors to collect information, analyze it, and generate reports after summarizing.
Specifically, our FinReport is based on financial news announcements and a multi-factor model to ensure the professionalism of the report. FinReport consists of three modules: a news factorization module, a return forecasting module, and a risk assessment module. The news factorization module involves understanding news information and combining it with stock factors, the return forecasting module aims to analyze the impact of news on market sentiment, and the risk assessment module is adopted to control investment risk. Extensive experiments on real-world datasets have verified the effectiveness and explainability of our proposed FinReport. Our code and datasets are available at https://github.com/frinkleko/FinReport.
Submitted 4 March, 2024;
originally announced March 2024.
-
PointCore: Efficient Unsupervised Point Cloud Anomaly Detector Using Local-Global Features
Authors:
Baozhu Zhao,
Qiwei Xiong,
Xiaohan Zhang,
**gfeng Guo,
Qi Liu,
Xiaofen Xing,
Xiangmin Xu
Abstract:
Three-dimensional point cloud anomaly detection that aims to detect anomaly data points from a training set serves as the foundation for a variety of applications, including industrial inspection and autonomous driving. However, existing point cloud anomaly detection methods often incorporate multiple feature memory banks to fully preserve local and global representations, which comes at the high cost of computational complexity and mismatches between features. To address that, we propose an unsupervised point cloud anomaly detection framework based on joint local-global features, termed PointCore. To be specific, PointCore only requires a single memory bank to store local (coordinate) and global (PointMAE) representations, and different priorities are assigned to these local-global features, thereby reducing the computational cost and mismatching disturbance in inference. Furthermore, to be robust against outliers, a normalization ranking method is introduced to not only adjust values of different scales to a notionally common scale, but also transform densely-distributed data into a uniform distribution. Extensive experiments on the Real3D-AD dataset demonstrate that PointCore achieves competitive inference time and the best performance in both detection and localization as compared to the state-of-the-art Reg3D-AD approach and several competitors.
Submitted 4 March, 2024;
originally announced March 2024.
-
The Staggered Mesh Method: Accurate Exact Exchange towards the Thermodynamic Limit for Solids
Authors:
Stephen Jon Quiton,
Hamlin Wu,
Xin Xing,
Lin Lin,
Martin Head-Gordon
Abstract:
In periodic systems, the Hartree-Fock (HF) exchange energy exhibits the slowest convergence of all HF energy components as the system size approaches the thermodynamic limit. We demonstrate that the recently proposed staggered mesh method for Fock exchange energy [Xing, Li, and Lin, Math. Comp., 2024], which is specifically designed to sidestep certain singularities in exchange energy evaluation, can expedite the finite-size convergence rate for the exact exchange energy across a range of insulators and semiconductors when compared to the regular and truncated Coulomb methods. This remains true even for two computationally cheaper versions of this new method, which we call Non-SCF and Split-SCF staggered mesh. Additionally, a sequence of numerical tests on simple solids showcases the staggered mesh method's ability to improve convergence towards the thermodynamic limit for band gaps, bulk moduli, equilibrium lattice dimensions, energies, and phonon force constants.
Submitted 21 February, 2024;
originally announced February 2024.
-
A spin-torque nano-oscillator based on interlayer-coupled meron-skyrmion pairs with a fixed orbit
Authors:
Qiyun Yi,
Ting Han,
**yi Jiang,
Xiangjun Xing
Abstract:
In recent years, magnetic skyrmion-based spin-torque nano-oscillators (STNOs) have attracted considerable interest for their prospects in future-generation communication and spintronic technologies. However, some critical issues that hamper their practical applications, e.g., the long start-up time and variable skyrmion gyration orbit, remain to be resolved. Here, we numerically demonstrate a realization of a fixed-orbit STNO, which is based on an interlayer-coupled meron-skyrmion (MS) pair rather than a magnetic skyrmion. In this STNO, the MS pair possesses a structurally defined, fixed orbit within a broad range of driving current, even in the presence of random defects. The output frequency range of the STNO based on an MS pair far exceeds that of the STNO typically based on a single skyrmion. Moreover, the output frequency of this STNO can be further elevated if more MS pairs are incorporated. Our results reveal the nontrivial dynamics of the interlayer-coupled MS pair, opening perspectives for the design and optimization of fundamental spintronic devices.
Submitted 9 February, 2024;
originally announced February 2024.
-
Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images
Authors:
Xiaodan Xing,
Huiyu Zhou,
Yingying Fang,
Guang Yang
Abstract:
AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they exhibit remarkable realism with their real copies, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen have integrated digital watermarks aimed at facilitating the discernment of synthetic images' authenticity. These watermarks are embedded within the image pixels and are invisible to the human eye while remaining detectable. Nevertheless, a comprehensive investigation into the potential impact of these invisible watermarks on the utility of synthetic medical images has been lacking. In this study, we propose the incorporation of invisible watermarks into synthetic medical images and seek to evaluate their efficacy in the context of downstream classification tasks. Our goal is to pave the way for discussions on the viability of such watermarks in boosting the detectability of synthetic medical images, fortifying ethical standards, and safeguarding against data pollution and potential scams.
Submitted 21 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Measurements of Normalized Differential Cross Sections of Inclusive $η$ Production in $e^{+}e^{-}$ Annihilation at Energy from 2.0000 to 3.6710 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
D. Anderle,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (641 additional authors not shown)
Abstract:
Using data samples collected with the BESIII detector operating at the BEPCII storage ring, the cross section of the inclusive process $e^{+}e^{-} \to η+ X$, normalized by the total cross section of $e^{+}e^{-} \to \text{hadrons}$, is measured at eight center-of-mass energy points from 2.0000 GeV to 3.6710 GeV. These are the first measurements with momentum dependence in this energy region. Our measurement shows a significant discrepancy from calculations with the existing fragmentation functions. To address this discrepancy, a new QCD analysis is performed at the next-to-next-to-leading order with hadron mass corrections and higher twist effects, which can explain both the established high-energy data and our measurements reasonably well.
Submitted 31 January, 2024;
originally announced January 2024.
-
Unusually weak irradiation effects in anisotropic iron-based superconductor RbCa2Fe4As4F2
Authors:
Daniele Torsello,
Erik Piatti,
Michela Fracasso,
Roberto Gerbaldo,
Laura Gozzelino,
Xiaolei Yi,
Xiangzhuo Xing,
Zhixiang Shi,
Dario Daghero,
Gianluca Ghigo
Abstract:
We report on the effects of 3.5 MeV proton irradiation in RbCa$_2$Fe$_4$As$_4$F$_2$, an iron-based superconductor with unusual properties in between those of the pnictides and of the cuprate high-temperature superconductors. We studied how structural disorder introduced by ion bombardment affects the critical temperature, superfluid density and gap values by combining a coplanar waveguide resonator technique, electric transport measurements and point-contact Andreev-reflection spectroscopy. We find an unusually weak dependence of the superconducting properties on the amount of disorder in this material when compared to other iron-based superconductors under comparable irradiation conditions. The nodal multigap state exhibited by pristine RbCa$_2$Fe$_4$As$_4$F$_2$ is also robust against proton irradiation, with a two-band $d-d$ model being the one that best fits the experimental data.
Submitted 16 January, 2024;
originally announced January 2024.
-
A clean-label graph backdoor attack method in node classification task
Authors:
Xiaogang Xing,
Ming Xu,
Yu**g Bai,
Dongdong Yang
Abstract:
Backdoor attacks in the traditional graph neural networks (GNNs) field are easily detectable due to the dilemma of confusing labels. To explore the backdoor vulnerability of GNNs and create a more stealthy backdoor attack method, a clean-label graph backdoor attack method (CGBA) in the node classification task is proposed in this paper. Unlike existing backdoor attack methods, CGBA requires modification of neither node labels nor graph structure. Specifically, to solve the problem of inconsistency between the contents and labels of the samples, CGBA selects poisoning samples in a specific target class and uses the label of the sample as the target label (i.e., clean-label) after injecting triggers into the target samples. To guarantee the similarity of neighboring nodes, the raw features of the nodes are carefully picked as triggers to further improve the concealment of the triggers. Extensive experimental results show the effectiveness of our method. When the poisoning rate is 0.04, CGBA can achieve an average attack success rate of 87.8%, 98.9%, 89.1%, and 98.5%, respectively.
Submitted 30 December, 2023;
originally announced January 2024.
-
SVGDreamer: Text Guided SVG Generation with Diffusion Model
Authors:
Ximing Xing,
Haitao Zhou,
Chuang Wang,
**g Zhang,
Dong Xu,
Qian Yu
Abstract:
Recently, text-guided scalable vector graphics (SVGs) synthesis has shown promise in domains such as iconography and sketch. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that enables the decomposition of synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduces attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach to address issues of shape over-smoothing, color over-saturation, limited diversity, and slow convergence of the existing text-to-SVG generation methods by modeling SVGs as distributions of control points and colors. Furthermore, VPSD leverages a reward model to re-weight vector particles, which improves aesthetic appeal and accelerates convergence. Extensive experiments are conducted to validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/
Submitted 2 April, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge
Authors:
Yang Nan,
Xiaodan Xing,
Shiyi Wang,
Zeyu Tang,
Federico N Felder,
Sheng Zhang,
Roberta Eufrasia Ledda,
Xiaoliu Ding,
Ruiqi Yu,
Wei** Liu,
Feng Shi,
Tianyang Sun,
Zehong Cao,
Minghui Zhang,
Yun Gu,
Hanxiao Zhang,
Jian Gao,
**yu Wang,
Wen Tang,
Pengxin Yu,
Han Kang,
Junqiang Chen,
Xing Lu,
Boyu Zhang,
Michail Mamalakis
, et al. (16 additional authors not shown)
Abstract:
Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity to extract airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.
Submitted 16 April, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
A novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm
Authors:
Yong Niu,
Xing Xing,
Zhichun Jia,
Ruidi Liu,
Mindong Xin
Abstract:
Sequential recommendation aims to infer user preferences from historical interaction sequences and predict the next item that users may be interested in. The current mainstream design approach is to represent items as fixed vectors, capturing the underlying relationships between items and user preferences based on the order of interactions. However, relying on a single fixed-item embedding may weaken the modeling capability of the system, and the global dynamics and local saliency exhibited by user preferences need to be distinguished. To address these issues, this paper proposes a novel diffusion recommendation algorithm based on multi-scale CNN and residual LSTM (AREAL). We introduce diffusion models into the recommender system, representing items as probability distributions instead of fixed vectors. This approach enables adaptive reflection of multiple aspects of the items and generates item distributions in a denoising manner. We use multi-scale CNN and residual LSTM methods to extract the local and global dependency features of user history interactions, and use an attention mechanism to distinguish weights as the guide features of reverse diffusion recovery. The effectiveness of the proposed method is validated through experiments conducted on two real-world datasets. Specifically, AREAL obtains improvements over the best baselines of 2.63% and 4.25% in terms of HR@20 and 5.05% and 3.94% in terms of NDCG@20 on the two datasets.
Submitted 20 December, 2023; v1 submitted 17 December, 2023;
originally announced December 2023.
-
BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials
Authors:
Xingrun Xing,
Li Du,
Xinyuan Wang,
Xianlin Zeng,
Yequan Wang,
Zheng Zhang,
Jiajun Zhang
Abstract:
Pretrained foundation models offer substantial benefits for a wide range of downstream tasks and may be one of the most promising routes to artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Binary Pretrained Foundation Transformer (BiPFT) for natural language understanding (NLU) tasks, which reduces operations by 56 times and memory by 28 times. In contrast to previous task-specific binary transformers, BiPFT exhibits a substantial enhancement in the learning capabilities of binary neural networks (BNNs), promoting BNNs into the era of pre-training. Benefiting from extensive pretraining data, we further propose a data-driven binarization method. Specifically, we first analyze the binarization error in self-attention operations and derive the polynomials of binarization error. To simulate full-precision self-attention, we define binarization error as binarization residual polynomials, and then introduce low-rank estimators to model these polynomials. Extensive experiments validate the effectiveness of BiPFTs, surpassing the task-specific baseline by 15.4% average performance on the GLUE benchmark. BiPFT also demonstrates improved robustness to hyperparameter changes, improved optimization efficiency, and reduced reliance on downstream distillation; it consequently generalizes across various NLU tasks and simplifies the downstream pipeline of BNNs. Our code and pretrained models are publicly available at https://github.com/Xingrun-Xing/BiPFT.
Submitted 20 June, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
LD-SDM: Language-Driven Hierarchical Species Distribution Modeling
Authors:
Srikumar Sastry,
Xin Xing,
Aayush Dhakal,
Subash Khanal,
Adeel Ahmad,
Nathan Jacobs
Abstract:
We focus on the problem of species distribution modeling using global-scale presence-only data. Most previous studies have mapped the range of a given species using geographical and environmental features alone. To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model. This enables range mapping for any taxonomic rank and unseen species without additional supervision. Further, we propose a novel proximity-aware evaluation metric that enables evaluating species distribution models using any pixel-level representation of ground-truth species range map. The proposed metric penalizes the predictions of a model based on its proximity to the ground truth. We describe the effectiveness of our model by systematically evaluating on the task of species range prediction, zero-shot prediction and geo-feature regression against the state-of-the-art. Results show our model outperforms the strong baselines when trained with a variety of multi-label learning losses.
Submitted 13 December, 2023;
originally announced December 2023.
-
AI-driven emergence of frequency information non-uniform distribution via THz metasurface spectrum prediction
Authors:
Xiaohua Xing,
Yuqi Ren,
Die Zou,
Qiankun Zhang,
Bingxuan Mao,
Jianquan Yao,
Deyi Xiong,
Shuang Zhang,
Liang Wu
Abstract:
Recently, artificial intelligence has been extensively deployed across various scientific disciplines, optimizing and guiding the progression of experiments through the integration of abundant datasets, whilst continuously probing the vast theoretical space encapsulated within the data. In particular, deep learning models, owing to their end-to-end adaptive learning capabilities, can autonomously learn intrinsic data features, thereby transcending the limitations of traditional experience to a certain extent. Here, we unveil previously unreported information characteristics pertaining to different frequencies that emerged during our work on predicting the terahertz spectral modulation effects of metasurfaces based on AI prediction. Moreover, we have substantiated that our proposed methodology of simply adding supplementary multi-frequency inputs to the existing dataset during the target spectral prediction process can significantly enhance the predictive accuracy of the network. This approach effectively optimizes the utilization of existing datasets and paves the way for interdisciplinary research and applications in artificial intelligence, chemistry, composite material design, biomedicine, and other fields.
Submitted 4 December, 2023;
originally announced December 2023.
-
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
Authors:
Lei Fan,
Jianxiong Zhou,
Xiaoying Xing,
Ying Wu
Abstract:
Active recognition, which allows intelligent agents to explore observations for better recognition performance, serves as a prerequisite for various embodied AI tasks, such as grasping, navigation and room arrangements. Given the evolving environment and the multitude of object classes, it is impractical to include all possible classes during the training stage. In this paper, we aim at advancing active open-vocabulary recognition, empowering embodied agents to actively perceive and classify arbitrary objects. However, directly adopting recent open-vocabulary classification models, like Contrastive Language Image Pretraining (CLIP), poses its unique challenges. Specifically, we observe that CLIP's performance is heavily affected by the viewpoint and occlusions, compromising its reliability in unconstrained embodied perception scenarios. Further, the sequential nature of observations in agent-environment interactions necessitates an effective method for integrating features that maintains discriminative strength for open-vocabulary classification. To address these issues, we introduce a novel agent for active open-vocabulary recognition. The proposed method leverages inter-frame and inter-concept similarities to navigate agent movements and to fuse features, without relying on class-specific knowledge. Compared to the baseline CLIP model with 29.6% accuracy on the ShapeNet dataset, the proposed agent achieves 53.3% accuracy for open-vocabulary recognition, without any fine-tuning of the equipped CLIP model. Additional experiments conducted with the Habitat simulator further affirm the efficacy of our method.
Submitted 28 November, 2023;
originally announced November 2023.
-
LM-Cocktail: Resilient Tuning of Language Models via Model Merging
Authors:
Shitao Xiao,
Zheng Liu,
Peitian Zhang,
Xingrun Xing
Abstract:
Pre-trained language models are continually fine-tuned to better support downstream applications. However, this operation may result in significant performance degeneration on general tasks beyond the targeted domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to stay resilient in general perspectives. Our method is conducted in the form of model merging, where the fine-tuned language model is merged with the pre-trained base model or the peer models from other domains through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving a superior capacity in its targeted domain. We conduct comprehensive experiments with Llama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, whose results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
Submitted 8 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
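The weighted-average merging described in the LM-Cocktail abstract can be illustrated with a minimal sketch. It assumes each model is a plain dictionary of named weight lists; the `Params` type, the `merge_models` function, and the equal weights are hypothetical choices for illustration, and the abstract does not say how the authors pick the merging weights.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model's named weight tensors.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the element-wise weighted average of several parameter sets."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    merged: Params = {}
    for name, values in models[0].items():
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(values))
        ]
    return merged


# Toy example: merge a "fine-tuned" model with its "base" model, equal weights (an assumption).
fine_tuned = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
base = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_models([fine_tuned, base], weights=[0.5, 0.5])
```

All models must share the same parameter names and shapes for this kind of averaging to be meaningful; peer models from other domains would simply be passed as additional entries in `models`.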
-
Assessing Prompt Injection Risks in 200+ Custom GPTs
Authors:
Jiahao Yu,
Yuhang Wu,
Dong Shu,
Mingyu Jin,
Sabrina Yang,
Xinyu Xing
Abstract:
In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature - customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.
Submitted 25 May, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Post-COVID Highlights: Challenges and Solutions of AI Techniques for Swift Identification of COVID-19
Authors:
Yingying Fang,
Xiaodan Xing,
Shiyi Wang,
Simon Walsh,
Guang Yang
Abstract:
Since the onset of the COVID-19 pandemic in 2019, there has been a concerted effort to develop cost-effective, non-invasive, and rapid AI-based tools. These tools were intended to alleviate the burden on healthcare systems, control the rapid spread of the virus, and enhance intervention outcomes, all in response to this unprecedented global crisis. As we transition into a post-COVID era, we retrospectively evaluate these proposed studies and offer a review of the techniques employed in AI diagnostic models, with a focus on the solutions proposed for different challenges. This review endeavors to provide insights into the diverse solutions designed to address the multifaceted challenges that arose during the pandemic. By doing so, we aim to prepare the AI community for the development of AI tools tailored to address public health emergencies effectively.
Submitted 24 November, 2023; v1 submitted 24 September, 2023;
originally announced November 2023.
-
Dynamic Multimodal Information Bottleneck for Multimodality Classification
Authors:
Yingying Fang,
Shuang Wu,
Sheng Zhang,
Chaoyan Huang,
Tieyong Zeng,
Xiaodan Xing,
Simon Walsh,
Guang Yang
Abstract:
Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing features across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noise in the fused feature, and we further introduce a sufficiency loss to prevent dropping of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to maintain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/ayanglab/DMIB.
Submitted 25 November, 2023; v1 submitted 2 November, 2023;
originally announced November 2023.
-
SoulChat: Improving LLMs' Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations
Authors:
Yirong Chen,
Xiaofen Xing,
Jingkai Lin,
Huimin Zheng,
Zhenyu Wang,
Qi Liu,
Xiangmin Xu
Abstract:
Large language models (LLMs) have been widely applied in various fields due to their excellent capability for memorizing knowledge and chain of thought (CoT). When these language models are applied in the field of psychological counseling, they often rush to provide universal advice. However, when users seek psychological support, they need to gain empathy, trust, understanding and comfort, rather than just reasonable advice. To this end, we constructed a multi-turn empathetic conversation dataset of more than 2 million samples, in which the input is the multi-turn conversation context, and the target is empathetic responses that cover expressions such as questioning, comfort, recognition, listening, trust, emotional support, etc. Experiments have shown that the empathy ability of LLMs can be significantly enhanced by fine-tuning on multi-turn dialogue history and responses that are closer to the expression of a psychological consultant.
Submitted 31 October, 2023;
originally announced November 2023.
-
Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning
Authors:
Xin Xing,
Zhexiao Xiong,
Abby Stylianou,
Srikumar Sastry,
Liyu Gong,
Nathan Jacobs
Abstract:
This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.
Submitted 24 October, 2023;
originally announced October 2023.
-
BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT
Authors:
Yirong Chen,
Zhenyu Wang,
Xiaofen Xing,
Huimin Zheng,
Zhipei Xu,
Kai Fang,
Junhong Wang,
Sihang Li,
Jieling Wu,
Qi Liu,
Xiangmin Xu
Abstract:
Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, and DoctorGLM. However, the limited information provided by users during a single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independently select the useful part. This is mainly caused by the missing ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, enabling them to provide effective and personalized suggestions subsequently, which can be defined as chain of questioning (CoQ) for LLMs. To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned with the self-constructed health conversation dataset BianQueCorpus that consists of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.
Submitted 4 December, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation
Authors:
Zhaojie Chu,
Kailing Guo,
Xiaofen Xing,
Yilin Lan,
Bolun Cai,
Xiangmin Xu
Abstract:
Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speaking activities, the mouth displays strong motions, while the other facial regions typically demonstrate comparatively weak activity levels. Existing approaches often simplify the process by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity metric is defined to distinguish between strong and weak facial activity, obtained by computing the short-time Fourier transform of facial vertex displacements. Based on the variances in facial activity, we propose a dual-branch decoding framework to synchronously synthesize strong and weak facial activity, which guarantees wider intensity facial animation synthesis. Furthermore, a weighted hierarchical feature encoder is proposed to establish temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments as well as a user study indicate that our CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/
Submitted 17 October, 2023;
originally announced October 2023.
-
Genome hybridization: A universal way for the origin and diversification of organelles as well as the origin and speciation of eukaryotes
Authors:
Qing-lin Dong,
Xiang-ying Xing
Abstract:
The origin of organelles (mitochondrion, chloroplast and nucleus) remains enigmatic. The endosymbiotic hypothesis that chloroplasts, mitochondria and nuclei descend from the endosymbiotic cyanobacterium, bacterium and archaebacterium respectively is dominant yet uncompelling, while our discovery of de novo organelle biogenesis in the cyanobacterium TDX16 that had acquired the genome of its green algal host Haematococcus pluvialis overturns this hypothesis. In light of organelle biogenesis in the cyanobacterium TDX16 in combination with the relevant cellular and molecular evidence, we propose genome hybridization hypothesis (GHH) that the origin of organelles and origin of eukaryotes as well as the diversification of organelles and speciation of eukaryotes are unified and achieved by genome hybridization: the endosymbiotic cyanobacteria or bacteria obtain genomes of their archaebacterial or eukaryotic hosts and hybridize with their own ones resulting in expanded genomes containing a mixture of hybrid prokaryotic genes and eukaryotic genes, and thus the cyanobacteria or bacteria have to compartmentalize to accommodate different genes for specialized function of photosynthesis (chloroplast), respiration (mitochondrion) and DNA preservation (nucleus), and consequently turn into photosynthetic or heterotrophic eukaryotes. Accordingly, eukaryotes and their organelles are of multiple origin, while the formation of cancer cells is the speciation of eukaryotes as cancer cells are new species of unicellular eukaryotes arising from bacteria. Therefore, GHH provides a theoretical framework unifying evolutionary biology, cancer biology and cell biology and directing the integrated multidisciplinary research.
Submitted 7 May, 2024; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Multi-Ship Tracking by Robust Similarity metric
Authors:
Hongyu Zhao,
Gongming Wei,
Yang Xiao,
Xianglei Xing
Abstract:
Multi-ship tracking (MST) is a core technology that has been applied to situational awareness at sea and to the development of navigational systems for autonomous ships. Despite impressive tracking outcomes achieved by multi-object tracking (MOT) algorithms for pedestrian and vehicle datasets, these models and techniques exhibit poor performance when applied to ship datasets. Intersection over Union (IoU) is the most popular similarity metric used in object tracking. The low frame rates and severe image shake caused by wave turbulence in ship datasets often result in minimal, or even zero, IoU between the predicted and detected bounding boxes. This issue contributes to frequent identity switches of tracked objects, undermining the tracking performance. In this paper, we address the weaknesses of IoU by incorporating the smallest convex shapes that enclose both the predicted and detected bounding boxes. The calculation of the tracking version of IoU (TIoU) metric considers not only the size of the overlapping area between the detection bounding box and the prediction box, but also the similarity of their shapes. Through the integration of the TIoU into state-of-the-art object tracking frameworks, such as DeepSort and ByteTrack, we consistently achieve improvements in the tracking performance of these frameworks.
Submitted 8 October, 2023;
originally announced October 2023.
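The weakness of plain IoU noted in the multi-ship tracking abstract, and the idea of using a shape that encloses both boxes, can be illustrated with a short sketch. The abstract does not give the TIoU formula, so the code below swaps in the well-known Generalized IoU (GIoU), which uses the smallest enclosing axis-aligned box; it is a stand-in for illustration, not the authors' metric. The function names and the example boxes are hypothetical.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate (empty) boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Overlap area of two boxes (zero when they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the share of the enclosing box not covered by the union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    hull = area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0 or hull <= 0:
        return 0.0
    return inter / union - (hull - union) / hull


predicted = (0.0, 0.0, 2.0, 2.0)
near_detection = (3.0, 0.0, 5.0, 2.0)   # just missed the prediction
far_detection = (10.0, 0.0, 12.0, 2.0)  # far from the prediction
```

Plain IoU is zero for both detections, so it cannot rank them, while the enclosing-box term gives the nearer detection the higher score. That ordering is what a tracker needs when frame-to-frame overlap vanishes.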
-
Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation
Authors:
Lichao Wang,
Jiahao Huang,
Xiaodan Xing,
Yinzhe Wu,
Ramyah Rajakulasingam,
Andrew D. Scott,
Pedro F Ferreira,
Ranil De Silva,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
This study proposes a pipeline that incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. The proposed pipeline aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high resolution and distinct highlighting of myocardium infarction (MI) areas. Subsequently, the segmentation task is performed on the LGE-style image. An end-to-end super-resolution segmentation model is introduced to generate a high-resolution mask from a low-resolution LGE-style DTI image. Further, to enhance the performance of the model, a multi-task self-supervised learning strategy is employed to pre-train the super-resolution segmentation model, allowing it to acquire more representative knowledge and improve its segmentation performance after fine-tuning. https://github.com/wlc2424762917/Med_Img
Submitted 27 September, 2023;
originally announced September 2023.
-
LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
Authors:
Pucheng Zhai,
Kailing Guo,
Fang Liu,
Xiaofen Xing,
Xiangmin Xu
Abstract:
Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruning rate setting is a fundamental problem in structured pruning. Most existing works either introduce too many additional learnable parameters to assign different pruning rates across different layers in a CNN or cannot control the compression rate explicitly. Since an overly narrow network blocks information flow during training, automatic pruning rate setting cannot explore a high pruning rate for a specific layer. To overcome these limitations, we propose a novel framework named Layer Adaptive Progressive Pruning (LAPP), which gradually compresses the network during initial training of a few epochs from scratch. In particular, LAPP designs an effective and efficient pruning strategy that introduces a learnable threshold for each layer and FLOPs constraints for the network. Guided by both task loss and FLOPs constraints, the learnable thresholds are dynamically and gradually updated to accommodate changes of importance scores during training. Therefore, the pruning strategy can gradually prune the network and automatically determine the appropriate pruning rates for each layer. Moreover, to maintain the expressive power of the pruned layer, before training starts we introduce an additional lightweight bypass for each convolutional layer to be pruned, which adds only a small overhead. Our method demonstrates superior performance gains over previous compression methods on various datasets and backbone architectures. For example, on CIFAR-10, our method compresses ResNet-20 to 40.3% without an accuracy drop. On ImageNet, 55.6% of the FLOPs of ResNet-18 are removed with a 0.21% top-1 accuracy increase and a 0.40% top-5 accuracy increase.
Submitted 25 September, 2023;
originally announced September 2023.
-
Characterizing the temporally stable structure of community evolution in intra-urban origin-destination networks
Authors:
Xiao-Jian Chen,
Yuhui Zhao,
Chaogui Kang,
Xiaoyue Xing,
Quanhua Dong,
Yu Liu
Abstract:
Intra-urban origin-destination (OD) network communities evolve throughout the day, indicating changing groups of closely connected regions. Under this variation, groups of regions with high consistency of community affiliation characterize the temporally stable structure of the evolution process, aiding in comprehending urban dynamics. However, how to quantify this consistency and identify these groups are open questions. In this study, we introduce the consensus OD network to quantify the consistency of community affiliation among regions. Furthermore, the temporally stable community decomposition method is proposed to identify groups of regions with high internal and low external consistency (named "stable groups"), where each group consists of temporally stable cores and attaching peripheries. Wuhan taxi data is used to verify our methods. On the hourly time scale, eleven stable groups containing 82.9% of regions are identified. This high percentage suggests that dynamic communities can be well organized via cores. Moreover, stable groups are spatially closed and more likely to distribute within a single district and separated by water bodies. Cores exhibit higher POI entropy and more healthcare and shopping services than peripheries. Our methods and empirical findings contribute to some practical issues, such as urban area division, polycentric evaluation and construction, and infectious disease control.
Submitted 22 September, 2023;
originally announced September 2023.
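One way to picture "consistency of community affiliation" in the abstract above is a co-assignment frequency: the fraction of time slices in which two regions fall in the same community. This is a common consensus-clustering construction and is assumed here for illustration; the abstract does not state how the consensus OD network is actually weighted. The function name, region names, and community labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def consensus_weights(partitions: List[Dict[str, int]]) -> Dict[Tuple[str, str], float]:
    """For each pair of regions, the fraction of time slices where they share a community."""
    if not partitions:
        raise ValueError("need at least one partition")
    regions = sorted(partitions[0])
    weights: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(regions, 2):
        same = sum(1 for p in partitions if p[a] == p[b])
        weights[(a, b)] = same / len(partitions)
    return weights


# Four toy "hourly" community partitions over three regions.
hourly = [
    {"A": 0, "B": 0, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 1, "C": 1},
]
weights = consensus_weights(hourly)
```

Pairs with weights near 1 stay together across the day and would be candidates for a stable group's core, while pairs with low weights switch communities often.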
-
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Authors:
Jiahao Yu,
Xingwei Lin,
Zheng Yu,
Xinyu Xing
Abstract:
Large language models (LLMs) have recently experienced tremendous popularity and are widely used from casual conversations to AI-driven programming. However, despite their considerable success, LLMs are not entirely reliable and can give detailed guidance on how to conduct harmful or illegal activities. While safety measures can reduce the risk of such outputs, adversarial jailbreak attacks can still exploit LLMs to produce harmful content. These jailbreak templates are typically manually crafted, making large-scale testing challenging.
In this paper, we introduce GPTFuzz, a novel black-box jailbreak fuzzing framework inspired by the AFL fuzzing framework. Instead of manual engineering, GPTFuzz automates the generation of jailbreak templates for red-teaming LLMs. At its core, GPTFuzz starts with human-written templates as initial seeds, then mutates them to produce new templates. We detail three key components of GPTFuzz: a seed selection strategy for balancing efficiency and variability, mutate operators for creating semantically equivalent or similar sentences, and a judgment model to assess the success of a jailbreak attack.
We evaluate GPTFuzz against various commercial and open-source LLMs, including ChatGPT, LLaMa-2, and Vicuna, under diverse attack scenarios. Our results indicate that GPTFuzz consistently produces jailbreak templates with a high success rate, surpassing human-crafted templates. Remarkably, GPTFuzz achieves over 90% attack success rates against ChatGPT and Llama-2 models, even with suboptimal initial seed templates. We anticipate that GPTFuzz will be instrumental for researchers and practitioners in examining LLM robustness and will encourage further exploration into enhancing LLM safety.
Submitted 27 June, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
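The three components named in the GPTFuzz abstract (seed selection, mutation operators, and a judgment model) fit the shape of a generic mutation-based fuzzing loop. The sketch below shows only that control flow on harmless toy strings. The `fuzz` function, uniform seed selection, the word-appending mutator, and the keyword judge are all hypothetical placeholders, not the paper's strategy, operators, or judge.

```python
import random
from typing import Callable, List


def fuzz(seeds: List[str],
         mutate: Callable[[str, random.Random], str],
         judge: Callable[[str], bool],
         iterations: int,
         rng: random.Random) -> List[str]:
    """Select a template, mutate it, and keep the mutant if the judge accepts it."""
    pool = list(seeds)               # templates available for selection
    successes: List[str] = []
    for _ in range(iterations):
        parent = rng.choice(pool)    # seed selection (uniform here, an assumption)
        child = mutate(parent, rng)  # mutation operator produces a new template
        if judge(child):             # judgment model decides whether it "succeeded"
            successes.append(child)
            pool.append(child)       # accepted mutants are reused as seeds (AFL-style assumption)
    return successes


WORDS = ["alpha", "beta", "gamma"]  # hypothetical mutation vocabulary


def toy_mutate(template: str, rng: random.Random) -> str:
    return template + " " + rng.choice(WORDS)


def toy_judge(template: str) -> bool:
    return "gamma" in template


found = fuzz(["seed one", "seed two"], toy_mutate, toy_judge,
             iterations=50, rng=random.Random(0))
```

Passing the selector, mutator, and judge as functions keeps the loop independent of any particular model, which mirrors the black-box setting the abstract describes.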