-
Federated Learning with Only Positive Labels by Exploring Label Correlations
Authors:
Xuming An,
Dui Wang,
Li Shen,
Yong Luo,
Han Hu,
Bo Du,
Yonggang Wen,
Dacheng Tao
Abstract:
Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue can be addressed by adding a specially designed regularizer on the server-side. Although effective sometimes, the label correlations are simply ignored and thus sub-optimal performance may be obtained. Besides, it is expensive and unsafe to exchange user's private embeddings between server and clients frequently, especially when training model in the contrastive way. To remedy these drawbacks, we propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC). Specifically, FedALC estimates the label correlations in the class embedding learning for different label pairs and utilizes it to improve the model training. To further improve the safety and also reduce the communication overhead, we propose a variant to learn fixed class embedding for each client, so that the server and clients only need to exchange class embeddings once. Extensive experiments on multiple popular datasets demonstrate that our FedALC can significantly outperform existing counterparts.
Submitted 23 April, 2024;
originally announced April 2024.
-
GLDPC-PC Codes: Channel Coding Towards 6G Communications
Authors:
Li Shen,
Yongpeng Wu,
Yin Xu,
Xiaohu You,
Xiqi Gao,
Wenjun Zhang
Abstract:
The sixth generation (6G) wireless communication system will improve the key technical indicators by one to two orders of magnitude, and come with some new features. As a crucial technique to enhance the reliability and efficiency of data transmission, the next generation channel coding is not only required to satisfy the stringent requirements of 6G, but also expected to be backward compatible to avoid imposing additional burden on the crowded baseband chip. This article provides an overview of the potential channel codes for 6G communications. In addition, we explore to develop next-generation channel codes based on low-density parity-check (LDPC) and polar frameworks, introducing a novel concept called generalized LDPC with polar-like component (GLDPC-PC) codes. The codes have exhibited promising error correction performance and manageable complexity, which can be further optimized by specific code design. The opportunities and challenges of GLDPC-PC codes are also discussed, considering the practical applications to 6G communication systems.
Submitted 23 April, 2024;
originally announced April 2024.
-
Achieving High Yield of Perpendicular SOT-MTJ Manufactured on 300 mm Wafers
Authors:
Wenlong Yang,
Zhenghui Ji,
Yang Gao,
Kaiyuan Zhou,
Qijun Guo,
Dinggui Zeng,
Shasha Wang,
Ming Wang,
Lijie Shen,
Guilin Chen,
Yihui Sun,
Enlong Liu,
Shikun He
Abstract:
The large-scale fabrication of three-terminal magnetic tunnel junctions (MTJs) with high yield is becoming increasingly crucial, especially with the growing interest in spin-orbit torque (SOT) magnetic random access memory (MRAM) as the next generation of MRAM technology. To achieve high yield and consistent device performance in MTJs with perpendicular magnetic anisotropy, an integration flow has been developed that incorporates special MTJ etching technique and other CMOS-compatible processes on a 300 mm wafer manufacturing platform. Systematic studies have been conducted on device performance and statistical uniformity, encompassing magnetic properties, electrical switching behavior, and reliability. Achievements include a switching current of 680 μA at 2 ns, a TMR as high as 119%, ultra-high endurance (over $10^{12}$ cycles), and excellent uniformity in the fabricated SOT-MTJ devices, with a yield of up to 99.6%. The proposed integration process, featuring high yield, is anticipated to streamline the mass production of SOT-MRAM.
Submitted 13 April, 2024;
originally announced April 2024.
-
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Authors:
Zihan Wang,
Siyang Song,
Cheng Luo,
Songhe Deng,
Weicheng Xie,
Linlin Shen
Abstract:
Human facial action units (AUs) are mutually related in a hierarchical manner: not only are they associated with each other in both spatial and temporal domains, but AUs located in the same or nearby facial regions also show stronger relationships than those in different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamic and hierarchical spatio-temporal relationships among AUs for recognizing their occurrence. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block to explicitly capture facial dynamics across frames at different spatial scales, which specifically considers the heterogeneity of range and magnitude in different AUs' activation. Then, a two-stage strategy is introduced to hierarchically model the relationship among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results achieved on BP4D and DISFA show that our approach is the new state-of-the-art in the field of AU occurrence recognition. Our code is publicly available at https://github.com/CVI-SZU/MDHR.
Submitted 9 April, 2024;
originally announced April 2024.
-
Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models
Authors:
Zhaohui Chen,
Elyas Asadi Shamsabadi,
Sheng Jiang,
Luming Shen,
Daniel Dias-da-Costa
Abstract:
Vision-based crack detection faces deployment challenges due to the size of robust models and edge device limitations. These can be addressed with lightweight models trained with knowledge distillation (KD). However, state-of-the-art (SOTA) KD methods compromise anti-noise robustness. This paper develops Robust Feature Knowledge Distillation (RFKD), a framework to improve robustness while retaining the precision of light models for crack segmentation. RFKD distils knowledge from a teacher model's logit layers and intermediate feature maps while leveraging mixed clean and noisy images to transfer robust patterns to the student model, improving its precision, generalisation, and anti-noise performance. To validate the proposed RFKD, a lightweight crack segmentation model, PoolingCrack Tiny (PCT), with only 0.5 M parameters, is also designed and used as the student to run the framework. The results show a significant enhancement in noisy images, with RFKD reaching a 62% enhanced mean Dice score (mDS) compared to SOTA KD methods.
Submitted 9 April, 2024;
originally announced April 2024.
-
CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement
Authors:
Xu Wu,
XianXu Hou,
Zhihui Lai,
Jie Zhou,
Ya-nan Zhang,
Witold Pedrycz,
Linlin Shen
Abstract:
Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refinement to address these challenges. In particular, we reframe LLIE as learning an image-to-code mapping from low-light images to a discrete codebook, which has been learned from high-quality images. To enhance this process, a Semantic Embedding Module (SEM) is introduced to integrate semantic information with low-level features, and a Codebook Shift (CS) mechanism is designed to adapt the pre-learned codebook to better suit the distinct characteristics of our low-light dataset. Additionally, we present an Interactive Feature Transformation (IFT) module to refine texture and color information during image reconstruction, allowing for interactive enhancement based on user preferences. Extensive experiments on both real-world and synthetic benchmarks demonstrate that the incorporation of prior knowledge and controllable information transfer significantly enhances LLIE performance in terms of quality and fidelity. The proposed CodeEnhance exhibits superior robustness to various degradations, including uneven illumination, noise, and color distortion.
Submitted 30 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Proximity-Induced Exchange Interaction: a New Pathway for Quantum Sensing using Spin Centers in Hexagonal Boron Nitride
Authors:
Lingnan Shen,
Di Xiao,
Ting Cao
Abstract:
Defects in hexagonal boron nitride (hBN), a two-dimensional van der Waals material, have attracted wide interest for their potential in various quantum applications. Due to hBN's 2D nature, spin centers in hBN can be engineered in close proximity to a target material, providing advantages over their 3D counterparts, such as the nitrogen-vacancy (NV) center in diamond. Here we propose a novel quantum sensing protocol driven by the exchange interaction between a spin center in hBN and the underlying magnetic substrate, induced by the magnetic proximity effect. Using first-principles calculations, we demonstrate that the induced exchange interaction dominates over the dipole-dipole interaction by orders of magnitude when in proximity. The interaction remains antiferromagnetic across all stacking configurations between the spin center in hBN and the target van der Waals magnets. Additionally, we explore the scaling behavior of the exchange field as a function of the spatial separation between the spin center and the targets.
Submitted 8 April, 2024;
originally announced April 2024.
-
Align as Ideal: Cross-Modal Alignment Binding for Federated Medical Vision-Language Pre-training
Authors:
Zitao Shuai,
Liyue Shen
Abstract:
Vision-language pre-training (VLP) has emerged as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which is an obstacle especially for medical applications. To overcome the data limitation, federated learning (FL) can be a promising strategy to scale up the dataset for medical VLP while protecting data privacy. However, client data are often heterogeneous in real-world scenarios, and we observe that local training on heterogeneous client data would distort the multimodal representation learning and lead to biased cross-modal alignment. To address this challenge, we propose a Federated Align as IDeal (FedAID) framework for federated VLP with robustness to data heterogeneity, to bind local clients with an ideal cross-modal alignment. Specifically, to reduce distortions on global-aggregated features while learning diverse semantics from client datasets during local training, we propose to bind the cross-model aligned representation space learned by local models with an unbiased one via guidance-based regularization. Moreover, we employ a distribution-based min-max optimization to learn the unbiased cross-modal alignment at each communication turn of federated pre-training. The experiments on real-world datasets demonstrate that our method successfully promotes efficient federated multimodal learning for medical VLP with data heterogeneity.
Submitted 24 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Continuous Spiking Graph Neural Networks
Authors:
Nan Yin,
Mengzhu Wan,
Li Shen,
Hitesh Laxmichand Patel,
Baopu Li,
Bin Gu,
Huan Xiong
Abstract:
Continuous graph neural networks (CGNNs) have garnered significant attention due to their ability to generalize existing discrete graph neural networks (GNNs) by introducing continuous dynamics. They typically draw inspiration from diffusion-based methods to introduce a novel propagation scheme, which is analyzed using ordinary differential equations (ODE). However, the implementation of CGNNs requires significant computational power, making them challenging to deploy on battery-powered devices. Inspired by recent spiking neural networks (SNNs), which emulate a biological inference process and provide an energy-efficient neural architecture, we incorporate the SNNs with CGNNs in a unified framework, named Continuous Spiking Graph Neural Networks (COS-GNN). We employ SNNs for graph node representation at each time step, which are further integrated into the ODE process along with time. To enhance information preservation and mitigate information loss in SNNs, we introduce the high-order structure of COS-GNN, which utilizes the second-order ODE for spiking representation and continuous propagation. Moreover, we provide the theoretical proof that COS-GNN effectively mitigates the issues of exploding and vanishing gradients, enabling us to capture long-range dependencies between nodes. Experimental results on graph-based learning tasks demonstrate the effectiveness of the proposed COS-GNN over competitive baselines.
Submitted 2 April, 2024;
originally announced April 2024.
-
Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization
Authors:
Qi Zhang,
Yi Zhou,
Ashley Prater-Bennette,
Lixin Shen,
Shaofeng Zou
Abstract:
Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions, and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on the general Cressie-Read family divergence defined uncertainty set which includes $χ^2$-divergences as a special case. We prove that our algorithm finds an $ε$-stationary point with a computational complexity of $\mathcal O(ε^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
Submitted 1 April, 2024;
originally announced April 2024.
-
Scaling Crystal Structure Relaxation with a Universal Trustworthy Deep Generative Model
Authors:
Ziduo Yang,
Yiming Zhao,
Xiaoqing Liu,
Xiuying Zhang,
Yifan Li,
Qiujie Lyu,
Calvin Yu-Chian Chen,
Lei Shen
Abstract:
The evolution of AI and high-throughput technologies has boosted a rapid increase in the number of new materials, challenging our computational ability to comprehensively analyze their properties. Relaxed crystal structures often serve as the foundational basis for further property calculations. However, determining equilibrium structures traditionally involves computationally expensive iterative calculations. Here, we develop DeepRelax, an efficient deep generative model designed for rapid structural relaxation without any iterative process. DeepRelax learns the equilibrium structural distribution, enabling it to predict relaxed structures directly from their unrelaxed counterparts. The ability to perform structural relaxation in just a few hundred milliseconds per structure, combined with the scalability of parallel processing, makes DeepRelax particularly useful for large-scale virtual screening. To demonstrate the universality of DeepRelax, we benchmark it against three different databases of X-Mn-O oxides, Materials Project, and Computational 2D Materials Database with various types of materials. In these tests, DeepRelax exhibits both high accuracy and efficiency in structural relaxation, as further validated by DFT calculations. Finally, we integrate DeepRelax with an implementation of uncertainty quantification, enhancing its reliability and trustworthiness in material discovery. This work provides an efficient and trustworthy method to significantly accelerate large-scale computations, offering substantial advancements in the field of computational materials science.
Submitted 31 March, 2024;
originally announced April 2024.
-
Sparse Recovery: The Square of $\ell_1/\ell_2$ Norms
Authors:
Jianqing Jia,
Ashley Prater-Bennette,
Lixin Shen,
Erin E. Tripp
Abstract:
This paper introduces a nonconvex approach to the problem of recovering sparse signals. We propose a novel model, termed the $τ_2$-model, which utilizes the square of $\ell_1/\ell_2$ norms for sparse recovery. This model is an advancement over the $\ell_0$ norm, which is often computationally intractable and less effective in handling practical scenarios. Our approach is grounded in the concept of effective sparsity, which robustly measures the number of effective coordinates in a signal. We demonstrate that our model is a powerful alternative for sparse signal estimation, with the $τ_2$-model offering computational advantages and practical applicability. The model's formulation and the accompanying algorithm, based on Dinkelbach's procedure combined with a difference of convex functions strategy, are detailed. We further explore the properties of our model, including the existence of solutions under certain conditions, and discuss the algorithm's convergence properties. Numerical experiments with various sensing matrices are conducted to validate the effectiveness of our model.
Submitted 31 March, 2024;
originally announced April 2024.
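As a rough illustration of the quantity named in the abstract above, the following sketch computes the squared $\ell_1/\ell_2$ ratio of a vector. It assumes that "effective sparsity" refers to $(\|x\|_1/\|x\|_2)^2$; the function name `effective_sparsity` and the sample vectors are hypothetical and not taken from the paper, and the paper's recovery algorithm itself is not reproduced here.

```python
def effective_sparsity(x):
    """Return (||x||_1 / ||x||_2)^2 for a nonzero vector x (illustrative helper)."""
    l1 = sum(abs(v) for v in x)          # l1 norm
    l2_squared = sum(v * v for v in x)   # squared l2 norm
    if l2_squared == 0.0:
        raise ValueError("the ratio is undefined for the zero vector")
    return (l1 * l1) / l2_squared


# Two equal-magnitude nonzeros out of five entries: the ratio equals 2.
sparse = [3.0, 0.0, 0.0, -3.0, 0.0]
# Five equal-magnitude entries: the ratio equals the dimension, 5.
dense = [1.0, 1.0, 1.0, 1.0, 1.0]

print(effective_sparsity(sparse))
print(effective_sparsity(dense))
```

For a vector with k nonzero entries of equal magnitude the ratio is exactly k, and it does not change when the vector is rescaled, which is one reason it can serve as a continuous stand-in for counting nonzeros.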
-
Computing Proximity Operators of Scale and Signed Permutation Invariant Functions
Authors:
Jianqing Jia,
Ashley Prater-Bennette,
Lixin Shen
Abstract:
This paper investigates the computation of proximity operators for scale and signed permutation invariant functions. A scale-invariant function remains unchanged under uniform scaling, while a signed permutation invariant function retains its structure despite permutations and sign changes applied to its input variables. Noteworthy examples include the $\ell_0$ function and the ratios of $\ell_1/\ell_2$ and its square, with their proximity operators being particularly crucial in sparse signal recovery. We delve into the properties of scale and signed permutation invariant functions, delineating the computation of their proximity operators into three sequential steps: the $\mathbf{w}$-step, $r$-step, and $d$-step. These steps collectively form a procedure termed as WRD, with the $\mathbf{w}$-step being of utmost importance and requiring careful treatment. Leveraging this procedure, we present a method for explicitly computing the proximity operator of $(\ell_1/\ell_2)^2$ and introduce an efficient algorithm for the proximity operator of $\ell_1/\ell_2$.
Submitted 31 March, 2024;
originally announced April 2024.
-
Mistake, Manipulation and Margin Guarantees in Online Strategic Classification
Authors:
Lingqing Shen,
Nam Ho-Nguyen,
Khanh-Hung Giang-Tran,
Fatma Kılınç-Karzan
Abstract:
We consider an online strategic classification problem where each arriving agent can manipulate their true feature vector to obtain a positive predicted label, while incurring a cost that depends on the amount of manipulation. The learner seeks to predict the agent's true label given access to only the manipulated features. After the learner releases their prediction, the agent's true label is revealed. Previous algorithms such as the strategic perceptron guarantee finitely many mistakes under a margin assumption on agents' true feature vectors. However, these are not guaranteed to encourage agents to be truthful. Promoting truthfulness is intimately linked to obtaining adequate margin on the predictions, thus we provide two new algorithms aimed at recovering the maximum margin classifier in the presence of strategic agent behavior. We prove convergence, finite mistake and finite manipulation guarantees for a variety of agent cost structures. We also provide generalized versions of the strategic perceptron with mistake guarantees for different costs. Our numerical study on real and synthetic data demonstrates that the new algorithms outperform previous ones in terms of margin, number of manipulations and number of mistakes.
Submitted 26 March, 2024;
originally announced March 2024.
-
Generalized Chern-Simons-Schrodinger system with critical exponential growth: the zero mass case
Authors:
Liejun Shen,
Marco Squassina
Abstract:
We consider the existence of ground state solutions for a class of zero-mass Chern-Simons-Schrödinger systems \[
\left\{
\begin{array}{ll} \displaystyle -\Delta u +A_0 u+\sum\limits_{j=1}^2A_j^2 u=f(u)-a(x)|u|^{p-2}u, \\
\displaystyle \partial_1A_2-\partial_2A_1=-\frac{1}{2}|u|^2,~\partial_1A_1+\partial_2A_2=0, \\
\displaystyle \partial_1A_0=A_2|u|^2,~ \partial_2A_0=-A_1|u|^2,
\end{array} \right. \] where $a:\mathbb R^2\to\mathbb R^+$ is an external potential, $p\in(1,2)$ and $f\in \mathcal{C}(\mathbb R)$ denotes a nonlinearity that fulfills the critical exponential growth in the Trudinger-Moser sense at infinity. By introducing an improved version of the Trudinger-Moser inequality, we are able to investigate the existence of positive ground state solutions for the given system using a variational method.
Submitted 26 March, 2024;
originally announced March 2024.
-
Compensating for charge sharing by a deep-learning method: a preliminary experimental study
Authors:
Shengzi Zhao,
Le Shen,
Yuxing Xing
Abstract:
Photon counting detectors (PCDs) bring valuable advantages to diagnostic computed tomography (CT), including lower noise and higher resolution than energy integrating detectors. However, there are still several nonideal factors preventing PCDs from meeting people's expectations, for example, charge sharing and pile up. In this paper, we did some preliminary work on charge sharing and conducted an experimental study using an XCounter PCD to compare the effects of no anti-coincidence, anti-coincidence by hardware and charge sharing compensation by a deep learning method. In our results, a smaller bias and standard deviation are obtained from deep learning method than directly from no-anti-coincidence mode of the detector. Our network also outperforms the anti-coincidence mode of the detector in the low energy bin and has smaller standard deviation in the high energy bin. The results validate that a deep learning method is suitable to compensate for charge sharing.
Submitted 26 March, 2024;
originally announced March 2024.
-
SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging
Authors:
Lingdong Shen,
Fangxin Shang,
Xiaoshuang Huang,
Yehui Yang,
Haifeng Huang,
Shiming Xiang
Abstract:
In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models are one solution; they aim to generalize across the diverse modalities of medical images, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of the model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL has the capability to employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experiments demonstrate a positive correlation between the number of shots and segmentation performance on OOD tasks. Segmentation performance with three shots is approximately 1.5 times better than in a zero-shot setting. This indicates that SegICL effectively addresses new segmentation tasks based on contextual information. Additionally, SegICL exhibits performance comparable to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.
Submitted 29 May, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Heterogeneous Federated Learning with Splited Language Model
Authors:
Yifan Shi,
Yuhui Zhang,
Ziyue Huang,
Xiaofeng Yang,
Li Shen,
Wei Chen,
Xueqian Wang
Abstract:
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice, which gathers the strengths of both Federated Learning (FL) and Split Learning (SL) paradigms, to ensure model privacy while diminishing the resource overhead of each client, especially on large transformer models in a resource-constrained environment, e.g., Internet of Things (IoT). However, almost all existing works investigate FSL performance only with simple neural network models. The few efforts that incorporate Vision Transformers (ViT) as model architectures train ViT from scratch, leading to enormous training overhead on each resource-limited device. Therefore, in this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness. Furthermore, we propose FedVZ to hinder the gradient inversion attack; it is also compatible with black-box scenarios, where the gradient information is unavailable. Concretely, FedVZ approximates the server gradient by utilizing a zeroth-order (ZO) optimization, which replaces the backward propagation with just one forward process. Empirically, we are the first to provide a systematic evaluation of FSL methods with PITs on real-world datasets, different partial device participations, and heterogeneous data splits. Our experiments verify the effectiveness of our algorithms.
Submitted 19 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
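The zeroth-order (ZO) gradient approximation mentioned in the FedVZ abstract above can be illustrated with a generic randomized finite-difference estimator. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `zo_gradient`, the smoothing radius `mu`, and the number of random directions are all chosen here for illustration. It uses only forward evaluations of the loss (one at the current point plus one per random direction) and no backward pass.

```python
import random


def zo_gradient(f, x, mu=1e-3, num_samples=5000, seed=0):
    """Estimate the gradient of f at x using forward evaluations only.

    Generic Gaussian-smoothing estimator (an assumption for illustration):
        g ~= mean over u of ((f(x + mu * u) - f(x)) / mu) * u,  u ~ N(0, I)
    """
    rng = random.Random(seed)
    dim = len(x)
    fx = f(x)  # single forward evaluation at the unperturbed point
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        # Directional finite difference along the random direction u.
        scale = (f(x_pert) - fx) / mu
        for i in range(dim):
            grad[i] += scale * u[i] / num_samples
    return grad


if __name__ == "__main__":
    # Toy objective: f(v) = sum(v_i^2), whose exact gradient is 2 * v.
    quadratic = lambda v: sum(t * t for t in v)
    print(zo_gradient(quadratic, [1.0, -2.0, 0.5]))
```

The estimate is noisy; averaging over more random directions reduces its variance at the cost of more forward evaluations.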
-
Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning
Authors:
Changtong Zan,
Liang Ding,
Li Shen,
Yibing Zhen,
Weifeng Liu,
Dacheng Tao
Abstract:
Translation-tailored large language models (LLMs) exhibit remarkable translation capabilities, even competing with supervised-trained commercial translation systems. However, off-target translation remains an unsolved problem, especially for low-resource languages, hindering us from developing accurate LLM-based translation models. To mitigate the off-target translation problem and enhance the performance of LLMs on translation, recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs by feeding few-shot demonstrations. However, these methods essentially do not improve an LLM's ability to follow translation instructions, especially the language direction information. In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs. Specifically, we first tune LLMs with the maximum likelihood estimation loss on the translation dataset to elicit the basic translation capabilities. In the second stage, we construct instruction-conflicting samples by randomly replacing the translation direction within the instruction with a wrong one, and then introduce an extra unlikelihood loss to learn those samples. Experiments on IWSLT and WMT benchmarks upon the LLaMA model spanning 16 zero-shot directions show that, compared to the competitive baseline (translation-finetuned LLaMA), our method could effectively reduce the off-target translation ratio (-53.3% on average), thus improving translation quality with average +5.7 SacreBLEU and +16.4 BLEURT. Analysis shows that our method could preserve the model's general task performance on AlpacaEval. Code and models will be released at https://github.com/alphadl/LanguageAware_Tuning.
Submitted 21 March, 2024;
originally announced March 2024.
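The two loss terms named in the abstract above (maximum likelihood on correct samples, unlikelihood on instruction-conflicting samples) can be sketched over per-token probabilities. This is a hedged illustration of the generic unlikelihood form, -log(1 - p), not the authors' training code; the function names and the weighting factor `alpha` are assumptions introduced here.

```python
import math


def mle_loss(target_probs):
    """Average negative log-likelihood of the reference tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def unlikelihood_loss(negative_probs, eps=1e-12):
    """Average -log(1 - p) over tokens the model should NOT produce,
    e.g. the output tokens paired with a wrong translation direction."""
    return -sum(math.log(max(1.0 - p, eps)) for p in negative_probs) / len(negative_probs)


def combined_loss(target_probs, negative_probs, alpha=1.0):
    """Second-stage objective sketch: likelihood term plus a weighted
    unlikelihood term (the weighting scheme is an assumption)."""
    return mle_loss(target_probs) + alpha * unlikelihood_loss(negative_probs)


if __name__ == "__main__":
    # Probabilities the model assigns to reference tokens (correct instruction)
    # and to the same tokens under an instruction-conflicting sample.
    print(combined_loss([0.9, 0.8, 0.95], [0.7, 0.6, 0.2]))
```

The unlikelihood term grows as the model assigns high probability to outputs that contradict the instruction, so minimizing it pushes that probability down.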
-
A Unified and General Framework for Continual Learning
Authors:
Zhenyi Wang,
Yan Li,
Li Shen,
Heng Huang
Abstract:
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. However, these methods lack a unified framework and common terminology for describing their approaches. This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies. Notably, this new framework is capable of encompassing established CL approaches as special instances within a unified and general optimization objective. An intriguing finding is that despite their diverse origins, these methods share common mathematical structures. This observation highlights the compatibility of these seemingly distinct techniques, revealing their interconnectedness through a shared underlying optimization objective. Moreover, the proposed general framework introduces an innovative concept called refresh learning, specifically designed to enhance the CL performance. This novel approach draws inspiration from neuroscience, where the human brain often sheds outdated information to improve the retention of crucial knowledge and facilitate the acquisition of new information. In essence, refresh learning operates by initially unlearning current data and subsequently relearning it. It serves as a versatile plug-in that seamlessly integrates with existing CL methods, offering an adaptable and effective enhancement to the learning process. Extensive experiments on CL benchmarks and theoretical analysis demonstrate the effectiveness of the proposed refresh learning. Code is available at https://github.com/joey-wang123/CL-refresh-learning.
Submitted 19 March, 2024;
originally announced March 2024.
-
Towards Massive Interaction with Generalist Robotics: A Systematic Review of XR-enabled Remote Human-Robot Interaction Systems
Authors:
Xian Wang,
Luyao Shen,
Lik-Hang Lee
Abstract:
The rising interest in generalist robotics seeks to create robots versatile enough to handle multiple tasks in a variety of environments, and humans will interact with such robots through immersive interfaces. In the context of human-robot interaction (HRI), this survey provides an exhaustive review of the applications of extended reality (XR) technologies in the field of remote HRI. We developed a systematic search strategy based on the PRISMA methodology. From the 2,561 articles initially selected, 100 research papers met our inclusion criteria and were included. We categorized and summarized the domain in detail, delving into XR technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR), and their applications in facilitating intuitive and effective remote control and interaction with robotic systems. The survey highlights existing articles on the application of XR technologies, user experience enhancement, and various interaction designs for XR in remote HRI, providing insights into current trends and future directions. We also identified potential gaps and opportunities for future research to improve remote HRI systems through XR technology to guide and inform future XR and robotics research.
Submitted 26 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Authors:
Huiqiang Sun,
Xingyi Li,
Liao Shen,
Xinyi Ye,
Ke Xian,
Zhiguo Cao
Abstract:
Recent advancements in dynamic neural radiance field methods have yielded remarkable outcomes. However, these approaches rely on the assumption of sharp input images. When faced with motion blur, existing dynamic NeRF methods often struggle to generate high-quality novel views. In this paper, we propose DyBluRF, a dynamic radiance field approach that synthesizes sharp novel views from a monocular video affected by motion blur. To account for motion blur in input images, we simultaneously capture the camera trajectory and object Discrete Cosine Transform (DCT) trajectories within the scene. Additionally, we employ a global cross-time rendering approach to ensure consistent temporal coherence across the entire scene. We curate a dataset comprising diverse dynamic scenes that are specifically tailored for our task. Experimental results on our dataset demonstrate that our method outperforms existing approaches in generating sharp novel views from motion-blurred inputs while maintaining spatial-temporal consistency of the scene.
Submitted 19 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Existence and concentration of normalized solutions for $p$-Laplacian equations with logarithmic nonlinearity
Authors:
Liejun Shen,
Marco Squassina
Abstract:
We investigate the existence and concentration of normalized solutions for a $p$-Laplacian problem with logarithmic nonlinearity of type \[
\left\{
\begin{array}{ll}
\displaystyle -\varepsilon^pΔ_p u+V(x)|u|^{p-2}u=λ|u|^{p-2}u+|u|^{p-2}u\log|u|^p ~\text{in}~\mathbb R^N,\newline
\displaystyle \int_{\mathbb R^N}|u|^pdx=a^p\varepsilon^N,
\end{array}
\right. \] where $a,\varepsilon> 0$, $λ\in\mathbb R$ is known as the Lagrange multiplier, $Δ_p\cdot =\text{div} (|\nabla \cdot|^{p-2}\nabla \cdot)$ denotes the usual $p$-Laplacian operator with $2\leq p < N$ and $V \in \mathcal{C}^0(\mathbb R^N)$ is the potential which satisfies some suitable assumptions. We prove that the number of positive solutions depends on the profile of $V$ and each solution concentrates around its corresponding global minimum point of $V$ in the semiclassical limit when $\varepsilon\to0^+$ using variational method.
Moreover, we also get the existence of normalized solutions for some logarithmic $p$-Laplacian equations involving mass-supercritical nonlinearities.
Submitted 14 March, 2024;
originally announced March 2024.
-
Rediscovering BCE Loss for Uniform Classification
Authors:
Qiufu Li,
Xi Jia,
Jiancan Zhou,
Linlin Shen,
Jinming Duan
Abstract:
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples rather than an adaptive threshold for each individual sample. We also propose the uniform classification accuracy as a metric to measure the model's performance in uniform classification. Furthermore, beginning with a naive loss, we mathematically derive a loss function suitable for uniform classification, which is the BCE function integrated with a unified bias. We demonstrate that the unified threshold can be learned via the bias. Extensive experiments on six classification datasets and three feature extraction models show that, compared to the SoftMax loss, the models trained with the BCE loss exhibit not only higher uniform classification accuracy but also higher sample-wise classification accuracy. In addition, the learned bias from BCE loss is very close to the unified threshold used in the uniform classification. The features extracted by the models trained with BCE loss not only possess uniformity but also demonstrate better intra-class compactness and inter-class distinctiveness, yielding superior performance on open-set tasks such as face recognition.
Submitted 11 March, 2024;
originally announced March 2024.
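The idea in the abstract above, a BCE loss with one bias shared by all classes, evaluated with a single threshold for all samples, can be sketched as follows. This is one plausible reading written for illustration, not the paper's definition: the function names, the exact form of the loss, and the rule used to count a sample as correctly classified are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_unified_bias(logits, label, bias):
    """BCE over all classes of one sample, with the same bias subtracted
    from every class logit (the 'unified bias')."""
    loss = 0.0
    for j, z in enumerate(logits):
        p = sigmoid(z - bias)
        loss -= math.log(p) if j == label else math.log(1.0 - p)
    return loss


def uniform_accuracy(batch_logits, labels, threshold):
    """Fraction of samples whose target logit lies above the shared threshold
    while every non-target logit lies below it (assumed criterion)."""
    correct = 0
    for logits, y in zip(batch_logits, labels):
        positive_ok = logits[y] > threshold
        negatives_ok = all(z < threshold for j, z in enumerate(logits) if j != y)
        correct += int(positive_ok and negatives_ok)
    return correct / len(labels)


if __name__ == "__main__":
    batch = [[3.0, -1.0], [0.5, 2.5]]
    labels = [0, 1]
    print(uniform_accuracy(batch, labels, threshold=1.0))
    print(bce_unified_bias(batch[0], labels[0], bias=1.0))
```

In this sketch the bias plays the role of the decision threshold: a logit equal to the bias maps to probability 0.5, which matches the abstract's remark that the learned bias ends up close to the unified threshold.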
-
Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation
Authors:
Chenhui Zhao,
Liyue Shen
Abstract:
Precision medicine, such as patient-adaptive treatments utilizing medical images, poses new challenges for image segmentation algorithms due to (1) the large variability across different patients and (2) the limited availability of annotated data for each patient. In this work, we propose a data-efficient segmentation method to address these challenges, namely Part-aware Personalized Segment Anything Model (P^2SAM). Without any model fine-tuning, P^2SAM enables seamless adaptation to any new patients relying only on one-shot patient-specific data. We introduce a novel part-aware prompt mechanism to select multiple-point prompts based on part-level features of the one-shot data. To further promote the robustness of the selected prompt, we propose a retrieval approach to handle outlier prompts. Extensive experiments demonstrate that P^2SAM improves the performance by +8.0% and +2.0% mean Dice score within two patient-specific segmentation settings, and exhibits impressive generality across different application domains, e.g., +6.4% mIoU on the PerSeg benchmark. Code will be released upon acceptance.
Submitted 8 March, 2024;
originally announced March 2024.
-
Topological skyrmions in monolayer multiferroic MoPtGe2S6
Authors:
Zuxin Fu,
Kuanrong Hao,
Min Guo,
Jingjing He,
Xiaohong Yan,
Yangbo Zhou,
Lei Shen,
Jiaren Yuan
Abstract:
Two-dimensional (2D) multiferroic materials with coexisting ferroelectricity and ferromagnetism have garnered substantial attention for their intriguing physical properties and diverse promising applications in spintronics. For example, multiferroic materials with electronically controlled broken central symmetry provide a versatile platform for designing and manipulating topological skyrmions and diverse spintronic applications. Here, we investigate the complex magnetic properties of the room-temperature multiferroic material MoPtGe2S6 and its electrical control of topological skyrmions using first-principles calculations and atomistic micromagnetic simulations. A sizable Dzyaloshinskii-Moriya interaction (DMI) (2.1 meV) is found in the multiferroic material MoPtGe2S6 with an electrically polarized ground state. The magnetic skyrmions can be stabilized in monolayer MoPtGe2S6 under zero magnetic field, and the chirality of skyrmions can be reversed with electric field-induced flipping of electrical polarization due to the reversed chirality of the DMI. Furthermore, an external magnetic field can reverse the magnetization direction and topological charge of the skyrmions as well as tune the size of skyrmions. These results demonstrate that the monolayer MoPtGe2S6 can enrich the 2D skyrmion community and pave the way for electronically controlled spintronic devices.
Submitted 24 February, 2024;
originally announced February 2024.
-
Fingerprint Presentation Attack Detector Using Global-Local Model
Authors:
Haozhe Liu,
Wentian Zhang,
Feng Liu,
Haoqian Wu,
Linlin Shen
Abstract:
The vulnerability of automated fingerprint recognition systems (AFRSs) to presentation attacks (PAs) promotes the vigorous development of PA detection (PAD) technology. However, PAD methods have been limited by information loss and poor generalization ability, especially when faced with new PA materials and fingerprint sensors. This paper thus proposes a global-local model-based PAD (RTK-PAD) method to overcome those limitations to some extent. The proposed method consists of three modules: 1) the global module; 2) the local module; and 3) the rethinking module. The cut-out-based global module predicts a global spoofness score from nonlocal features of the entire fingerprint image, while the texture in-painting-based local module predicts a local spoofness score from fingerprint patches. The two modules are not independent but connected through our proposed rethinking module, which localizes two discriminative patches for the local module based on the global spoofness score. Finally, the fusion spoofness score, obtained by averaging the global and local spoofness scores, is used for PAD. Our experimental results evaluated on LivDet 2017 show that the proposed RTK-PAD can achieve an average classification error (ACE) of 2.28% and a true detection rate (TDR) of 91.19% when the false detection rate (FDR) equals 1.0%, which significantly outperforms the state-of-the-art methods by $\sim$10% in terms of TDR (91.19% versus 80.74%).
Submitted 20 February, 2024;
originally announced February 2024.
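The score fusion described in the abstract above (averaging a global and a local spoofness score) and an error metric in the spirit of ACE can be sketched as below. This is an illustrative sketch under assumptions: averaging the two patch scores into one local score, the 0.5 decision threshold, and computing ACE as the mean of the error rates on live and spoof samples are choices made here, not details confirmed by the abstract.

```python
def fuse_scores(global_score, local_scores):
    """Average the global score with the mean of the patch-level local scores."""
    local_score = sum(local_scores) / len(local_scores)
    return 0.5 * (global_score + local_score)


def average_classification_error(scores, is_spoof, threshold=0.5):
    """Mean of the misclassification rates on live and on spoof samples.

    A sample is predicted as spoof when its fused score exceeds `threshold`.
    """
    live_errors = spoof_errors = n_live = n_spoof = 0
    for score, spoof in zip(scores, is_spoof):
        predicted_spoof = score > threshold
        if spoof:
            n_spoof += 1
            spoof_errors += int(not predicted_spoof)
        else:
            n_live += 1
            live_errors += int(predicted_spoof)
    return 0.5 * (live_errors / n_live + spoof_errors / n_spoof)


if __name__ == "__main__":
    fused = fuse_scores(0.8, [0.6, 0.4])
    print(fused)
    print(average_classification_error([0.9, 0.2, 0.7, 0.4],
                                       [True, False, False, True]))
```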
-
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies
Authors:
Xiao Ye,
Andrew Wang,
Jacob Choi,
Yining Lu,
Shreya Sharma,
Lingfeng Shen,
Vijay Tiyyala,
Nicholas Andrews,
Daniel Khashabi
Abstract:
Humans regularly engage in analogical thinking, relating personal experiences to current situations ($X$ is analogous to $Y$ because of $Z$). Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively. Can language models (LMs) do the same? To answer this question, we propose ANALOBENCH, a benchmark to determine analogical reasoning ability in LMs. Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios. We test a broad collection of proprietary models (e.g., GPT family, Claude V2) and open source models such as LLaMA2. As in prior results, scaling up LMs results in some performance boosts. Surprisingly, scale offers minimal gains when (i) analogies involve lengthy scenarios, or (ii) relevant scenarios must be recalled from a large pool of information, a process analogous to finding a needle in a haystack. We hope these observations encourage further research in this field.
Submitted 19 February, 2024;
originally announced February 2024.
-
Infinitely many solutions for a class of fractional Schrodinger equations coupled with neutral scalar field
Authors:
Liejun Shen,
Marco Squassina,
Xiaoyu Zeng
Abstract:
We study the fractional Schrödinger equations coupled with a neutral scalar field
$$
(-Δ)^s u+V(x)u=K(x)φu +g(x)|u|^{q-2}u, \quad x\in \mathbb{R}^3,\qquad
(I-Δ)^t φ=K(x)u^2, \quad x\in \mathbb{R}^3,
$$ where $(-Δ)^s$ and $(I-Δ)^t$ denote the fractional Laplacian and Bessel operators with $\frac{3}{4} <s<1$ and $0<t<1$, respectively. Under some suitable assumptions for the external potentials $V$, $K$ and $g$, given $q\in(1,2)\cup(2,2_s^*)$ with $2_s^*:= \frac{6}{3-2s}$, with the help of an improved Fountain theorem dealing with a class of strongly indefinite variational problems approached by Gu-Zhou [Adv. Nonlinear Stud., {\bf 17} (2017), 727--738], we show that the system admits infinitely many nontrivial solutions.
Submitted 19 February, 2024;
originally announced February 2024.
-
Revisiting Knowledge Distillation for Autoregressive Language Models
Authors:
Qihuang Zhong,
Liang Ding,
Li Shen,
Juhua Liu,
Bo Du,
Dacheng Tao
Abstract:
Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we empirically find that larger teacher LMs might dramatically result in a poorer student. In response to this problem, we conduct a series of analyses and reveal that different tokens have different teaching modes, neglecting which will lead to performance degradation. Motivated by this, we propose a simple yet effective adaptive teaching approach (ATKD) to improve the KD. The core of ATKD is to reduce rote learning and make teaching more diverse and flexible. Extensive experiments on 8 LM tasks show that, with the help of ATKD, various baseline KD methods can achieve consistent and significant performance gains (up to +3.04% average score) across all model types and sizes. More encouragingly, ATKD can improve the student model generalization effectively.
Submitted 16 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Communication-Efficient Distributed Learning with Local Immediate Error Compensation
Authors:
Yifei Cheng,
Li Shen,
Linli Xu,
Xun Qian,
Shiwei Wu,
Yiming Zhou,
Tie Zhang,
Dacheng Tao,
Enhong Chen
Abstract:
Gradient compression with error compensation has attracted significant attention with the target of reducing the heavy communication overhead in distributed learning. However, existing compression methods either perform only unidirectional compression in one iteration with higher communication cost, or bidirectional compression with slower convergence rate. In this work, we propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm to break the above bottlenecks based on bidirectional compression and carefully designed compensation approaches. Specifically, the bidirectional compression technique is to reduce the communication cost, and the compensation technique compensates the local compression error to the model update immediately while only maintaining the global error variable on the server throughout the iterations to boost its efficacy. Theoretically, we prove that LIEC-SGD is superior to previous works in either the convergence rate or the communication cost, which indicates that LIEC-SGD could inherit the dual advantages from unidirectional compression and bidirectional compression. Finally, experiments of training deep neural networks validate the effectiveness of the proposed LIEC-SGD algorithm.
Submitted 19 February, 2024;
originally announced February 2024.
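Gradient compression with error compensation, the general technique the abstract above builds on, can be sketched with a top-k compressor and a residual that is fed back into the next step. This is the classic error-feedback pattern written for illustration, not LIEC-SGD itself, which according to the abstract applies the local compression error to the model update immediately and keeps only a global error variable on the server. The function names and the choice of top-k are assumptions.

```python
def top_k(vector, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    keep = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)[:k]
    compressed = [0.0] * len(vector)
    for i in keep:
        compressed[i] = vector[i]
    return compressed


def compress_with_error_feedback(gradient, error, k):
    """Compress (gradient + carried error); return the message to send and
    the new residual that was dropped by the compressor."""
    corrected = [g + e for g, e in zip(gradient, error)]
    compressed = top_k(corrected, k)
    new_error = [c - q for c, q in zip(corrected, compressed)]
    return compressed, new_error


if __name__ == "__main__":
    error = [0.0, 0.0, 0.0]
    for grad in ([0.1, -2.0, 0.5], [0.1, 0.0, 0.5]):
        message, error = compress_with_error_feedback(grad, error, k=1)
        print(message, error)
```

By construction, the transmitted message plus the new residual always equals the gradient plus the previously carried residual, so no gradient information is permanently discarded.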
-
Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
Authors:
Wenxuan Wang,
Yihang Su,
Jingyuan Huan,
Jie Liu,
Wenting Chen,
Yudi Zhang,
Cheng-Yi Li,
Kao-Jung Chang,
Xiaohan Xin,
Linlin Shen,
Michael R. Lyu
Abstract:
The significant breakthroughs of Medical Multi-Modal Large Language Models (Med-MLLMs) renovate modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are unsuitable for Med-MLLMs due to the intricate nature of real-world diagnostic frameworks, which encompass diverse medical specialties and involve complex clinical decisions. Moreover, these benchmarks are susceptible to data leakage, since Med-MLLMs are trained on large assemblies of publicly available data. Thus, an isolated and clinically representative benchmark is highly desirable for credible Med-MLLM evaluation. To this end, we introduce Asclepius, a novel Med-MLLM benchmark that rigorously and comprehensively assesses model capability in terms of distinct medical specialties (cardiovascular, gastroenterology, etc.) and different diagnostic capacities (perception, disease analysis, etc.). Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties, stratifying into 3 main categories and 8 sub-categories of clinical tasks, and avoiding train-validate contamination. We further provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists, providing insights into their competencies and limitations in various medical contexts. Our work not only advances the understanding of Med-MLLMs' capabilities but also sets a precedent for future evaluations and the safe deployment of these models in clinical environments. We launch and maintain a leaderboard for community assessment of Med-MLLM capabilities (https://asclepius-med.github.io/).
Submitted 17 February, 2024;
originally announced February 2024.
-
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Authors:
Haoyu Wang,
Guozheng Ma,
Ziqiao Meng,
Zeyu Qin,
Li Shen,
Zhong Zhang,
Bingzhe Wu,
Liu Liu,
Yatao Bian,
Tingyang Xu,
Xueqian Wang,
Peilin Zhao
Abstract:
Self-alignment is an effective way to reduce the cost of human annotation while ensuring promising model capability. However, most current methods complete the data collection and training steps in a single round, which may overlook the continuously improving ability of self-aligned models. This gives rise to a key query: What if we do multi-time bootstrapping self-alignment? Does this strategy enhance model performance or lead to rapid degradation? In this paper, our pioneering exploration delves into the impact of bootstrapping self-alignment on large language models. Our findings reveal that bootstrapping self-alignment markedly surpasses the single-round approach by guaranteeing data diversity from in-context learning. To further exploit the capabilities of bootstrapping, we investigate and adjust the training order of data, which yields improved performance of the model. Drawing on these findings, we propose Step-On-Feet Tuning (SOFT), which leverages the model's continuously enhanced few-shot ability to boost zero- or one-shot performance. Based on an easy-to-hard training recipe, we propose SOFT+, which further boosts self-alignment's performance. Our experiments demonstrate the efficiency of SOFT (SOFT+) across various classification and generation tasks, highlighting the potential of bootstrapping self-alignment for continually enhancing model alignment performance.
Submitted 27 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial Imagery
Authors:
Wenhui Chang,
Hongming Chen,
Xin He,
Xiang Chen,
Liangduo Shen
Abstract:
Raindrops adhering to the lens of UAVs can obstruct visibility of the background scene and degrade image quality. Despite recent progress in image deraining methods and datasets, there is a lack of focus on raindrop removal from UAV aerial imagery due to the unique challenges posed by varying angles and rapid movement during drone flight. To fill this research gap, we first construct a new benchmark dataset for removing raindrops from UAV images, called UAV-Rain1k. In this letter, we provide a dataset generation pipeline, which includes modeling raindrop shapes using Blender, collecting background images from various UAV angles, randomly sampling rain masks, and so on. Based on the proposed benchmark, we further present a comprehensive evaluation of existing representative image deraining algorithms, and reveal future research opportunities worth exploring. The proposed dataset is publicly available at https://github.com/cschenxiang/UAV-Rain1k.
Submitted 12 April, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Authors:
Qinliang Lin,
Cheng Luo,
Zenghao Niu,
Xilin He,
Weicheng Xie,
Yuanbo Hou,
Linlin Shen,
Siyang Song
Abstract:
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they perform poorly when attacking systems whose model genus differs from that of the surrogate model. In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross-model-genus attacks. Specifically, DeCoWA first augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid severe distortion of global semantics caused by random deformation, DeCoW further constrains the strength and direction of the warping transformation by a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at https://github.com/LinQinLiang/DeCoWA.
Submitted 6 February, 2024;
originally announced February 2024.
-
Representation Surgery for Multi-Task Model Merging
Authors:
Enneng Yang,
Li Shen,
Zhenyi Wang,
Guibing Guo,
Xiaojun Chen,
Xingwei Wang,
Dacheng Tao
Abstract:
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias. That is, there is a significant discrepancy in the representation distribution between the merged and individual models, resulting in poor performance of merged MTL. In this paper, we propose a representation surgery solution called "Surgery" to reduce representation bias in the merged model. Specifically, Surgery is a lightweight task-specific module that takes the representation of the merged model as input and attempts to output the biases contained in the representation from the merged model. We then designed an unsupervised optimization objective that updates the Surgery module by minimizing the distance between the merged model's representation and the individual model's representation. Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes.
Submitted 28 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning
Authors:
Yaning Zhang,
Zitong Yu,
Xiaobin Huang,
Linlin Shen,
Jianfeng Ren
Abstract:
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable. Thus, benchmarking and advancing techniques for detecting digital manipulation have become an urgent issue. Although there are a number of publicly available face forgery datasets, the forged faces are mostly generated using GAN-based synthesis technology, which does not involve the most recent technologies like diffusion. The diversity and quality of images generated by diffusion models have been significantly improved, and thus a much more challenging face forgery dataset should be used to evaluate SOTA forgery detection methods. In this paper, we propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection. It contains a large number of forged faces generated by advanced generators such as diffusion-based models, along with more detailed labels about the manipulation approaches and adopted generators. In addition to evaluating SOTA approaches on our benchmark, we design an innovative cross appearance-edge learning (CAEL) detector to capture multi-grained appearance and edge global representations, and detect discriminative and general forgery traces. Moreover, we devise an appearance-edge cross-attention (AECA) module to explore the various integrations across two domains. Extensive experimental results and visualizations show that our detection model outperforms state-of-the-art methods in different settings such as cross-generator, cross-forgery, and cross-dataset evaluations. Code and datasets will be available at https://github.com/Jenine-321/GenFace
Submitted 2 February, 2024;
originally announced February 2024.
-
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
Authors:
Anke Tang,
Li Shen,
Yong Luo,
Nan Yin,
Lefei Zhang,
Dacheng Tao
Abstract:
Merging various task-specific Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. Existing methods have primarily focused on seeking a static optimal solution within the original model parameter space. A notable challenge is mitigating the interference between parameters of different models, which can substantially deteriorate performance. In this paper, we propose to merge most of the parameters while upscaling the MLP of the Transformer layers to a weight-ensembling mixture of experts (MoE) module, which can dynamically integrate shared and task-specific knowledge based on the input, thereby providing a more flexible solution that can adapt to the specific needs of each instance. Our key insight is that by identifying and separating shared knowledge and task-specific knowledge, and then dynamically integrating them, we can mitigate the parameter interference problem to a great extent. We conduct the conventional multi-task model merging experiments and evaluate the generalization and robustness of our method. The results demonstrate the effectiveness of our method and provide a comprehensive understanding of our method. The code is available at https://github.com/tanganke/weight-ensembling_MoE
Submitted 7 June, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT
Authors:
Diego Machado Reyes,
Hanqing Chao,
Juergen Hahn,
Li Shen,
Pingkun Yan
Abstract:
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease; yet its currently available treatments are limited to stopping disease progression. Moreover, the effectiveness of these treatments is not guaranteed due to the heterogeneity of the disease. Therefore, it is essential to be able to identify the disease subtypes at a very early stage. Current data-driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage. Moreover, most existing models either lack explainability behind the classification or only use a single modality for the assessment, limiting the scope of their analysis. Thus, we propose a multimodal framework that uses early-stage indicators such as imaging, genetics and clinical assessments to classify AD patients into subtypes at early stages. Similarly, we build prompts and use large language models, such as ChatGPT, to interpret the findings of our model. In our framework, we propose a tri-modal co-attention mechanism (Tri-COAT) to explicitly learn the cross-modal feature associations. Our proposed model outperforms baseline models and provides insight into key cross-modal feature associations supported by known biological mechanisms.
Submitted 31 January, 2024;
originally announced February 2024.
-
AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation
Authors:
Lulan Shen,
Ali Edalati,
Brett Meyer,
Warren Gross,
James J. Clark
Abstract:
This paper describes a simple yet effective technique for refining a pretrained classifier network. The proposed AdCorDA method is based on modification of the training set and making use of the duality between network weights and layer inputs. We call this input space training. The method consists of two stages - adversarial correction followed by domain adaptation. Adversarial correction uses adversarial attacks to correct incorrect training-set classifications. The incorrectly classified samples of the training set are removed and replaced with the adversarially corrected samples to form a new training set, and then, in the second stage, domain adaptation is performed back to the original training set. Extensive experimental validations show significant accuracy boosts of over 5% on the CIFAR-100 dataset. The technique can be straightforwardly applied to refinement of weight-quantized neural networks, where experiments show substantial enhancement in performance over the baseline. The adversarial correction technique also results in enhanced robustness to adversarial attacks.
Submitted 23 January, 2024;
originally announced January 2024.
-
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
Authors:
Lingfeng Shen,
Weiting Tan,
Sihao Chen,
Yunmo Chen,
Jingyu Zhang,
Haoran Xu,
Boyuan Zheng,
Philipp Koehn,
Daniel Khashabi
Abstract:
As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research. This paper examines the variations in safety challenges faced by LLMs across different languages and discusses approaches to alleviating such concerns. By comparing how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages, we observe that (1) LLMs tend to generate unsafe responses much more often when a malicious prompt is written in a lower-resource language, and (2) LLMs tend to generate more irrelevant responses to malicious prompts in lower-resource languages. To understand where the discrepancy can be attributed, we study the effect of instruction tuning with reinforcement learning from human feedback (RLHF) or supervised finetuning (SFT) on the HH-RLHF dataset. Surprisingly, while training with high-resource languages improves model alignment, training in lower-resource languages yields minimal improvement. This suggests that the bottleneck of cross-lingual alignment is rooted in the pretraining stage. Our findings highlight the challenges in cross-lingual LLM safety, and we hope they inform future research in this direction.
Submitted 23 January, 2024;
originally announced January 2024.
-
Characterizing the Average Interstellar Medium Conditions of Galaxies at $z\sim$ 5.6-9 with UV and Optical Nebular Lines
Authors:
Weida Hu,
Casey Papovich,
Mark Dickinson,
Robert Kennicutt,
Lu Shen,
Ricardo O. Amorín,
Pablo Arrabal Haro,
Micaela B. Bagley,
Rachana Bhatawdekar,
Nikko J. Cleri,
Justin W. Cole,
Avishai Dekel,
Alexander de la Vega,
Steven L. Finkelstein,
Norman A. Grogin,
Nimish P. Hathi,
Michaela Hirschmann,
Benne W. Holwerda,
Taylor A. Hutchison,
Intae Jung,
Anton M. Koekemoer,
Jeyhan S. Kartaltepe,
Ray A. Lucas,
Mario Llerena,
S. Mascia
, et al. (8 additional authors not shown)
Abstract:
Ultraviolet (UV; rest-frame $\sim1200-2000$ Å) spectra provide a wealth of diagnostics to characterize fundamental galaxy properties, such as their chemical enrichment, the nature of their stellar populations, and their amount of Lyman-continuum (LyC) radiation. In this work, we leverage publicly released JWST data to construct the rest-frame UV-to-optical composite spectrum of a sample of 63 galaxies at $5.6<z<9$, spanning the wavelength range from 1500 to 5200 Å. Based on the composite spectrum, we derive an average dust attenuation $E(B-V)_\mathrm{gas}=0.16^{+0.10}_{-0.11}$ from H$\beta$/H$\gamma$, electron density $n_e = 570^{+510}_{-290}$ cm$^{-3}$ from the [O II] doublet ratio, electron temperature $T_e = 17000^{+1500}_{-1500}$ K from the [O III] $\lambda4363$/[O III] $\lambda5007$ ratio, and an ionization parameter $\log(U)=-2.18^{+0.03}_{-0.03}$ from the [O III]/[O II] ratio. Using a direct $T_e$ method, we calculate an oxygen abundance $12+\log\mathrm{(O/H)}=7.67\pm0.08$ and the carbon-to-oxygen (C/O) abundance ratio $\log\mathrm{(C/O)}=-0.87^{+0.13}_{-0.10}$. This C/O ratio is lower than that of $z=0$ and $z=2$-4 star-forming galaxies, albeit with moderate significance. This indicates that reionization-era galaxies might be undergoing a rapid build-up of stellar mass with high specific star-formation rates. A UV diagnostic based on the ratios of C III] $\lambda\lambda1907,1909$/He II $\lambda1640$ versus O III] $\lambda1666$/He II $\lambda1640$ suggests that star formation is the dominant source of ionization, similar to the local extreme dwarf galaxies and $z\sim2$-4 He II-detected galaxies. The [O III]/[O II] and C IV/C III] ratios of the composite spectrum are marginally larger than the criteria used to select galaxies as LyC leakers, suggesting that some of the galaxies in our sample are strong contributors to the reionizing radiation.
Submitted 22 January, 2024;
originally announced January 2024.
-
Robustness to distribution shifts of compressed networks for edge devices
Authors:
Lulan Shen,
Ali Edalati,
Brett Meyer,
Warren Gross,
James J. Clark
Abstract:
It is necessary to develop efficient DNNs deployed on edge devices with limited computation resources. However, the compressed networks often execute new tasks in the target domain, which is different from the source domain where the original network is trained. It is important to investigate the robustness of compressed networks in two types of data distribution shifts: domain shifts and adversarial perturbations. In this study, we discover that compressed models are less robust to distribution shifts than their original networks. Interestingly, larger networks are more vulnerable to losing robustness than smaller ones, even when they are compressed to a similar size as the smaller networks. Furthermore, compact networks obtained by knowledge distillation are much more robust to distribution shifts than pruned networks. Finally, post-training quantization is a reliable method for achieving significant robustness to distribution shifts, and it outperforms both pruned and distilled models in terms of robustness.
Submitted 22 January, 2024;
originally announced January 2024.
-
Planar Schrödinger-Poisson system with steep potential well: supercritical exponential case
Authors:
Liejun Shen,
Marco Squassina
Abstract:
We study a class of planar Schrödinger-Poisson systems
$$
-\Delta u+\lambda V(x)u+\phi u=f(u), \quad x\in\mathbb{R}^2, \qquad
\Delta\phi=u^2, \quad x\in\mathbb{R}^2,
$$
where $\lambda>0$ is a parameter, $V\in C(\mathbb{R}^2,\mathbb{R}^+)$ has a potential well $\Omega\triangleq\text{int}\, V^{-1}(0)$ and the nonlinearity $f$ has supercritical exponential growth at infinity in the Trudinger-Moser sense. By exploiting the mountain-pass theorem and elliptic regularity theory, we establish the existence and concentrating behavior of ground state solutions for sufficiently large $\lambda$.
Submitted 19 January, 2024;
originally announced January 2024.
-
Interference Cancellation for UWA Random Access Data Packet Transmission
Authors:
Yuriy Zakharov,
Lu Shen,
Benjamin Henson,
Nils Morozs,
Paul D. Mitchell
Abstract:
In underwater acoustic (UWA) random access communication networks with multiple users and data packet transmissions, the packet collisions are the main cause of the network performance degradation. The aim of this paper is to investigate interference cancellation (IC) techniques capable of resolving such collisions in a low-complexity modem with single-carrier modulation and single transducer. More specifically, in this modem, the IC is used at multiple stages of the receiver. Firstly, the IC is performed for cancelling the multipath interference to improve the equalization performance in comparison with the linear equalization and Rake combining. Secondly, the IC removes the interference from collided data packets within extracted signal segments after identifying the collisions. Finally, the IC is applied to the received baseband signal to improve the data packet detection. The modem performance is investigated in a lake experiment with intensive multipath channels. The experimental results demonstrate high detection performance of the proposed modem design and show that the proposed IC techniques can significantly improve the throughput of random access UWA networks.
Submitted 18 January, 2024;
originally announced January 2024.
-
Performance Evaluation of a Full-Duplex UWA System in Lake Experiments
Authors:
Lu Shen,
Benjamin Henson,
Long Shi,
Yuriy Zakharov
Abstract:
In this work we present a full-duplex (FD) underwater acoustic (UWA) communication system simultaneously transmitting and receiving acoustic signals in the same frequency bandwidth. To simplify the FD hardware, the system exploits a recently designed transducer capable of simultaneously transmitting and receiving signals. The key challenge of implementing an FD system is to cancel at the near-end receiver the strong self-interference (SI) from the near-end transmitter. By using advanced adaptive filtering algorithms providing high accuracy channel estimates, a high level of SI cancellation can be achieved when the far-end signal is absent. However, the SI channel estimation performance is limited in FD scenarios since the far-end signal acts as an interference when estimating the near-end SI channel. In this paper, we propose an FD UWA communication system which alternates between the SI cancellation and far-end data demodulation. An adaptive Rake combiner with multipath interference cancellation is implemented to improve the demodulation performance in time-varying multipath channels. The performance of the FD UWA system is evaluated in lake experiments. It is shown that the proposed adaptive Rake combiner with multipath interference cancellation significantly outperforms the conventional Rake combiner in all the experiments. The experimental results demonstrate that, with the new Rake combiner, the detection performance of the proposed FD UWA system is comparable with that of the half-duplex system.
Submitted 18 January, 2024;
originally announced January 2024.
-
Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Authors:
Songhe Deng,
Wei Zhuo,
Jinheng Xie,
Linlin Shen
Abstract:
Class Activation Map (CAM) has emerged as a popular tool for weakly supervised semantic segmentation (WSSS), allowing the localization of object regions in an image using only image-level labels. However, existing CAM methods suffer from under-activation of target object regions and false activation of background regions, because a lack of detailed supervision can hinder the model's ability to understand the image as a whole. In this paper, we propose a novel Question-Answer Cross-Language-Image Matching framework for WSSS (QA-CLIMS), leveraging the vision-language foundation model to maximize the text-based understanding of images and guide the generation of activation maps. First, a series of carefully designed questions are posed to the VQA (Visual Question Answering) model with Question-Answer Prompt Engineering (QAPE) to generate a corpus of both foreground target objects and backgrounds that are adaptive to query images. We then employ contrastive learning in a Region Image Text Contrastive (RITC) network to compare the obtained foreground and background regions with the generated corpus. Our approach exploits the rich textual information from the open vocabulary as additional supervision, enabling the model to generate high-quality CAMs with a more complete object region and reduced false activation of background regions. We conduct extensive analysis to validate the proposed method and show that our approach achieves state-of-the-art performance on both the PASCAL VOC 2012 and MS COCO datasets. Code is available at: https://github.com/CVI-SZU/QA-CLIMS
Submitted 18 January, 2024;
originally announced January 2024.
-
Solving Continual Offline Reinforcement Learning with Decision Transformer
Authors:
Kaixin Huang,
Li Shen,
Chen Zhao,
Chun Yuan,
Dacheng Tao
Abstract:
Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning, enabling agents to learn multiple tasks from static datasets without forgetting prior tasks. However, CORL faces challenges in balancing stability and plasticity. Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing. We aim to investigate whether Decision Transformer (DT), another offline RL paradigm, can serve as a more suitable offline continuous learner to address these issues. We first compare AC-based offline algorithms with DT in the CORL framework. DT offers advantages in learning efficiency, distribution shift mitigation, and zero-shot generalization but exacerbates the forgetting problem during supervised parameter updates. We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem. MH-DT stores task-specific knowledge using multiple heads, facilitating knowledge sharing with common components. It employs distillation and selective rehearsal to enhance current task learning when a replay buffer is available. In buffer-unavailable scenarios, LoRA-DT merges less influential weights and fine-tunes DT's decisive MLP layer to adapt to the current task. Extensive experiments on MuJoCo and Meta-World benchmarks demonstrate that our methods outperform SOTA CORL baselines and showcase enhanced learning capabilities and superior memory efficiency.
Submitted 7 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Authors:
Haoran Xu,
Amr Sharaf,
Yunmo Chen,
Weiting Tan,
Lingfeng Shen,
Benjamin Van Durme,
Kenton Murray,
Young Jin Kim
Abstract:
Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT'21, WMT'22 and WMT'23 test datasets.
Submitted 2 June, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge
Authors:
Wenbin Wang,
Liang Ding,
Li Shen,
Yong Luo,
Han Hu,
Dacheng Tao
Abstract:
Sentiment analysis is rapidly advancing by utilizing various data modalities (e.g., text, image). However, most previous works rely on superficial information, neglecting the incorporation of contextual world knowledge (e.g., background information derived from but beyond the given image and text pairs) and thereby restricting their ability to achieve better multimodal sentiment analysis (MSA). In this paper, we propose a plug-in framework named WisdoM, which leverages the contextual world knowledge induced from large vision-language models (LVLMs) for enhanced MSA. WisdoM utilizes LVLMs to comprehensively analyze both images and corresponding texts, simultaneously generating pertinent context. To reduce the noise in the context, we also introduce a training-free contextual fusion mechanism. Experiments across diverse granularities of MSA tasks consistently demonstrate that our approach yields substantial improvements (an average gain of +1.96% F1 score across five advanced methods) over several state-of-the-art methods.
Submitted 20 February, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.