Search | arXiv e-print repository

Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models

Authors: Yanlin Wang, Tianyue Jiang, Mingwei Liu, Jiachi Chen, Zibin Zheng

Abstract: Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style betw… ▽ More Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between the code generated by mainstream Code LLMs and the code written by human developers, and summarize coding style inconsistency taxonomy. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 13pages, 14 figures

arXiv:2406.17286 [pdf]

Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi… ▽ More Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it with the prioritized experience replay method to solve the problem that traditional path planning algorithms often fall into dead zones. A series of simulation experiment results prove that the path planning algorithm based on DDQN is significantly better than other methods in terms of speed and accuracy, especially the ability to break through dead zones in extreme environments. Research shows that the path planning algorithm based on DDQN performs well in terms of path quality and safety. These research results provide an important reference for the research on automatic navigation of autonomous vehicles. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

arXiv:2406.03695 [pdf, other]

FACOS: Enabling Privacy Protection Through Fine-Grained Access Control with On-chain and Off-chain System

Authors: Chao Liu, Cankun Hou, Tianyu Jiang, Jianting Ning, Hui Qiao, Yusen Wu

Abstract: Data-driven landscape across finance, government, and healthcare, the continuous generation of information demands robust solutions for secure storage, efficient dissemination, and fine-grained access control. Blockchain technology emerges as a significant tool, offering decentralized storage while upholding the tenets of data security and accessibility. However, on-chain and off-chain strategies… ▽ More Data-driven landscape across finance, government, and healthcare, the continuous generation of information demands robust solutions for secure storage, efficient dissemination, and fine-grained access control. Blockchain technology emerges as a significant tool, offering decentralized storage while upholding the tenets of data security and accessibility. However, on-chain and off-chain strategies are still confronted with issues such as untrusted off-chain data storage, absence of data ownership, limited access control policy for clients, and a deficiency in data privacy and auditability. To solve these challenges, we propose a permissioned blockchain-based privacy-preserving fine-grained access control on-chain and off-chain system, namely FACOS. We applied three fine-grained access control solutions and comprehensively analyzed them in different aspects, which provides an intuitive perspective for system designers and clients to choose the appropriate access control method for their systems. Compared to similar work that only stores encrypted data in centralized or non-fault-tolerant IPFS systems, we enhanced off-chain data storage security and robustness by utilizing a highly efficient and secure asynchronous Byzantine fault tolerance (BFT) protocol in the off-chain environment. As each of the clients needs to be verified and authorized before accessing the data, we involved the Trusted Execution Environment (TEE)-based solution to verify the credentials of clients. Additionally, our evaluation results demonstrated that our system offers better scalability and practicality than other state-of-the-art designs. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01595 [pdf, other]

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Authors: Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

Abstract: We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering i… ▽ More We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering intricate and complete 3D human shapes from short video sequences, intensifying the level of difficulty. To tackle these challenges, we first define a layered neural representation for the entire scene, composited by individual human and background models. We learn the layered neural representation from videos via our layer-wise differentiable volume rendering. This learning process is further enhanced by our hybrid instance segmentation approach which combines the self-supervised 3D segmentation and the promptable 2D segmentation module, yielding reliable instance segmentation supervision even under close human interaction. A confidence-guided optimization formulation is introduced to optimize the human poses and shape/appearance alternately. We incorporate effective objectives to refine human poses via photometric information and impose physically plausible constraints on human dynamics, leading to temporally consistent 3D reconstructions with high fidelity. The evaluation of our method shows the superiority over prior art on publicly available datasets and in-the-wild videos. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: Project page: https://eth-ait.github.io/MultiPly/

arXiv:2405.16094 [pdf, other]

PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus

Authors: Zhaochen Liu, Limeng Qiao, Xiangxiang Chu, Tingting Jiang

Abstract: Aiming to predict the complete shapes of partially occluded objects, amodal segmentation is an important step towards visual intelligence. With crucial significance, practical prior knowledge derives from sufficient training, while limited amodal annotations pose challenges to achieve better performance. To tackle this problem, utilizing the mighty priors accumulated in the foundation model, we pr… ▽ More Aiming to predict the complete shapes of partially occluded objects, amodal segmentation is an important step towards visual intelligence. With crucial significance, practical prior knowledge derives from sufficient training, while limited amodal annotations pose challenges to achieve better performance. To tackle this problem, utilizing the mighty priors accumulated in the foundation model, we propose the first SAM-based amodal segmentation approach, PLUG. Methodologically, a novel framework with hierarchical focus is presented to better adapt the task characteristics and unleash the potential capabilities of SAM. In the region level, due to the association and division in visible and occluded areas, inmodal and amodal regions are assigned as the focuses of distinct branches to avoid mutual disturbance. In the point level, we introduce the concept of uncertainty to explicitly assist the model in identifying and focusing on ambiguous points. Guided by the uncertainty map, a computation-economic point loss is applied to improve the accuracy of predicted boundaries. Experiments are conducted on several prominent datasets, and the results show that our proposed method outperforms existing methods with large margins. Even with fewer total parameters, our method still exhibits remarkable advantages. △ Less

Submitted 3 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15544 [pdf, other]

Knowledge-enhanced Relation Graph and Task Sampling for Few-shot Molecular Property Prediction

Authors: Zeyu Wang, Tianyi Jiang, Yao Lu, Xiaoze Bao, Shanqing Yu, Bin Wei, Qi Xuan

Abstract: Recently, few-shot molecular property prediction (FSMPP) has garnered increasing attention. Despite impressive breakthroughs achieved by existing methods, they often overlook the inherent many-to-many relationships between molecules and properties, which limits their performance. For instance, similar substructures of molecules can inspire the exploration of new compounds. Additionally, the relati… ▽ More Recently, few-shot molecular property prediction (FSMPP) has garnered increasing attention. Despite impressive breakthroughs achieved by existing methods, they often overlook the inherent many-to-many relationships between molecules and properties, which limits their performance. For instance, similar substructures of molecules can inspire the exploration of new compounds. Additionally, the relationships between properties can be quantified, with high-related properties providing more information in exploring the target property than those low-related. To this end, this paper proposes a novel meta-learning FSMPP framework (KRGTS), which comprises the Knowledge-enhanced Relation Graph module and the Task Sampling module. The knowledge-enhanced relation graph module constructs the molecule-property multi-relation graph (MPMRG) to capture the many-to-many relationships between molecules and properties. The task sampling module includes a meta-training task sampler and an auxiliary task sampler, responsible for scheduling the meta-training process and sampling high-related auxiliary tasks, respectively, thereby achieving efficient meta-knowledge learning and reducing noise introduction. Empirically, extensive experiments on five datasets demonstrate the superiority of KRGTS over a variety of state-of-the-art methods. The code is available in https://github.com/Vencent-Won/KRGTS-public. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.12130 [pdf, other]

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

Abstract: Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs… ▽ More Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve it, we introduce the corresponding non-parameter operators to reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into LLMs, which makes our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: Work in Progress

arXiv:2405.11344 [pdf]

Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

Authors: Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, Peng Yan

Abstract: In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling… ▽ More In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks when compared to training independently on each. Our model outperforms the baseline on zero shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform generalized embeddings offered by OpenAI on Linkedin dataset and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications. Our work offers insights and best practices for the field to build on. △ Less

Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.08039 [pdf]

doi 10.1109/TIV.2024.3412130

Vehicles Swarm Intelligence: Cooperation in both Longitudinal and Lateral Dimensions

Authors: Jia Hu, Nuoheng Zhang, Haoran Wang, Tenglong Jiang, Junnian Zheng, Feilong Liu

Abstract: Longitudinal-only platooning methods are facing great challenges on running mobility, since they may be impeded by slow-moving vehicles from time to time. To address this issue, this paper proposes a vehicles swarming method coupled both longitudinal and lateral cooperation. The proposed method bears the following contributions: i) enhancing driving mobility by swarming like a bee colony; ii) ensu… ▽ More Longitudinal-only platooning methods are facing great challenges on running mobility, since they may be impeded by slow-moving vehicles from time to time. To address this issue, this paper proposes a vehicles swarming method coupled both longitudinal and lateral cooperation. The proposed method bears the following contributions: i) enhancing driving mobility by swarming like a bee colony; ii) ensuring the success rate of overtaking; iii) cruising as a string of platoon to preserve sustainability. Evaluations indicate that the proposed method is capable of maneuvering a vehicle swarm to overtake slow-moving vehicles safely and successfully. The proposed method is confirmed to improve running mobility by 12.04%. Swarming safety is ensured by a safe following distance. The proposed method's influence on traffic is limited within five upstream vehicles. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.03129 [pdf, other]

Active Sensing for Multiuser Beam Tracking with Reconfigurable Intelligent Surface

Authors: Han Han, Tao Jiang, Wei Yu

Abstract: This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs send uplink pilots to the AP periodically during the channe… ▽ More This paper studies a beam tracking problem in which an access point (AP), in collaboration with a reconfigurable intelligent surface (RIS), dynamically adjusts its downlink beamformers and the reflection pattern at the RIS in order to maintain reliable communications with multiple mobile user equipments (UEs). Specifically, the mobile UEs send uplink pilots to the AP periodically during the channel sensing intervals, the AP then adaptively configures the beamformers and the RIS reflection coefficients for subsequent data transmission based on the received pilots. This is an active sensing problem, because channel sensing involves configuring the RIS coefficients during the pilot stage and the optimal sensing strategy should exploit the trajectory of channel state information (CSI) from previously received pilots. Analytical solution to such an active sensing problem is very challenging. In this paper, we propose a deep learning framework utilizing a recurrent neural network (RNN) to automatically summarize the time-varying CSI obtained from the periodically received pilots into state vectors. These state vectors are then mapped to the AP beamformers and RIS reflection coefficients for subsequent downlink data transmissions, as well as the RIS reflection coefficients for the next round of uplink channel sensing. The map**s from the state vectors to the downlink beamformers and the RIS reflection coefficients for both channel sensing and downlink data transmission are performed using graph neural networks (GNNs) to account for the interference among the UEs. Simulations demonstrate significant and interpretable performance improvement of the proposed approach over the existing data-driven methods with nonadaptive channel sensing schemes. △ Less

Submitted 31 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.15454 [pdf, ps, other]

Prediction from compression for models with infinite memory, with applications to hidden Markov and renewal processes

Authors: Yanjun Han, Tianze Jiang, Yihong Wu

Abstract: Consider the problem of predicting the next symbol given a sample path of length n, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback- Leibler divergence up to univers… ▽ More Consider the problem of predicting the next symbol given a sample path of length n, whose joint distribution belongs to a distribution class that may have long-term memory. The goal is to compete with the conditional predictor that knows the true model. For both hidden Markov models (HMMs) and renewal processes, we determine the optimal prediction risk in Kullback- Leibler divergence up to universal constant factors. Extending existing results in finite-order Markov models [HJW23] and drawing ideas from universal compression, the proposed estimator has a prediction risk bounded by redundancy of the distribution class and a memory term that accounts for the long-range dependency of the model. Notably, for HMMs with bounded state and observation spaces, a polynomial-time estimator based on dynamic programming is shown to achieve the optimal prediction risk Θ(log n/n); prior to this work, the only known result of this type is O(1/log n) obtained using Markov approximation [Sha+18]. Matching minimax lower bounds are obtained by making connections to redundancy and mutual information via a reduction argument. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 37 Pages

arXiv:2404.14162 [pdf, other]

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventiona… ▽ More Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details. △ Less

Submitted 19 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.13277 [pdf, other]

Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two Perspectives

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Yan Zhong, Tingting Jiang

Abstract: Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecti… ▽ More Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models. △ Less

Submitted 24 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: Submitted to a conference

arXiv:2404.11996 [pdf, other]

DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

Authors: Songtao Huang, Hong** Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address this challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel… ▽ More Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address this challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel in-depth feature representation, called Dynamic Spatio-Temporal (Dyn-ST) features, is introduced, which encapsulates spatial characteristics across varying times. Moreover, a Dynamic Spatio-Temporal Graph Transformer Network (DST-GTN) is proposed by capturing Dyn-ST features and other dynamic adjacency relations between intersections. The DST-GTN can model dynamic ST relationships between nodes accurately and refine the representation of global and local ST characteristics by adopting adaptive weights in low-pass and all-pass filters, enabling the extraction of Dyn-ST features from traffic time-series data. Through numerical experiments on public datasets, the DST-GTN achieves state-of-the-art performance for a range of traffic forecasting tasks and demonstrates enhanced stability. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10358 [pdf, other]

Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation

Authors: Wenjie Lin, Zhen Liu, Chengzhi Jiang, Mingyan Han, Ting Jiang, Shuaicheng Liu

Abstract: In this paper, we address the Bracket Image Restoration and Enhancement (BracketIRE) task using a novel framework, which requires restoring a high-quality high dynamic range (HDR) image from a sequence of noisy, blurred, and low dynamic range (LDR) multi-exposure RAW inputs. To overcome this challenge, we present the IREANet, which improves the multiple exposure alignment and aggregation with a Fl… ▽ More In this paper, we address the Bracket Image Restoration and Enhancement (BracketIRE) task using a novel framework, which requires restoring a high-quality high dynamic range (HDR) image from a sequence of noisy, blurred, and low dynamic range (LDR) multi-exposure RAW inputs. To overcome this challenge, we present the IREANet, which improves the multiple exposure alignment and aggregation with a Flow-guide Feature Alignment Module (FFAM) and an Enhanced Feature Aggregation Module (EFAM). Specifically, the proposed FFAM incorporates the inter-frame optical flow as guidance to facilitate the deformable alignment and spatial attention modules for better feature alignment. The EFAM further employs the proposed Enhanced Residual Block (ERB) as a foundational component, wherein a unidirectional recurrent network aggregates the aligned temporal features to better reconstruct the results. To improve model generalization and performance, we additionally employ the Bayer preserving augmentation (BayerAug) strategy to augment the multi-exposure RAW inputs. Our experimental evaluations demonstrate that the proposed IREANet shows state-of-the-art performance compared with previous methods. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.06365 [pdf, other]

Dynamic Resolution Guidance for Facial Expression Recognition

Authors: Jie Ou, Xu Li, Tianxiang Jiang, Yuanlun Xie

Abstract: Facial expression recognition (FER) is vital for human-computer interaction and emotion analysis, yet recognizing expressions in low-resolution images remains challenging. This paper introduces a practical method called Dynamic Resolution Guidance for Facial Expression Recognition (DRGFER) to effectively recognize facial expressions in images with varying resolutions without compromising FER model… ▽ More Facial expression recognition (FER) is vital for human-computer interaction and emotion analysis, yet recognizing expressions in low-resolution images remains challenging. This paper introduces a practical method called Dynamic Resolution Guidance for Facial Expression Recognition (DRGFER) to effectively recognize facial expressions in images with varying resolutions without compromising FER model accuracy. Our framework comprises two main components: the Resolution Recognition Network (RRN) and the Multi-Resolution Adaptation Facial Expression Recognition Network (MRAFER). The RRN determines image resolution, outputs a binary vector, and the MRAFER assigns images to suitable facial expression recognition networks based on resolution. We evaluated DRGFER on widely-used datasets RAFDB and FERPlus, demonstrating that our method retains optimal model performance at each resolution and outperforms alternative resolution approaches. The proposed framework exhibits robustness against resolution variations and facial expressions, offering a promising solution for real-world applications. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.04895 [pdf, other]

Tensorized Ant Colony Optimization for GPU Acceleration

Authors: Luming Yang, Tao Jiang, Ran Cheng

Abstract: Ant Colony Optimization (ACO) is renowned for its effectiveness in solving Traveling Salesman Problems, yet it faces computational challenges in CPU-based environments, particularly with large-scale instances. In response, we introduce a Tensorized Ant Colony Optimization (TensorACO) to utilize the advancements of GPU acceleration. As the core, TensorACO fully transforms ant system and ant path in… ▽ More Ant Colony Optimization (ACO) is renowned for its effectiveness in solving Traveling Salesman Problems, yet it faces computational challenges in CPU-based environments, particularly with large-scale instances. In response, we introduce a Tensorized Ant Colony Optimization (TensorACO) to utilize the advancements of GPU acceleration. As the core, TensorACO fully transforms ant system and ant path into tensor forms, a process we refer to as tensorization. For the tensorization of ant system, we propose a preprocessing method to reduce the computational overhead by calculating the probability transition matrix. In the tensorization of ant path, we propose an index map** method to accelerate the update of pheromone matrix by replacing the mechanism of sequential path update with parallel matrix operations. Additionally, we introduce an Adaptive Independent Roulette (AdaIR) method to overcome the challenges of parallelizing ACO's selection mechanism on GPUs. Comprehensive experiments demonstrate the superior performance of TensorACO achieving up to 1921$\times$ speedup over standard ACO. Moreover, the AdaIR method further improves TensorACO's convergence speed by 80% and solution quality by 2%. Source codes are available at https://github.com/EMI-Group/tensoraco. △ Less

Submitted 12 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2404.01159 [pdf, other]

doi 10.1145/3638529.3654223

GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA

Authors: Zhenyu Liang, Tao Jiang, Kebin Sun, Ran Cheng

Abstract: Evolutionary multiobjective optimization has witnessed remarkable progress during the past decades. However, existing algorithms often encounter computational challenges in large-scale scenarios, primarily attributed to the absence of hardware acceleration. In response, we introduce a Tensorized Reference Vector Guided Evolutionary Algorithm (TensorRVEA) for harnessing the advancements of GPU acce… ▽ More Evolutionary multiobjective optimization has witnessed remarkable progress during the past decades. However, existing algorithms often encounter computational challenges in large-scale scenarios, primarily attributed to the absence of hardware acceleration. In response, we introduce a Tensorized Reference Vector Guided Evolutionary Algorithm (TensorRVEA) for harnessing the advancements of GPU acceleration. In TensorRVEA, the key data structures and operators are fully transformed into tensor forms for leveraging GPU-based parallel computing. In numerical benchmark tests involving large-scale populations and problem dimensions, TensorRVEA consistently demonstrates high computational performance, achieving up to over 1000$\times$ speedups. Then, we applied TensorRVEA to the domain of multiobjective neuroevolution for addressing complex challenges in robotic control tasks. Furthermore, we assessed TensorRVEA's extensibility by altering several tensorized reproduction operators. Experimental results demonstrate promising scalability and robustness of TensorRVEA. Source codes are available at \url{https://github.com/EMI-Group/tensorrvea}. △ Less

Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2403.20045 [pdf]

Blockchain for Energy Market: A Comprehensive Survey

Authors: Tianqi Jiang, Haoxiang Luo, Kun Yang, Gang Sun, Hongfang Yu, Qi Huang, Athanasios V. Vasilakos

Abstract: The energy market encompasses the behavior of energy supply and trading within a platform system. By utilizing centralized or distributed trading, energy can be effectively managed and distributed across different regions, thereby achieving market equilibrium and satisfying both producers and consumers. However, recent years have presented unprecedented challenges and difficulties for the developm… ▽ More The energy market encompasses the behavior of energy supply and trading within a platform system. By utilizing centralized or distributed trading, energy can be effectively managed and distributed across different regions, thereby achieving market equilibrium and satisfying both producers and consumers. However, recent years have presented unprecedented challenges and difficulties for the development of the energy market. These challenges include regional energy imbalances, volatile energy pricing, high computing costs, and issues related to transaction information disclosure. Researchers widely acknowledge that the security features of blockchain technology can enhance the efficiency of energy transactions and establish the fundamental stability and robustness of the energy market. This type of blockchain-enabled energy market is commonly referred to as an energy blockchain. Currently, there is a burgeoning amount of research in this field, encompassing algorithm design, framework construction, and practical application. It is crucial to organize and compare these research efforts to facilitate the further advancement of energy blockchain. This survey aims to comprehensively review the fundamental characteristics of blockchain and energy markets, highlighting the significant advantages of combining the two. Moreover, based on existing research outcomes, we will categorize and compare the current energy market research supported by blockchain in terms of algorithm design, market framework construction, and the policies and practical applications adopted by different countries. Finally, we will address current issues and propose potential future directions for improvement, to provide guidance for the practical implementation of blockchain in the energy market. △ Less

Submitted 5 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15377 [pdf, other]

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Authors: Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang

Abstract: We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue. Our approach employs a progressive training paradigm that unifies the different self- or weakly-supervised learning frameworks of masked video token reconstruction, cross-modal contrastive learning, and next token predict… ▽ More We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue. Our approach employs a progressive training paradigm that unifies the different self- or weakly-supervised learning frameworks of masked video token reconstruction, cross-modal contrastive learning, and next token prediction. Different training stages would guide our model to capture different levels of structure and semantic information through different pretext tasks. At the data level, we prioritize the spatiotemporal consistency by semantically segmenting videos and generating video-audio-speech captions. This improves the alignment between video and text. We scale both data and model size for our InternVideo2. Through extensive experiments, we validate our designs and demonstrate the state-of-the-art performance on over 60 video and audio tasks. Notably, our model outperforms others on various video-related captioning, dialogue, and long video understanding benchmarks, highlighting its ability to reason and comprehend long temporal contexts. Code and models are available at https://github.com/OpenGVLab/InternVideo2/. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: a technical report about video understanding

arXiv:2403.11397 [pdf, other]

Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization

Authors: Yujia Liu, Chenxi Yang, Dingquan Li, Jianhao Ding, Tingting Jiang

Abstract: The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulti… ▽ More The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulting in significant changes in predicted scores. In this paper, we propose a defense method to improve the stability in predicted scores when attacked by small perturbations, thus enhancing the adversarial robustness of NR-IQA models. To be specific, we present theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image. Building upon this theoretical foundation, we propose a norm regularization training strategy aimed at reducing the $\ell_1$ norm of the gradient, thereby boosting the robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: accepted by CVPR 2024

arXiv:2403.09004 [pdf, ps, other]

Meta-Learning-Based Fronthaul Compression for Cloud Radio Access Networks

Authors: Ruihua Qiao, Tao Jiang, Wei Yu

Abstract: This paper investigates the fronthaul compression problem in a user-centric cloud radio access network, in which single-antenna users are served by a central processor (CP) cooperatively via a cluster of remote radio heads (RRHs). To satisfy the fronthaul capacity constraint, this paper proposes a transform-compress-forward scheme, which consists of well-designed transformation matrices and unifor… ▽ More This paper investigates the fronthaul compression problem in a user-centric cloud radio access network, in which single-antenna users are served by a central processor (CP) cooperatively via a cluster of remote radio heads (RRHs). To satisfy the fronthaul capacity constraint, this paper proposes a transform-compress-forward scheme, which consists of well-designed transformation matrices and uniform quantizers. The transformation matrices perform dimension reduction in the uplink and dimension expansion in the downlink. To reduce the communication overhead for designing the transformation matrices, this paper further proposes a deep learning framework to first learn a suboptimal transformation matrix at each RRH based on the local channel state information (CSI), and then to refine it iteratively. To facilitate the refinement process, we propose an efficient signaling scheme that only requires the transmission of low-dimensional effective CSI and its gradient between the CP and RRH, and further, a meta-learning based gated recurrent unit network to reduce the number of signaling transmission rounds. For the sum-rate maximization problem, simulation results show that the proposed two-stage neural network can perform close to the fully cooperative global CSI based benchmark with significantly reduced communication overhead for both the uplink and the downlink. Moreover, using the first stage alone can already outperform the existing local CSI based benchmark. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: 15 Pages, 13 Figures; accepted in IEEE Transactions on Wireless Communications

arXiv:2403.01413 [pdf, other]

Exploring the Design of Generative AI in Supporting Music-based Reminiscence for Older Adults

Authors: Yucheng **, Wanling Cai, Li Chen, Yizhe Zhang, Gavin Doherty, Tonglin Jiang

Abstract: Music-based reminiscence has the potential to positively impact the psychological well-being of older adults. However, the aging process and physiological changes, such as memory decline and limited verbal communication, may impede the ability of older adults to recall their memories and life experiences. Given the advanced capabilities of generative artificial intelligence (AI) systems, such as g… ▽ More Music-based reminiscence has the potential to positively impact the psychological well-being of older adults. However, the aging process and physiological changes, such as memory decline and limited verbal communication, may impede the ability of older adults to recall their memories and life experiences. Given the advanced capabilities of generative artificial intelligence (AI) systems, such as generated conversations and images, and their potential to facilitate the reminiscing process, this study aims to explore the design of generative AI to support music-based reminiscence in older adults. This study follows a user-centered design approach incorporating various stages, including detailed interviews with two social workers and two design workshops (involving ten older adults). Our work contributes to an in-depth understanding of older adults' attitudes toward utilizing generative AI for supporting music-based reminiscence and identifies concrete design considerations for the future design of generative AI to enhance the reminiscence experience of older adults. △ Less

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2403.00134 [pdf, other]

Active Sensing for Reciprocal MIMO Channels

Authors: Tao Jiang, Wei Yu

Abstract: This paper addresses the design of transmit precoder and receive combiner matrices to support $N_{\rm s}$ independent data streams over a time-division duplex (TDD) point-to-point massive multiple-input multiple-output (MIMO) channel with either a fully digital or a hybrid structure. The optimal precoder and combiner design amounts to finding the top-$N_{\rm s}$ singular vectors of the channel mat… ▽ More This paper addresses the design of transmit precoder and receive combiner matrices to support $N_{\rm s}$ independent data streams over a time-division duplex (TDD) point-to-point massive multiple-input multiple-output (MIMO) channel with either a fully digital or a hybrid structure. The optimal precoder and combiner design amounts to finding the top-$N_{\rm s}$ singular vectors of the channel matrix, but the explicit estimation of the entire high-dimensional channel would require significant pilot overhead. Alternatively, prior works suggest to find the precoding and combining matrices directly by exploiting channel reciprocity and by using the power iteration method, but its performance degrades in the low SNR regime. To tackle this challenging problem, this paper proposes a learning-based active sensing framework, where the transmitter and the receiver send pilots alternately using sensing beamformers that are actively designed as functions of previously received pilots. This is accomplished by using recurrent neural networks to summarize information from the historical observations into hidden state vectors, then using fully connected neural networks to learn the appropriate sensing beamformers in the next pilot stage and finally the transmit precoding and receive combiner matrices for data communications. Simulations demonstrate that the learning-based method outperforms existing approaches significantly and maintains superior performance even in the low SNR regime for both the fully digital and hybrid MIMO scenarios. △ Less

Submitted 6 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: This paper is accepted in IEEE Transactions on Signal Processing

arXiv:2402.17363 [pdf, other]

CGGM: A conditional graph generation model with adaptive sparsity for node anomaly detection in IoT networks

Authors: Xianshi Su, Munan Li, Tongbang Jiang, Hao Long

Abstract: Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, the constraints it faces include the monotonicity of adjacency relationships, the difficulty in constructing multi-dimensional features for nodes, and the lack of a… ▽ More Dynamic graphs are extensively employed for detecting anomalous behavior in nodes within the Internet of Things (IoT). Generative models are often used to address the issue of imbalanced node categories in dynamic graphs. Nevertheless, the constraints it faces include the monotonicity of adjacency relationships, the difficulty in constructing multi-dimensional features for nodes, and the lack of a method for end-to-end generation of multiple categories of nodes. This paper presents a novel graph generation model, called CGGM, designed specifically to generate a larger number of nodes belonging to the minority class. The mechanism for generating an adjacency matrix, through adaptive sparsity, enhances flexibility in its structure. The feature generation module, called multidimensional features generator (MFG) to generate node features along with topological information. Labels are transformed into embedding vectors, serving as conditional constraints to control the generation of synthetic data across multiple categories. Using a multi-stage loss, the distribution of synthetic data is adjusted to closely resemble that of real data. In extensive experiments, we show that CGGM's synthetic data outperforms state-of-the-art methods across various metrics. Our results demonstrate efficient generation of diverse data categories, robustly enhancing multi-category classification model performance. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 13 pages, 19 figures

arXiv:2402.16153 [pdf, other]

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, **gcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the… ▽ More While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

arXiv:2402.14299 [pdf, other]

We Choose to Go to Space: Agent-driven Human and Multi-Robot Collaboration in Microgravity

Authors: Miao Xin, Zhongrui You, Zihan Zhang, Taoran Jiang, Tingjia Xu, Haotian Liang, Guo**g Ge, Yuchen Ji, Shentong Mo, Jian Cheng

Abstract: We present SpaceAgents-1, a system for learning human and multi-robot collaboration (HMRC) strategies under microgravity conditions. Future space exploration requires humans to work together with robots. However, acquiring proficient robot skills and adept collaboration under microgravity conditions poses significant challenges within ground laboratories. To address this issue, we develop a microg… ▽ More We present SpaceAgents-1, a system for learning human and multi-robot collaboration (HMRC) strategies under microgravity conditions. Future space exploration requires humans to work together with robots. However, acquiring proficient robot skills and adept collaboration under microgravity conditions poses significant challenges within ground laboratories. To address this issue, we develop a microgravity simulation environment and present three typical configurations of intra-cabin robots. We propose a hierarchical heterogeneous multi-agent collaboration architecture: guided by foundation models, a Decision-Making Agent serves as a task planner for human-robot collaboration, while individual Skill-Expert Agents manage the embodied control of robots. This mechanism empowers the SpaceAgents-1 system to execute a range of intricate long-horizon HMRC tasks. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.15321 [pdf]

doi 10.1016/j.apenergy.2024.122736

Localization of Dummy Data Injection Attacks in Power Systems Considering Incomplete Topological Information: A Spatio-Temporal Graph Wavelet Convolutional Neural Network Approach

Authors: Zhaoyang Qu, Yunchang Dong, Yang Li, Siqi Song, Tao Jiang, Min Li, Qiming Wang, Lei Wang, Xiaoyong Bo, Jiye Zang, Qi Xu

Abstract: The emergence of novel the dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing researc… ▽ More The emergence of novel the dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing research predominantly focuses on various machine learning techniques, often analyzing the temporal data sequences post-attack or relying solely on Euclidean spatial characteristics. Unfortunately, this approach tends to overlook the inherent topological correlations within the non-Euclidean spatial attributes of power grid data, consequently leading to diminished accuracy in attack localization. To address this issue, this study takes a comprehensive approach. Initially, it examines the underlying principles of these new DDIAs on power systems. Here, an intricate mathematical model of the DDIA is designed, accounting for incomplete topological knowledge and alternating current (AC) state estimation from an attacker's perspective. Subsequently, by integrating a priori knowledge of grid topology and considering the temporal correlations within measurement data and the topology-dependent attributes of the power grid, this study introduces temporal and spatial attention matrices. These matrices adaptively capture the spatio-temporal correlations within the attacks. Leveraging gated stacked causal convolution and graph wavelet sparse convolution, the study jointly extracts spatio-temporal DDIA features. Finally, the research proposes a DDIA localization method based on spatio-temporal graph neural networks. The accuracy and effectiveness of the DDIA model are rigorously demonstrated through comprehensive analytical cases. △ Less

Submitted 27 January, 2024; originally announced January 2024.

Comments: Accepted by Applied Energy

Journal ref: Applied Energy 360 (2024) 122736

arXiv:2401.12578 [pdf, other]

ToDA: Target-oriented Diffusion Attacker against Recommendation System

Authors: Xiaohao Liu, Zhulin Tao, Ting Jiang, He Chang, Yunshan Ma, Xianglin Huang, Xiang Wang

Abstract: Recommendation systems (RS) have become indispensable tools for web services to address information overload, thus enhancing user experiences and bolstering platforms' revenues. However, with their increasing ubiquity, security concerns have also emerged. As the public accessibility of RS, they are susceptible to specific malicious attacks where adversaries can manipulate user profiles, leading to… ▽ More Recommendation systems (RS) have become indispensable tools for web services to address information overload, thus enhancing user experiences and bolstering platforms' revenues. However, with their increasing ubiquity, security concerns have also emerged. As the public accessibility of RS, they are susceptible to specific malicious attacks where adversaries can manipulate user profiles, leading to biased recommendations. Recent research often integrates additional modules using generative models to craft these deceptive user profiles, ensuring them are imperceptible while causing the intended harm. Albeit their efficacy, these models face challenges of unstable training and the exploration-exploitation dilemma, which can lead to suboptimal results. In this paper, we pioneer to investigate the potential of diffusion models (DMs), for shilling attacks. Specifically, we propose a novel Target-oriented Diffusion Attack model (ToDA). It incorporates a pre-trained autoencoder that transforms user profiles into a high dimensional space, paired with a Latent Diffusion Attacker (LDA)-the core component of ToDA. LDA introduces noise into the profiles within this latent space, adeptly steering the approximation towards targeted items through cross-attention mechanisms. The global horizon, implemented by a bipartite graph, is involved in LDA and derived from the encoded user profile feature. This makes LDA possible to extend the generation outwards the on-processing user feature itself, and bridges the gap between diffused user features and target item features. Extensive experiments compared to several SOTA baselines demonstrate ToDA's effectiveness. Specific studies exploit the elaborative design of ToDA and underscore the potency of advanced generative models in such contexts. △ Less

Submitted 16 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.12480 [pdf, other]

Explore Synergistic Interaction Across Frames for Interactive Video Object Segmentation

Authors: Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao

Abstract: Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction. To improve the user experience, it is important to consider the user's input habits, segmentation quality, running time and memory consumption.However, existing methods compromise user experience with single input mode and slow running speed. Specifically, these methods only allow… ▽ More Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction. To improve the user experience, it is important to consider the user's input habits, segmentation quality, running time and memory consumption.However, existing methods compromise user experience with single input mode and slow running speed. Specifically, these methods only allow the user to interact with one single frame, which limits the expression of the user's intent.To overcome these limitations and better align with people's usage habits, we propose a framework that can accept multiple frames simultaneously and explore synergistic interaction across frames (SIAF). Concretely, we designed the Across-Frame Interaction Module that enables users to annotate different objects freely on multiple frames. The AFI module will migrate scribble information among multiple interactive frames and generate multi-frame masks. Additionally, we employ the id-queried mechanism to process multiple objects in batches. Furthermore, for a more efficient propagation and lightweight model, we design a truncated re-propagation strategy to replace the previous multi-round fusion module, which employs an across-round memory that stores important interaction information. Our SwinB-SIAF achieves new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60). Moreover, our R50-SIAF is more than 3 faster than the state-of-the-art competitor under challenging multi-object scenarios. △ Less

Submitted 4 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.07284 [pdf, other]

Improving Domain Adaptation through Extended-Text Reading Comprehension

Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

Abstract: To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain… ▽ More To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, the question and answer pairs are extracted directly from the corpus in predefined formats offers limited context. To address this limitation, we improve reading comprehension via LLM and clustering. LLM focuses on leveraging domain knowledge within the corpus to refine comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will available at https://github.com/microsoft/LMOps. △ Less

Submitted 18 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: Work in Progress

arXiv:2401.06992 [pdf, other]

Progressive Feature Fusion Network for Enhancing Image Quality Assessment

Authors: Kaiqun Wu, Xiaoling Jiang, Rui Yu, Yonggang Luo, Tian Jiang, Xi Wu, Peng Wei

Abstract: Image compression has been applied in the fields of image storage and video broadcasting. However, it's formidably tough to distinguish the subtle quality differences between those distorted images generated by different algorithms. In this paper, we propose a new image quality assessment framework to decide which image is better in an image group. To capture the subtle differences, a fine-grained… ▽ More Image compression has been applied in the fields of image storage and video broadcasting. However, it's formidably tough to distinguish the subtle quality differences between those distorted images generated by different algorithms. In this paper, we propose a new image quality assessment framework to decide which image is better in an image group. To capture the subtle differences, a fine-grained network is adopted to acquire multi-scale features. Subsequently, we design a cross subtract block for separating and gathering the information within positive and negative image pairs. Enabling image comparison in feature space. After that, a progressive feature fusion block is designed, which fuses multi-scale features in a novel progressive way. Hierarchical spatial 2D features can thus be processed gradually. Experimental results show that compared with the current mainstream image quality assessment methods, the proposed network can achieve more accurate image quality assessment and ranks second in the benchmark of CLIC in the image perceptual model track. △ Less

Submitted 13 January, 2024; originally announced January 2024.

Comments: Data Compression Conference

arXiv:2401.05217 [pdf, other]

Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

Authors: Chenxi Yang, Yujia Liu, Dingquan Li, Tingting Jiang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack m… ▽ More No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman's rank-order correlation coefficient (SROCC) decline of 0.6381 attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness. △ Less

Submitted 25 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.03369 [pdf, other]

Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry

Authors: Zeyu Wang, Tianyi Jiang, **huan Wang, Qi Xuan

Abstract: Molecular property prediction refers to the task of labeling molecules with some biochemical properties, playing a pivotal role in the drug discovery and design process. Recently, with the advancement of machine learning, deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods, garnering significant attention. Among them,… ▽ More Molecular property prediction refers to the task of labeling molecules with some biochemical properties, playing a pivotal role in the drug discovery and design process. Recently, with the advancement of machine learning, deep learning-based molecular property prediction has emerged as a solution to the resource-intensive nature of traditional methods, garnering significant attention. Among them, molecular representation learning is the key factor for molecular property prediction performance. And there are lots of sequence-based, graph-based, and geometry-based methods that have been proposed. However, the majority of existing studies focus solely on one modality for learning molecular representations, failing to comprehensively capture molecular characteristics and information. In this paper, a novel multi-modal representation learning model, which integrates the sequence, graph, and geometry characteristics, is proposed for molecular property prediction, called SGGRL. Specifically, we design a fusion layer to fusion the representation of different modalities. Furthermore, to ensure consistency across modalities, SGGRL is trained to maximize the similarity of representations for the same molecule while minimizing similarity for different molecules. To verify the effectiveness of SGGRL, seven molecular datasets, and several baselines are used for evaluation and comparison. The experimental results demonstrate that SGGRL consistently outperforms the baselines in most cases. This further underscores the capability of SGGRL to comprehensively capture molecular information. Overall, the proposed SGGRL model showcases its potential to revolutionize molecular property prediction by leveraging multi-modal representation learning to extract diverse and comprehensive molecular insights. Our code is released at https://github.com/Vencent-Won/SGGRL. △ Less

Submitted 8 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: 8 pages, 3 figures

arXiv:2401.01642 [pdf, other]

doi 10.1609/aaai.v38i4.28176

BLADE: Box-Level Supervised Amodal Segmentation through Directed Expansion

Authors: Zhaochen Liu, Zhixuan Li, Tingting Jiang

Abstract: Perceiving the complete shape of occluded objects is essential for human and machine intelligence. While the amodal segmentation task is to predict the complete mask of partially occluded objects, it is time-consuming and labor-intensive to annotate the pixel-level ground truth amodal masks. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground truth boundin… ▽ More Perceiving the complete shape of occluded objects is essential for human and machine intelligence. While the amodal segmentation task is to predict the complete mask of partially occluded objects, it is time-consuming and labor-intensive to annotate the pixel-level ground truth amodal masks. Box-level supervised amodal segmentation addresses this challenge by relying solely on ground truth bounding boxes and instance classes as supervision, thereby alleviating the need for exhaustive pixel-level annotations. Nevertheless, current box-level methodologies encounter limitations in generating low-resolution masks and imprecise boundaries, failing to meet the demands of practical real-world applications. We present a novel solution to tackle this problem by introducing a directed expansion approach from visible masks to corresponding amodal masks. Our approach involves a hybrid end-to-end network based on the overlap** region - the area where different instances intersect. Diverse segmentation strategies are applied for overlap** regions and non-overlap** regions according to distinct characteristics. To guide the expansion of visible masks, we introduce an elaborately-designed connectivity loss for overlap** regions, which leverages correlations with visible masks and facilitates accurate amodal segmentation. Experiments are conducted on several challenging datasets and the results show that our proposed method can outperform existing state-of-the-art methods with large margins. △ Less

Submitted 25 February, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 38, 4 (Mar. 2024), 3846-3854

arXiv:2401.01629 [pdf, ps, other]

Synthetic Data in AI: Challenges, Applications, and Ethical Implications

Authors: Shuang Hao, Wenfeng Han, Tao Jiang, Yi** Li, Haonan Wu, Chunlin Zhong, Zhangjun Zhou, He Tang

Abstract: In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical… ▽ More In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical models to advanced deep learning techniques, and examines their applications across diverse domains. The report also critically addresses the ethical considerations and legal implications associated with synthetic datasets, highlighting the urgent need for mechanisms to ensure fairness, mitigate biases, and uphold ethical standards in AI development. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.09295 [pdf, other]

Networking for the Metaverse: The Standardization Landscape

Authors: Cedric Westphal, Jungha Hong, Shin-Gak Kang, Leonardo Chiariglione, Tianji Jiang

Abstract: New applications are being supported by current and future networks. In particular, it is expected that Metaverse applications will be deployed in the near future, as 5G and 6G network provide sufficient bandwidth and sufficiently low latency to provide a satisfying end-user experience. However, networks still need to evolve to better support this type of application. We present here a basic taxon… ▽ More New applications are being supported by current and future networks. In particular, it is expected that Metaverse applications will be deployed in the near future, as 5G and 6G network provide sufficient bandwidth and sufficiently low latency to provide a satisfying end-user experience. However, networks still need to evolve to better support this type of application. We present here a basic taxonomy of the metaverse, which allows to identify some of the networking requirements for such an application; we also provide an overview of the current state of balthe standardization efforts in different standardization organizations, including ITU-T, 3GPP, IETF and MPAI. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: To appear in ITU Journal on Future and Evolving Technologies J-FET December 2023

arXiv:2312.09002 [pdf, other]

Localization with Reconfigurable Intelligent Surface: An Active Sensing Approach

Authors: Zhongze Zhang, Tao Jiang, Wei Yu

Abstract: This paper addresses an uplink localization problem in which a base station (BS) aims to locate a remote user with the help of reconfigurable intelligent surfaces (RISs). We propose a strategy in which the user transmits pilots sequentially and the BS adaptively adjusts the sensing vectors, including the BS beamforming vector and multiple RIS reflection coefficients based on the observations alrea… ▽ More This paper addresses an uplink localization problem in which a base station (BS) aims to locate a remote user with the help of reconfigurable intelligent surfaces (RISs). We propose a strategy in which the user transmits pilots sequentially and the BS adaptively adjusts the sensing vectors, including the BS beamforming vector and multiple RIS reflection coefficients based on the observations already made, to eventually produce an estimated user position. This is a challenging active sensing problem for which finding an optimal solution involves searching through a complicated functional space whose dimension increases with the number of measurements. We show that the long short-term memory (LSTM) network can be used to exploit the latent temporal correlation between measurements to automatically construct scalable state vectors. Subsequently, the state vector is mapped to the sensing vectors for the next time frame via a deep neural network (DNN). A final DNN is used to map the state vector to the estimated user position. Numerical result illustrates the advantage of the active sensing design as compared to non-active sensing methods. The proposed solution produces interpretable results and is generalizable in the number of sensing stages. Remarkably, we show that a network with one BS and multiple RISs can outperform a comparable setting with multiple BSs. △ Less

Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted in IEEE Transactions on Wireless Communications. This is an extended version of the previous arXiv paper arXiv:2310.13160

arXiv:2312.07526 [pdf, other]

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Authors: Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang

Abstract: Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordi… ▽ More Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordinate classification by representing keypoints using dual 1-D heatmaps within the YOLO architecture, achieving accuracy comparable to top-down methods while maintaining high speed. We propose a dynamic coordinate classifier and a tailored loss function for heatmap learning, specifically designed to address the incompatibilities between coordinate classification and dense prediction models. RTMO outperforms state-of-the-art one-stage pose estimators, achieving 1.1% higher AP on COCO while operating about 9 times faster with the same backbone. Our largest model, RTMO-l, attains 74.8% AP on COCO val2017 and 141 FPS on a single V100 GPU, demonstrating its efficiency and accuracy. The code and models are available at https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo. △ Less

Submitted 8 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted at CVPR 2024. Project page: https://github.com/open-mmlab/mmpose/tree/main/projects/rtmo

arXiv:2311.12273 [pdf, other]

How AI-driven Digital Twins Can Empower Mobile Networks

Authors: Tong Li, Fenyu Jiang, Qiaohong Yu, Wenzhen Huang, Tao Jiang, Depeng **

Abstract: The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which… ▽ More The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.12079 [pdf, other]

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

Authors: Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang

Abstract: Knowledge distillation (KD) has been applied to various tasks successfully, and mainstream methods typically boost the student model via spatial imitation losses. However, the consecutive downsamplings induced in the spatial domain of teacher model is a type of corruption, hindering the student from analyzing what specific information needs to be imitated, which results in accuracy degradation. To… ▽ More Knowledge distillation (KD) has been applied to various tasks successfully, and mainstream methods typically boost the student model via spatial imitation losses. However, the consecutive downsamplings induced in the spatial domain of teacher model is a type of corruption, hindering the student from analyzing what specific information needs to be imitated, which results in accuracy degradation. To better understand the underlying pattern of corrupted feature maps, we shift our attention to the frequency domain. During frequency distillation, we encounter a new challenge: the low-frequency bands convey general but minimal context, while the high are more informative but also introduce noise. Not each pixel within the frequency bands contributes equally to the performance. To address the above problem: (1) We propose the Frequency Prompt plugged into the teacher model, absorbing the semantic frequency context during finetuning. (2) During the distillation period, a pixel-wise frequency mask is generated via Frequency Prompt, to localize those pixel of interests (PoIs) in various frequency bands. Additionally, we employ a position-aware relational frequency loss for dense prediction tasks, delivering a high-order spatial enhancement to the student model. We dub our Frequency Knowledge Distillation method as FreeKD, which determines the optimal localization and extent for the frequency distillation. Extensive experiments demonstrate that FreeKD not only outperforms spatial-based distillation methods consistently on dense prediction tasks (e.g., FreeKD brings 3.8 AP gains for RepPoints-R50 on COCO2017 and 4.55 mIoU gains for PSPNet-R18 on Cityscapes), but also conveys more robustness to the student. Notably, we also validate the generalization of our approach on large-scale vision models (e.g., DINO and SAM). △ Less

Submitted 22 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted by CVPR 2024

arXiv:2311.07582 [pdf]

Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions

Authors: Xinyu Gong, Jason Holmes, Yiwei Li, Zhengliang Liu, Qi Gan, Zihao Wu, Jianli Zhang, Yusong Zou, Yuxi Teng, Tian Jiang, Hongtu Zhu, Wei Liu, Tianming Liu, Yajun Yan

Abstract: Recent advances in Large Language Models (LLMs) have presented new opportunities for integrating Artificial General Intelligence (AGI) into biological research and education. This study evaluated the capabilities of leading LLMs, including GPT-4, GPT-3.5, PaLM2, Claude2, and SenseNova, in answering conceptual biology questions. The models were tested on a 108-question multiple-choice exam covering… ▽ More Recent advances in Large Language Models (LLMs) have presented new opportunities for integrating Artificial General Intelligence (AGI) into biological research and education. This study evaluated the capabilities of leading LLMs, including GPT-4, GPT-3.5, PaLM2, Claude2, and SenseNova, in answering conceptual biology questions. The models were tested on a 108-question multiple-choice exam covering biology topics in molecular biology, biological techniques, metabolic engineering, and synthetic biology. Among the models, GPT-4 achieved the highest average score of 90 and demonstrated the greatest consistency across trials with different prompts. The results indicated GPT-4's proficiency in logical reasoning and its potential to aid biology research through capabilities like data analysis, hypothesis generation, and knowledge integration. However, further development and validation are still required before the promise of LLMs in accelerating biological discovery can be realized. △ Less

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2311.06062 [pdf, other]

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

Abstract: Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classifie… ▽ More Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classified into two categories: reference-free and reference-based attacks. They are both based on the hypothesis that training records consistently strike a higher probability of being sampled. Nevertheless, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. The reference-based attack seems to achieve promising effectiveness in LLMs, which measures a more reliable membership signal by comparing the probability discrepancy between the target model and the reference model. However, the performance of reference-based attack is highly dependent on a reference dataset that closely resembles the training dataset, which is usually inaccessible in the practical scenario. Overall, existing MIAs are unable to effectively unveil privacy leakage over practical fine-tuned LLMs that are overfitting-free and private. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal, probabilistic variation, which is based on memorization rather than overfitting. Furthermore, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs. △ Less

Submitted 25 June, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Repo: https://github.com/wjfu99/MIA-LLMs

arXiv:2311.06049 [pdf, other]

Privacy-Preserving Individual-Level COVID-19 Infection Prediction via Federated Graph Learning

Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

Abstract: Accurately predicting individual-level infection state is of great value since its essential role in reducing the damage of the epidemic. However, there exists an inescapable risk of privacy leakage in the fine-grained user mobility trajectories required by individual-level infection prediction. In this paper, we focus on develo** a framework of privacy-preserving individual-level infection pred… ▽ More Accurately predicting individual-level infection state is of great value since its essential role in reducing the damage of the epidemic. However, there exists an inescapable risk of privacy leakage in the fine-grained user mobility trajectories required by individual-level infection prediction. In this paper, we focus on develo** a framework of privacy-preserving individual-level infection prediction based on federated learning (FL) and graph neural networks (GNN). We propose Falcon, a Federated grAph Learning method for privacy-preserving individual-level infeCtion predictiON. It utilizes a novel hypergraph structure with spatio-temporal hyperedges to describe the complex interactions between individuals and locations in the contagion process. By organically combining the FL framework with hypergraph neural networks, the information propagation process of the graph machine learning is able to be divided into two stages distributed on the server and the clients, respectively, so as to effectively protect user privacy while transmitting high-level information. Furthermore, it elaborately designs a differential privacy perturbation mechanism as well as a plausible pseudo location generation approach to preserve user privacy in the graph structure. Besides, it introduces a cooperative coupling mechanism between the individual-level prediction model and an additional region-level model to mitigate the detrimental impacts caused by the injected obfuscation mechanisms. Extensive experimental results show that our methodology outperforms state-of-the-art algorithms and is able to protect user privacy against actual privacy attacks. Our code and datasets are available at the link: https://github.com/wjfu99/FL-epidemic. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Comments: accepted by TOIS

arXiv:2311.01696 [pdf, other]

Universal Perturbation-based Secret Key-Controlled Data Hiding

Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Xiaoqian Chen

Abstract: Deep neural networks (DNNs) are demonstrated to be vulnerable to universal perturbation, a single quasi-perceptible perturbation that can deceive the DNN on most images. However, the previous works are focused on using universal perturbation to perform adversarial attacks, while the potential usability of universal perturbation as data carriers in data hiding is less explored, especially for the k… ▽ More Deep neural networks (DNNs) are demonstrated to be vulnerable to universal perturbation, a single quasi-perceptible perturbation that can deceive the DNN on most images. However, the previous works are focused on using universal perturbation to perform adversarial attacks, while the potential usability of universal perturbation as data carriers in data hiding is less explored, especially for the key-controlled data hiding method. In this paper, we propose a novel universal perturbation-based secret key-controlled data-hiding method, realizing data hiding with a single universal perturbation and data decoding with the secret key-controlled decoder. Specifically, we optimize a single universal perturbation, which serves as a data carrier that can hide multiple secret images and be added to most cover images. Then, we devise a secret key-controlled decoder to extract different secret images from the single container image constructed by the universal perturbation by using different secret keys. Moreover, a suppress loss function is proposed to prevent the secret image from leakage. Furthermore, we adopt a robust module to boost the decoder's capability against corruption. Finally, A co-joint optimization strategy is proposed to find the optimal universal perturbation and decoder. Extensive experiments are conducted on different datasets to demonstrate the effectiveness of the proposed method. Additionally, the physical test performed on platforms (e.g., WeChat and Twitter) verifies the usability of the proposed method in practice. △ Less

Submitted 2 November, 2023; originally announced November 2023.

Comments: 18 pages, 8 tables, 10 figures

arXiv:2311.01473 [pdf, other]

Adversarial Examples in the Physical World: A Survey

Authors: Jiakai Wang, Donghua Wang, ** Hu, Siyang Wu, Tingsong Jiang, Wen Yao, Aishan Liu, Xianglong Liu

Abstract: Deep neural networks (DNNs) have demonstrated high vulnerability to adversarial examples. Besides the attacks in the digital world, the practical implications of adversarial examples in the physical world present significant challenges and safety concerns. However, current research on physical adversarial examples (PAEs) lacks a comprehensive understanding of their unique characteristics, leading… ▽ More Deep neural networks (DNNs) have demonstrated high vulnerability to adversarial examples. Besides the attacks in the digital world, the practical implications of adversarial examples in the physical world present significant challenges and safety concerns. However, current research on physical adversarial examples (PAEs) lacks a comprehensive understanding of their unique characteristics, leading to limited significance and understanding. In this paper, we address this gap by thoroughly examining the characteristics of PAEs within a practical workflow encompassing training, manufacturing, and re-sampling processes. By analyzing the links between physical adversarial attacks, we identify manufacturing and re-sampling as the primary sources of distinct attributes and particularities in PAEs. Leveraging this knowledge, we develop a comprehensive analysis and classification framework for PAEs based on their specific characteristics, covering over 100 studies on physical-world adversarial examples. Furthermore, we investigate defense strategies against PAEs and identify open challenges and opportunities for future research. We aim to provide a fresh, thorough, and systematic understanding of PAEs, thereby promoting the development of robust adversarial learning and its application in open-world scenarios. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Comments: Adversarial examples, physical-world scenarios, attacks and defenses

arXiv:2310.17649 [pdf, other]

6-DoF Stability Field via Diffusion Models

Authors: Takuma Yoneda, Tianchong Jiang, Gregory Shakhnarovich, Matthew R. Walter

Abstract: A core capability for robot manipulation is reasoning over where and how to stably place objects in cluttered environments. Traditionally, robots have relied on object-specific, hand-crafted heuristics in order to perform such reasoning, with limited generalizability beyond a small number of object instances and object interaction patterns. Recent approaches instead learn notions of physical inter… ▽ More A core capability for robot manipulation is reasoning over where and how to stably place objects in cluttered environments. Traditionally, robots have relied on object-specific, hand-crafted heuristics in order to perform such reasoning, with limited generalizability beyond a small number of object instances and object interaction patterns. Recent approaches instead learn notions of physical interaction, namely motion prediction, but require supervision in the form of labeled object information or come at the cost of high sample complexity, and do not directly reason over stability or object placement. We present 6-DoFusion, a generative model capable of generating 3D poses of an object that produces a stable configuration of a given scene. Underlying 6-DoFusion is a diffusion model that incrementally refines a randomly initialized SE(3) pose to generate a sample from a learned, context-dependent distribution over stable poses. We evaluate our model on different object placement and stacking tasks, demonstrating its ability to construct stable scenes that involve novel object classes as well as to improve the accuracy of state-of-the-art 3D pose estimation methods. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: In submission

arXiv:2310.13160 [pdf, other]

Active Sensing for Localization with Reconfigurable Intelligent Surface

Authors: Zhongze Zhang, Tao Jiang, Wei Yu

Abstract: This paper addresses an uplink localization problem in which the base station (BS) aims to locate a remote user with the aid of reconfigurable intelligent surface (RIS). This paper proposes a strategy in which the user transmits pilots over multiple time frames, and the BS adaptively adjusts the RIS reflection coefficients based on the observations already received so far in order to produce an ac… ▽ More This paper addresses an uplink localization problem in which the base station (BS) aims to locate a remote user with the aid of reconfigurable intelligent surface (RIS). This paper proposes a strategy in which the user transmits pilots over multiple time frames, and the BS adaptively adjusts the RIS reflection coefficients based on the observations already received so far in order to produce an accurate estimate of the user location at the end. This is a challenging active sensing problem for which finding an optimal solution involves a search through a complicated functional space whose dimension increases with the number of measurements. In this paper, we show that the long short-term memory (LSTM) network can be used to exploit the latent temporal correlation between measurements to automatically construct scalable information vectors (called hidden state) based on the measurements. Subsequently, the state vector can be mapped to the RIS configuration for the next time frame in a codebook-free fashion via a deep neural network (DNN). After all the measurements have been received, a final DNN can be used to map the LSTM cell state to the estimated user equipment (UE) position. Numerical result shows that the proposed active RIS design results in lower localization error as compared to existing active and nonactive methods. The proposed solution produces interpretable results and is generalizable to early stop** in the sequence of sensing stages. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted in IEEE International Conference on Communications (ICC) 2023

arXiv:2310.11044 [pdf, ps, other]

A Tutorial on Near-Field XL-MIMO Communications Towards 6G

Authors: Haiquan Lu, Yong Zeng, Changsheng You, Yu Han, Jiayi Zhang, Zhe Wang, Zhenjun Dong, Shi **, Cheng-Xiang Wang, Tao Jiang, Xiaohu You, Rui Zhang

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for the sixth-generation (6G) mobile communication networks. By significantly boosting the antenna number or size to at least an order of magnitude beyond current massive MIMO systems, XL-MIMO is expected to unprecedentedly enhance the spectral efficiency and spatial resolution for wireless communication. The… ▽ More Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for the sixth-generation (6G) mobile communication networks. By significantly boosting the antenna number or size to at least an order of magnitude beyond current massive MIMO systems, XL-MIMO is expected to unprecedentedly enhance the spectral efficiency and spatial resolution for wireless communication. The evolution from massive MIMO to XL-MIMO is not simply an increase in the array size, but faces new design challenges, in terms of near-field channel modelling, performance analysis, channel estimation, and practical implementation. In this article, we give a comprehensive tutorial overview on near-field XL-MIMO communications, aiming to provide useful guidance for tackling the above challenges. First, the basic near-field modelling for XL-MIMO is established, by considering the new characteristics of non-uniform spherical wave (NUSW) and spatial non-stationarity. Next, based on the near-field modelling, the performance analysis of XL-MIMO is presented, including the near-field signal-to-noise ratio (SNR) scaling laws, beam focusing pattern, achievable rate, and degrees-of-freedom (DoF). Furthermore, various XL-MIMO design issues such as near-field beam codebook, beam training, channel estimation, and delay alignment modulation (DAM) transmission are elaborated. Finally, we point out promising directions to inspire future research on near-field XL-MIMO communications. △ Less

Submitted 3 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 42 pages

Showing 1–50 of 310 results for author: Jiang, T