Search | arXiv e-print repository

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

Abstract: The inability to express confidence levels and to detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score (96.76%) than two state-of-the-art algorithms, RETFound and UIOS, and improved further to 98.44% with a thresholding strategy. In the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% before thresholding and 92.73% after. Our model outperformed two ophthalmologists, with a higher F1 score (95.17% vs. 61.93% and 71.72%). In addition, our model correctly predicts high uncertainty scores for samples with ambiguous features, samples of non-target-category diseases, and low-quality samples, prompting manual checks and preventing misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomaly detection in real-world, open-set clinical environments.

Submitted 17 June, 2024; originally announced June 2024.

Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE
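
The thresholding strategy above can be illustrated with a minimal sketch. It assumes an evidential (Dirichlet) output head of the kind used by UIOS, where non-negative evidence e_k gives alpha_k = e_k + 1, S = sum_k alpha_k and uncertainty u = K / S; whether FMUE computes its score exactly this way, and the threshold value, are assumptions. The function names are hypothetical.

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Expected class probabilities and per-sample uncertainty from non-negative evidence.

    Assumes the subjective-logic convention alpha_k = e_k + 1 (hypothetical for FMUE).
    """
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0
    strength = alpha.sum(axis=1)                 # S = sum_k alpha_k
    probs = alpha / strength[:, None]            # expected probability of each class
    uncertainty = evidence.shape[1] / strength   # u = K / S, in (0, 1]
    return probs, uncertainty

def predict_with_threshold(evidence, threshold):
    """Return the predicted class per sample, or -1 when u exceeds the threshold,
    meaning the sample should be referred for a manual check."""
    probs, u = dirichlet_uncertainty(evidence)
    preds = probs.argmax(axis=1)
    return np.where(u > threshold, -1, preds), u

# One confident sample and one sample with weak, spread-out evidence.
preds, u = predict_with_threshold([[40.0, 0.0, 0.0], [0.5, 0.4, 0.3]], threshold=0.5)
# preds -> [0, -1]: the second sample is flagged instead of being force-classified
```

Referring high-uncertainty samples rather than forcing a label is what lets a thresholded model trade coverage for reliability.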

arXiv:2406.15031 [pdf, other]

New Upper Bounds for Noisy Permutation Channels

Authors: Lugaoze Feng, Baoji Wang, Guocheng Lv, Xvnan Li, Luhua Wang, Ye **

Abstract: The noisy permutation channel is a useful abstraction introduced by Makur for point-to-point communication networks and biological storage. While asymptotic capacity results exist for this model, a characterization of the second-order asymptotics is not available. Therefore, we analyze converse bounds for the noisy permutation channel in the finite blocklength regime. To do this, we present a modified minimax meta-converse for noisy permutation channels by symbol relaxation. To derive the second-order asymptotics of the converse bound, we propose a way to use divergence covering in the analysis. This enables the observation of the second-order asymptotics and the strong converse via Berry-Esseen type bounds. These two conclusions hold for noisy permutation channels with entry-wise strictly positive matrices. In addition, we obtain computable bounds for the noisy permutation channel with the binary symmetric channel (BSC), including the original computable converse bound based on the modified minimax meta-converse, the asymptotic expansion derived from our subset covering technique, and the ε-capacity result. We find that a smaller crossover probability provides a higher upper bound for a fixed finite blocklength, although the ε-capacity is agnostic to the BSC parameter. Finally, numerical results show that the normal approximation is remarkably precise and that our new converse bound is stronger than previous bounds.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 24 pages, Submitted to IEEE Transactions on Communications
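
The channel model itself, not the converse bound, can be illustrated with a small simulation. This is a sketch under the standard description of the model: each symbol passes independently through a BSC, and then the whole block is reordered by a uniformly random permutation. With a binary alphabet the order carries nothing, so the receiver effectively observes only the number of ones, one of at most n + 1 values for blocklength n. The function names are hypothetical.

```python
import random

def noisy_permutation_channel(bits, p, rng):
    """Send bits through a BSC with crossover probability p, then shuffle them
    with a uniformly random permutation, so the original order is lost."""
    noisy = [b ^ int(rng.random() < p) for b in bits]
    rng.shuffle(noisy)
    return noisy

def hamming_weight(received):
    """For a binary alphabet, the number of ones is all the receiver can use."""
    return sum(received)

rng = random.Random(0)
codeword = [1] * 30 + [0] * 70
received = noisy_permutation_channel(codeword, 0.1, rng)
# Expected weight: 30 * (1 - p) + 70 * p = 34 for p = 0.1
print(len(received), hamming_weight(received))
```

Because only the weight survives, a code for this channel is effectively a set of well-separated input weights, which is why the message set is so much smaller than for an ordinary BSC of the same blocklength.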

arXiv:2406.14537 [pdf, other]

MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading

Authors: Chuqiao Zong, Chaojie Wang, Molei Qin, Lei Feng, Xinrun Wang, Bo An

Abstract: High-frequency trading (HFT), which executes algorithmic trading on short time scales, has recently come to account for the majority of the cryptocurrency market. Besides traditional quantitative trading methods, reinforcement learning (RL) has become another appealing approach for HFT due to its strong ability to handle high-dimensional financial data and solve sophisticated sequential decision-making problems; e.g., hierarchical reinforcement learning (HRL) has shown promising performance on second-level HFT by training a router to select only one sub-agent from the agent pool to execute the current transaction. However, existing RL methods for HFT still have some defects: 1) standard RL-based trading agents suffer from overfitting, which prevents them from making effective policy adjustments based on the financial context; 2) due to rapid changes in market conditions, investment decisions made by an individual agent are usually one-sided and highly biased, which might lead to significant losses in extreme markets.
To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method On HFT, a.k.a. MacroHFT, which consists of two training phases: 1) we first train multiple types of sub-agents with market data decomposed according to various financial indicators, specifically market trend and volatility, where each agent owns a conditional adapter to adjust its trading policy according to market conditions; 2) we then train a hyper-agent to mix the decisions from these sub-agents and output a consistently profitable meta-policy to handle rapid market fluctuations, equipped with a memory mechanism to enhance its decision-making capability. Extensive experiments on various cryptocurrency markets demonstrate that MacroHFT achieves state-of-the-art performance on minute-level trading tasks.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted to KDD 2024
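
The difference between an HRL router that selects one sub-agent and a hyper-agent that mixes sub-agent decisions can be sketched as a softmax-weighted blend of per-action values. This is a hypothetical simplification: the weighting scheme, the Q-value representation and the function names are assumptions, and the memory mechanism is not modeled.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def mix_sub_agent_policies(sub_agent_q_values, hyper_agent_logits):
    """Blend sub-agent action preferences with hyper-agent weights.

    sub_agent_q_values: (n_agents, n_actions) Q-values from each sub-agent.
    hyper_agent_logits: (n_agents,) scores the hyper-agent assigns to each sub-agent.
    Returns the mixed Q-values and the greedy action under the mixture.
    """
    weights = softmax(hyper_agent_logits)           # convex weights over sub-agents
    mixed_q = weights @ np.asarray(sub_agent_q_values, dtype=float)
    return mixed_q, int(mixed_q.argmax())

# Two hypothetical sub-agents (e.g. trend-following and volatility-aware) over 3 actions.
q = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
mixed, action = mix_sub_agent_policies(q, [0.0, 0.0])   # equal trust in both agents
```

A hard router corresponds to a one-hot weight vector; soft weights let the meta-policy hedge between agents when market conditions are ambiguous.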

arXiv:2406.14359 [pdf, other]

Learning to Transfer for Evolutionary Multitasking

Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. Implicit EMT is a significant research branch that uses evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. The learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs.
Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions, are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs.

Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Under review
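
The "when and how to transfer" actions can be pictured as parameters of an implicit KT operator, similar in spirit to the fixed random mating probability of MFEA-style algorithms, but chosen by a learned agent at each step. The sketch below is hypothetical: `transfer_prob`, `mix_ratio`, the mutation scale and the reward formula are illustrative assumptions, not the L2T formulation.

```python
import random

def transfer_offspring(parent, donor_pool, transfer_prob, mix_ratio, rng):
    """Produce one offspring for the current task.

    transfer_prob ("when to transfer") and mix_ratio ("how much to transfer") are
    hypothetical stand-ins for the actions a learned KT agent would output.
    Returns (offspring, transferred_flag).
    """
    if rng.random() < transfer_prob:
        donor = rng.choice(donor_pool)
        # Uniform crossover: each gene comes from the donor with probability mix_ratio.
        child = [d if rng.random() < mix_ratio else g for g, d in zip(parent, donor)]
        return child, True
    # No transfer: small Gaussian mutation within the task itself.
    return [g + rng.gauss(0.0, 0.1) for g in parent], False

def transfer_reward(best_before, best_after):
    """Hypothetical reward for minimization: relative improvement of the best fitness."""
    return (best_before - best_after) / (abs(best_before) + 1e-12)

rng = random.Random(0)
child, used = transfer_offspring([0.0, 0.0, 0.0], [[1.0, 2.0, 3.0]], 0.3, 0.5, rng)
```

In a learned setting, the agent would observe features of the evolutionary state, emit these two numbers, and be rewarded with something like `transfer_reward` after the generation is evaluated.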

arXiv:2406.11168 [pdf, other]

Two-Timescale Optimization Framework for Decentralized Linear-Quadratic Optimal Control

Authors: Lechen Feng, Yuan-Hua Ni, Xuebo Zhang

Abstract: This study investigates a decentralized linear-quadratic optimal control problem and, for the first time, formulates several approximate separable constrained optimization problems based on the choice of sparsity-promoting function. First, for the optimization problem with a weighted $\ell_1$ sparsity-promoting function, a two-timescale algorithm based on the BSUM (Block Successive Upper-bound Minimization) framework and a differential equation solver is adopted. Second, a piecewise quadratic sparsity-promoting function is introduced, and the induced optimization problem exhibits an accelerated convergence rate under the same two-timescale algorithm. Finally, the optimization problem with an $\ell_0$ sparsity-promoting function, which is nonconvex and discontinuous, is considered; it can be approximated by successive coordinatewise convex optimization problems.

Submitted 16 June, 2024; originally announced June 2024.
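
The block-update idea behind BSUM with a weighted $\ell_1$ penalty can be shown on a toy problem. This sketch is not the control problem of the paper (which optimizes over feedback gains and uses a differential equation solver); it minimizes a generic convex quadratic plus a weighted $\ell_1$ term by exact coordinate updates, which is the special case of BSUM where the upper bound is the objective itself. Each update solves $Q_{ii} x_i = S\big(b_i - \sum_{j \ne i} Q_{ij} x_j,\ w_i\big)$, with $S$ the soft-thresholding operator.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_l1_coordinate_descent(Q, b, w, n_sweeps=200):
    """Minimize 0.5 x^T Q x - b^T x + sum_i w_i |x_i| for symmetric positive definite Q.

    Each inner step minimizes the objective exactly over one coordinate with the
    others fixed (a BSUM step whose upper bound is the objective itself).
    """
    Q = np.asarray(Q, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.asarray(w, dtype=float)
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            r = b[i] - Q[i] @ x + Q[i, i] * x[i]     # b_i - sum_{j != i} Q_ij x_j
            x[i] = soft_threshold(r, w[i]) / Q[i, i]
    return x

# Diagonal example: x_0 = S(3, 1) / 2 = 1.0, and x_1 = S(1, 2) / 4 = 0 is driven to zero.
x = weighted_l1_coordinate_descent(np.diag([2.0, 4.0]), [3.0, 1.0], [1.0, 2.0])
```

The weights $w_i$ control which coordinates are pushed to exactly zero, which is how a weighted $\ell_1$ term promotes a sparse (decentralized) structure.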

arXiv:2406.10502 [pdf, other]

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Authors: Jiahan Zhang, Qi Wei, Feng Liu, Lei Feng

Abstract: Fine-tuning vision-language models (VLMs) with abundant unlabeled data has recently attracted increasing attention. Existing methods that resort to a pseudolabeling strategy suffer from heavily incorrect hard pseudolabels when VLMs exhibit low zero-shot performance in downstream tasks. To alleviate this issue, we propose a Candidate Pseudolabel Learning method, termed CPL, to fine-tune VLMs with suitable candidate pseudolabels of unlabeled data in downstream tasks. The core of our method lies in the generation strategy of candidate pseudolabels, which progressively generates refined candidate pseudolabels by both intra- and inter-instance label selection, based on a confidence score matrix for all unlabeled data. This strategy can result in better performance in true-label inclusion and class-balanced instance selection. In this way, we can directly apply existing loss functions to learn with the generated candidate pseudolabels. Extensive experiments on nine benchmark datasets with three learning paradigms demonstrate the effectiveness of our method. Our code can be found at https://github.com/vanillaer/CPL-ICML2024.

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
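
Intra- and inter-instance selection on a confidence matrix can be sketched as two boolean filters that are intersected. The specific rules below (a fraction of the row maximum, a per-class quantile) and the parameter names are hypothetical; the actual CPL selection criteria are described in the paper and repository.

```python
import numpy as np

def candidate_pseudolabels(conf, intra_ratio=0.5, inter_quantile=0.5):
    """Hypothetical candidate-pseudolabel selection from a confidence matrix.

    conf: (n_samples, n_classes) confidence scores for unlabeled data.
    Intra-instance: keep classes scoring at least intra_ratio * (row maximum).
    Inter-instance: per class, keep samples at or above that column's quantile.
    Returns the boolean candidate matrix and a mask of samples that keep at
    least one candidate (the rest would be left out of this training round).
    """
    conf = np.asarray(conf, dtype=float)
    intra = conf >= intra_ratio * conf.max(axis=1, keepdims=True)
    inter = conf >= np.quantile(conf, inter_quantile, axis=0, keepdims=True)
    cand = intra & inter
    return cand, cand.any(axis=1)

conf = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
cand, kept = candidate_pseudolabels(conf)
```

Here the second sample passes the intra-instance test for both classes, but the per-class (inter-instance) filter removes class 1 because other samples are more confident about it; per-class filtering is one simple way to keep the selected instances balanced across classes.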

arXiv:2406.09385 [pdf, other]

Towards Vision-Language Geo-Foundation Model: A Survey

Authors: Yue Zhou, Litong Feng, Yi** Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang

Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs fine-tuned on them have been proposed recently. These new approaches aim to leverage large-scale, multimodal geospatial data to build versatile intelligent models with diverse geo-perceptive capabilities, which we refer to as Vision-Language Geo-Foundation Models (VLGFMs). This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field. In particular, we introduce the background and motivation behind the rise of VLGFMs, highlighting their unique research significance. Then, we systematically summarize the core technologies employed in VLGFMs, including data construction, model architectures, and applications of various multimodal geospatial tasks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To the best of our knowledge, this is the first comprehensive literature review of VLGFMs. We keep tracing related works at https://github.com/zytx121/Awesome-VLGFM.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures

arXiv:2406.08987 [pdf, other]

Towards Next Era of Multi-objective Optimization: Large Language Models as Architects of Evolutionary Operators

Authors: Yuxiao Huang, Shenghao Wu, Wenjie Zhang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Multi-objective optimization problems (MOPs) are prevalent in various real-world applications, necessitating sophisticated solutions that balance conflicting objectives. Traditional evolutionary algorithms (EAs), while effective, often rely on domain-specific expert knowledge and iterative tuning, which can impede innovation when encountering novel MOPs. Very recently, the emergence of Large Language Models (LLMs) has revolutionized software engineering by enabling the autonomous development and refinement of programs. Capitalizing on this advancement, we propose a new LLM-based framework for evolving EA operators, designed to address a wide array of MOPs. This framework facilitates the production of EA operators without the extensive demands for expert intervention, thereby streamlining the design process. To validate the efficacy of our approach, we have conducted extensive empirical studies across various categories of MOPs. The results demonstrate the robustness and superior performance of our LLM-evolved operators.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 5 figures, 5 tables

arXiv:2406.08754 [pdf, other]

StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Encoded Structure

Authors: Bangxin Li, Hengrui Xing, Chao Huang, ** Qian, Huangqing Xiao, Linfeng Feng, Cong Tian

Abstract: Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on the plain-text prompt without specifically exploring the significant influence of its structure. In this paper, we focus on studying how prompt structure contributes to the jailbreak attack. We introduce a novel structure-level attack method based on tail structures that are rarely used during LLM training, which we refer to as Uncommon Text-Encoded Structure (UTES). We extensively study 12 UTES templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight that contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms baseline methods. In particular, the attack success rate reaches 94.62% on GPT-4o, which has not been addressed by state-of-the-art techniques.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures

arXiv:2406.08079 [pdf, other]

A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

Authors: Lixian Zhang, Yi Zhao, Runmin Dong, **xiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu

Abstract: Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information that is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limitation persists: the inability to effectively integrate spatial, temporal, and spectral information within a single unified model. To unlock the potential of RS data, we construct a Spatial-Temporal-Spectral Structured Dataset (STSSD) characterized by the incorporation of multiple RS sources, diverse coverage, unified locations within image sets, and heterogeneity within images. Building upon this structured dataset, we propose an Anchor-Aware Masked AutoEncoder method (A$^{2}$-MAE), leveraging intrinsic complementary information from the different kinds of images and geo-information to reconstruct the masked patches during the pre-training phase. A$^{2}$-MAE integrates an anchor-aware masking strategy and a geographic encoding module to comprehensively exploit the properties of RS images. Specifically, the proposed anchor-aware masking strategy dynamically adapts the masking process based on the meta-information of a pre-selected anchor image, thereby facilitating training on images captured by diverse types of RS sources within one model.
Furthermore, we propose a geographic encoding method to leverage accurate spatial patterns, enhancing the model's generalization capabilities for downstream applications that are generally location-related. Extensive experiments demonstrate that our method achieves comprehensive improvements over existing RS pre-training methods across various downstream tasks, including image classification, semantic segmentation, and change detection.

Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.
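
A starting point for anchor-aware masking is ordinary MAE-style random patch masking, plus a way to carry a mask defined on the anchor's patch grid over to a co-located image of a different resolution. The sketch below is hypothetical: using the resolution ratio as the "meta-information", and masking a coarse patch only when every anchor patch it covers is masked, are illustrative choices, not the A$^{2}$-MAE rule.

```python
import numpy as np

def random_patch_mask(grid, mask_ratio, rng):
    """MAE-style random masking on a grid x grid patch layout (True = masked)."""
    n = grid * grid
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[: int(round(n * mask_ratio))]] = True
    return mask.reshape(grid, grid)

def map_mask_to_image(anchor_mask, scale):
    """Project an anchor-grid mask onto an image whose patches each cover
    scale x scale anchor patches (e.g. a coarser source over the same location).
    Hypothetical rule: a coarse patch is masked only if all covered anchor patches are."""
    g = anchor_mask.shape[0]
    if g % scale != 0:
        raise ValueError("anchor grid must be divisible by scale")
    blocks = anchor_mask.reshape(g // scale, scale, g // scale, scale)
    return blocks.all(axis=(1, 3))

rng = np.random.default_rng(0)
anchor_mask = random_patch_mask(8, 0.75, rng)        # 48 of 64 anchor patches masked
coarse_mask = map_mask_to_image(anchor_mask, 2)      # 4 x 4 grid for a 2x coarser image
```

Keeping all masks in the anchor's coordinate frame is what allows images from different sources to be masked consistently within one model.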

arXiv:2406.07069 [pdf, other]

Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

Authors: Xuezhi Niu, Kaige Tan, Lei Feng

Abstract: This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Building on our previous studies that used model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot is safer and lighter, and has a simpler mechanism for fabrication and control. However, the primary challenge lies in developing sophisticated control algorithms that attain optimal gait control for fast and stable locomotion. The research employs a multi-stage methodology, including state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios.

Submitted 11 June, 2024; originally announced June 2024.
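
The core MBRL loop (fit a dynamics model from logged transitions, then plan inside the model) can be sketched compactly. This is a generic illustration, not the authors' pipeline: it swaps in a linear least-squares model and random-shooting model-predictive control, and the toy system, reward and parameter values are assumptions.

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ A s + B a + c from logged transitions."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W  # shape (state_dim + action_dim + 1, state_dim)

def predict(W, s, a):
    return np.concatenate([s, a, [1.0]]) @ W

def random_shooting_mpc(W, s0, reward_fn, action_dim, horizon, n_candidates, rng, a_max=1.0):
    """Sample action sequences, roll them out in the learned model, and return the
    first action of the best sequence (re-planned at every control step)."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-a_max, a_max, size=(horizon, action_dim))
        s, total = np.asarray(s0, dtype=float), 0.0
        for a in seq:
            s = predict(W, s, a)
            total += reward_fn(s, a)
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action

# Toy 1-D system s' = s + a, logged with random actions.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 1))
A = rng.uniform(-1.0, 1.0, size=(100, 1))
W = fit_linear_dynamics(S, A, S + A)
action = random_shooting_mpc(W, [2.0], lambda s, a: -float(s @ s), 1, 3, 1000, rng)
```

A real soft robot would need a far richer model (the abstract's data-driven model training) and a learned policy rather than per-step sampling, but the split between model fitting and planning is the same.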

arXiv:2406.07065 [pdf, other]

Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization

Authors: Kaige Tan, Xuezhi Niu, Qinglei Ji, Lei Feng, Martin Törngren

Abstract: This study focuses on improving the locomotion capability of a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of modeling discrepancies, we implement a multi-fidelity BO approach, combining data from both simulation and physical experiments throughout training and optimization. This strategy enables the adaptive refinement of the gait pattern and ensures a smooth transition from simulation to real-world deployment for the controller. Moreover, we integrate a computational task off-loading architecture based on edge computing, which reduces the onboard computational and memory overhead, to improve real-time control performance and facilitate an effective online learning process. The proposed approach successfully achieves optimal walking gait design for physical deployment with high efficiency, effectively addressing challenges related to the reality gap in soft robotics.

Submitted 11 June, 2024; originally announced June 2024.
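
A parametric gait pattern from a central pattern generator can be as simple as phase-locked sinusoids, one per leg, whose amplitude, frequency and bias are the parameters an optimizer such as BO would tune. This sketch uses plain phase oscillators rather than any specific CPG model from the paper; the leg order, trot offsets and parameter names are assumptions.

```python
import math

# Phase offsets (fraction of a cycle) for a trot: diagonal legs move together.
# Hypothetical leg order: front-left, front-right, hind-left, hind-right.
TROT_OFFSETS = (0.0, 0.5, 0.5, 0.0)

def cpg_gait(t, amplitude, frequency, offsets=TROT_OFFSETS, bias=0.0):
    """Return one actuator command per leg at time t (seconds).

    amplitude, frequency and bias are the kind of gait parameters a Bayesian
    optimizer could tune; offsets fix the inter-leg coordination.
    """
    return [bias + amplitude * math.sin(2 * math.pi * (frequency * t + phi)) for phi in offsets]

commands = cpg_gait(0.1, amplitude=1.0, frequency=2.0)
```

Reducing the controller to a handful of parameters is what makes sample-efficient BO, and mixing cheap simulated evaluations with expensive physical ones, practical.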

arXiv:2406.04609 [pdf, other]

Diverse Intra- and Inter-Domain Activity Style Fusion for Cross-Person Generalization in Activity Recognition

Authors: Junru Zhang, Lang Feng, Zhidan Liu, Yuhan Wu, Yang He, Yabo Dong, Duanqing Xu

Abstract: Existing domain generalization (DG) methods for cross-person generalization tasks often face challenges in capturing intra- and inter-domain style diversity, resulting in domain gaps with the target domain. In this study, we explore a novel perspective to tackle this problem, a process conceptualized as domain padding. This proposal aims to enrich the domain diversity by synthesizing intra- and inter-domain style data while maintaining robustness to class labels. We instantiate this concept using a conditional diffusion model and introduce a style-fused sampling strategy to enhance data generation diversity. In contrast to traditional condition-guided sampling, our style-fused sampling strategy allows for the flexible use of one or more random styles to guide data synthesis. This feature presents a notable advancement: it allows for the maximum utilization of possible permutations and combinations among existing styles to generate a broad spectrum of new style instances. Empirical evaluations on a broad range of datasets demonstrate that our generated data achieves remarkable diversity within the domain space. Both intra- and inter-domain generated data have proven to be significant and valuable, contributing to varying degrees of performance enhancements. Notably, our approach outperforms state-of-the-art DG methods in all human activity recognition tasks.

Submitted 28 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)
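
Guiding synthesis with "one or more random styles" can be pictured as drawing a random subset of style vectors and fusing them into a single conditioning vector. The sketch is hypothetical: representing styles as embedding vectors, fusing them with Dirichlet-distributed convex weights, and the `max_styles` cap are assumptions, and the diffusion sampler that would consume the result is not shown.

```python
import numpy as np

def fuse_styles(style_bank, rng, max_styles=3):
    """Hypothetical style fusion: pick 1..max_styles random style vectors from
    style_bank (n_styles, dim) and return a random convex combination of them,
    together with the chosen indices and weights."""
    style_bank = np.asarray(style_bank, dtype=float)
    k = rng.integers(1, min(max_styles, len(style_bank)) + 1)   # number of styles to fuse
    idx = rng.choice(len(style_bank), size=k, replace=False)
    weights = rng.dirichlet(np.ones(k))                         # non-negative, sums to 1
    return weights @ style_bank[idx], idx, weights

rng = np.random.default_rng(0)
fused, idx, weights = fuse_styles(np.eye(4), rng)
```

With n source styles there are 2^n - 1 non-empty subsets to draw from, and continuous weights within each subset, which is where the extra diversity over single-style conditioning comes from.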

arXiv:2406.03150 [pdf, other]

Sample-specific Masks for Visual Reprogramming-based Prompting

Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

Abstract: Visual reprogramming (VR) is a prompting technique that aims to re-purpose a pre-trained model (e.g., a classifier on ImageNet) to target tasks (e.g., medical data prediction) by learning a small-scale pattern added into input images instead of tuning considerable parameters within the model. The location of the pattern within input samples is usually determined by a pre-defined mask shared across all samples. In this paper, we show that the shared mask potentially limits VR's generalization and increases its approximation error due to the lack of sample-level adaptation. Motivated by this finding, we design a new framework for VR called sample-specific multi-channel masks (SMM). Specifically, SMM employs a lightweight ConvNet and patch-wise interpolation to generate sample-specific three-channel masks instead of a shared and pre-defined mask. Since we generate different masks for individual samples, SMM is theoretically shown to reduce approximation error for the target tasks compared with existing state-of-the-art VR methods. We also empirically demonstrate its performance gain on both ResNet and ViT. The success of SMM further highlights the broader applicability of VR in leveraging the latent knowledge of pre-trained models for various target tasks. Our code is available at https://github.com/tmlr-group/SMM.

Submitted 5 June, 2024; originally announced June 2024.
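
The contrast between a shared mask and a sample-specific one can be sketched as x' = x + M(x) ⊙ δ, where δ is the learnable pattern and M(x) depends on the input. In this sketch a fixed average-pool-plus-sigmoid function stands in for SMM's lightweight ConvNet (an assumption, with no learned weights), and patch-wise interpolation is implemented as nearest-neighbour block upsampling.

```python
import numpy as np

def upsample_patchwise(low_res_mask, patch):
    """Nearest-neighbour patch-wise interpolation: each low-res cell becomes a patch x patch block."""
    return low_res_mask.repeat(patch, axis=1).repeat(patch, axis=2)

def sample_specific_mask(x, patch):
    """Hypothetical stand-in for SMM's ConvNet: average-pool each channel into
    patch x patch cells, squash with a sigmoid, then upsample back to full size."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    low = 1.0 / (1.0 + np.exp(-(pooled - pooled.mean())))
    return upsample_patchwise(low, patch)

def reprogram(x, delta, mask):
    """Visual reprogramming: add the learnable pattern delta where the mask allows."""
    return x + mask * delta

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))                 # a toy 3-channel 8x8 input
delta = np.full((3, 8, 8), 0.5)           # pattern shared across samples
x_vr = reprogram(x, delta, sample_specific_mask(x, 4))
```

Only the mask varies per sample; δ stays shared, which keeps the number of trainable parameters small while still adapting where the pattern is applied.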

arXiv:2406.02915 [pdf, other]

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Authors: **hao Li, Haopeng Li, Sarah Erfani, Lei Feng, James Bailey, Feng Liu

Abstract: It has recently been discovered that using a pre-trained vision-language model (VLM), e.g., CLIP, to align a whole query image with several finer text descriptions generated by a large language model can significantly enhance zero-shot performance. However, in this paper, we empirically find that the finer descriptions tend to align more effectively with local areas of the query image rather than the whole image, and then we theoretically validate this finding. Thus, we present a method called weighted visual-text cross alignment (WCA). This method begins with a localized visual prompting technique, designed to identify local visual areas within the query image. The local visual areas are then cross-aligned with the finer descriptions by creating a similarity matrix using the pre-trained VLM. To determine how well a query image aligns with each category, we develop a score function based on the weighted similarities in this matrix. Extensive experiments demonstrate that our method significantly improves zero-shot performance across various datasets, achieving results that are even comparable to few-shot learning methods.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 22 pages, 16 figures, published at ICML 2024

MSC Class: 68T45; 68T10 ACM Class: I.2.10; I.4.10
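
The score function over a local-area-by-description similarity matrix can be sketched as a weighted bilinear sum of cosine similarities. The embeddings below are toy 2-D vectors standing in for VLM features, and how the crop and description weights are obtained is left as an assumption (here they are given, or uniform); the function names are hypothetical.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def wca_style_score(local_feats, local_weights, desc_feats, desc_weights):
    """Hypothetical weighted cross-alignment score for one class.

    local_feats: (n_crops, d) embeddings of local image areas.
    desc_feats:  (n_desc, d) embeddings of the class's finer text descriptions.
    Returns sum_ij u_i * v_j * cos(local_i, desc_j).
    """
    sim = normalize(local_feats) @ normalize(desc_feats).T      # (n_crops, n_desc)
    return float(np.asarray(local_weights, dtype=float) @ sim @ np.asarray(desc_weights, dtype=float))

def classify(local_feats, local_weights, class_descs):
    """Score every class with uniform description weights and return the best one."""
    scores = [wca_style_score(local_feats, local_weights, d, np.full(len(d), 1.0 / len(d)))
              for d in class_descs]
    return int(np.argmax(scores)), scores

local = np.array([[1.0, 0.0], [0.9, 0.1]])
class_descs = [np.array([[1.0, 0.0], [1.0, 0.1]]), np.array([[0.0, 1.0]])]
label, scores = classify(local, np.array([0.5, 0.5]), class_descs)
```

Compared with a single whole-image embedding, every crop is compared with every description, so a description that matches only one region of the image can still contribute to its class score.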

arXiv:2405.20705 [pdf, other]

ADESSE: Advice Explanations in Complex Repeated Decision-Making Environments

Authors: Sören Schleibaum, Lu Feng, Sarit Kraus, Jörg P. Müller

Abstract: In the evolving landscape of human-centered AI, fostering a synergistic relationship between humans and AI agents in decision-making processes stands as a paramount challenge. This work considers a problem setup where an intelligent agent comprising a neural network-based prediction component and a deep reinforcement learning component provides advice to a human decision-maker in complex repeated decision-making environments. Whether the human decision-maker would follow the agent's advice depends on their beliefs and trust in the agent and on their understanding of the advice itself. To this end, we developed an approach named ADESSE to generate explanations about the adviser agent to improve human trust and decision-making. Computational experiments on a range of environments with varying model sizes demonstrate the applicability and scalability of ADESSE. Furthermore, an interactive game-based user study shows that participants were significantly more satisfied, achieved a higher reward in the game, and took less time to select an action when presented with explanations generated by ADESSE. These findings illuminate the critical role of tailored, human-centered explanations in AI-assisted decision-making.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.17879 [pdf, other]

Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

Authors: Lang Feng, Pengjie Gu, Bo An, Gang Pan

Abstract: Diffusion planners have shown promise in handling long-horizon and sparse-reward tasks due to their non-autoregressive plan generation. However, their inherent stochastic risk of generating infeasible trajectories presents significant challenges to their reliability and stability. We introduce a novel approach, the Trajectory Aggregation Tree (TAT), to address this issue in diffusion planners. Compared to prior methods that rely solely on raw trajectory predictions, TAT aggregates information from both historical and current trajectories, forming a dynamic tree-like structure. Each trajectory is conceptualized as a branch and individual states as nodes. As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making. TAT can be deployed without modifying the original training and sampling pipelines of diffusion planners, making it a training-free, ready-to-deploy solution. We provide both theoretical analysis and empirical evidence to support TAT's effectiveness. Our results highlight its remarkable ability to resist the risk from unreliable trajectories, guarantee performance improvements of diffusion planners in $100\%$ of tasks, and exhibit an appreciable tolerance margin for sample quality, thereby enabling planning with a more than $3\times$ acceleration.

Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: ICML 2024 (Spotlight)

arXiv:2405.15269 [pdf, other]

BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection

Authors: Yuwei Niu, Shuo He, Qi Wei, Feng Liu, Lei Feng

Abstract: Multimodal contrastive learning methods (e.g., CLIP) have shown impressive zero-shot classification performance due to their strong ability in joint representation learning for visual and textual modalities. However, recent research revealed that multimodal contrastive learning on poisoned pre-training data with a small proportion of maliciously backdoored data can induce a backdoored CLIP that could be attacked by inserted triggers in downstream tasks with a high success rate. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage, which unfortunately incurs high computational costs due to numerous parameter updates. In this paper, we provide the first attempt at a computationally efficient backdoor detection method to defend against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Specifically, we first prompt the language model (e.g., GPT-4) to produce class-related description texts (benign) and class-perturbed random texts (malignant) by specially designed instructions. Then, the distribution difference in cosine similarity between images and the two types of class description texts can be used as the criterion to detect backdoor samples.
Extensive experiments validate that our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods, in terms of both effectiveness and efficiency.
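The abstract states the detection criterion only at a high level ("the distribution difference in cosine similarity"). The sketch below is a simplified stand-in, not BDetCLIP's exact statistic. It summarizes each distribution by its mean and flags an image when the gap between its average similarity to benign descriptions and to malignant texts falls below a threshold. The precomputed embeddings, the mean-gap statistic, and `threshold` are all assumptions for illustration.

```python
import math
from statistics import mean


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrast_gap(img_emb, benign_embs, malignant_embs):
    """Mean similarity to benign class descriptions minus mean similarity
    to class-perturbed random (malignant) texts."""
    return (mean(cosine(img_emb, t) for t in benign_embs)
            - mean(cosine(img_emb, t) for t in malignant_embs))


def is_backdoored(img_emb, benign_embs, malignant_embs, threshold):
    """Flag an image whose representation barely reacts to the text change.

    A clean image should prefer the benign descriptions by a clear margin; a
    backdoored image, being insensitive to the texts, shows a small gap.
    """
    return contrast_gap(img_emb, benign_embs, malignant_embs) < threshold
```

Because only forward passes and cosine similarities are needed, a check of this kind runs entirely at test time without updating any CLIP parameters.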

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14474 [pdf, other]

Time Cell Inspired Temporal Codebook in Spiking Neural Networks for Enhanced Image Generation

Authors: Linghao Feng, Dongcheng Zhao, Sicheng Shen, Yiting Dong, Guobin Shen, Yi Zeng

Abstract: This paper presents a novel approach leveraging Spiking Neural Networks (SNNs) to construct a Vector Quantized Variational Autoencoder (VQ-VAE) with a temporal codebook inspired by hippocampal time cells. This design captures and utilizes temporal dependencies, significantly enhancing the generative capabilities of SNNs. Neuroscientific research has identified hippocampal "time cells" that fire sequentially during temporally structured experiences. Our temporal codebook emulates this behavior by triggering the activation of time cell populations based on similarity measures as input stimuli pass through it. We conducted extensive experiments on standard benchmark datasets, including MNIST, FashionMNIST, CIFAR10, CelebA, and downsampled LSUN Bedroom, to validate our model's performance. Furthermore, we evaluated the effectiveness of the temporal codebook on the neuromorphic datasets NMNIST and DVS-CIFAR10, and demonstrated the model's capability on high-resolution datasets such as CelebA-HQ, LSUN Bedroom, and LSUN Church. The experimental results indicate that our method consistently outperforms existing SNN-based generative models across multiple datasets, achieving state-of-the-art performance. Notably, our approach excels in generating high-resolution and temporally consistent data, underscoring the crucial role of temporal information in SNN-based generative modeling.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14111 [pdf, other]

Improving Generalization of Deep Neural Networks by Optimum Shifting

Authors: Yuyan Zhou, Ye Li, Lei Feng, Sheng-Jun Huang

Abstract: Recent studies showed that the generalization of neural networks is correlated with the sharpness of the loss landscape, and that flat minima suggest better generalization ability than sharp minima. In this paper, we propose a novel method called \emph{optimum shifting}, which moves the parameters of a neural network from a sharp minimum to a flatter one while maintaining the same training loss value. Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations, enabling adjustment of parameters in the solution space, which can be accomplished simply by solving a constrained optimization problem. Furthermore, we introduce a practical stochastic optimum shifting technique utilizing the Neural Collapse theory to reduce computational costs and provide more degrees of freedom for optimum shifting. Extensive experiments (including classification and detection) with various deep neural network architectures on benchmark datasets demonstrate the effectiveness of our method.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13956 [pdf, other]

Attention as an RNN

Authors: Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori

Abstract: The advent of Transformers marked a significant breakthrough in sequence modelling, providing a highly performant architecture capable of leveraging GPU parallelism. However, Transformers are computationally expensive at inference time, limiting their applications, particularly in low-resource settings (e.g., mobile and embedded devices). Addressing this, we (1) begin by showing that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its \textit{many-to-one} RNN output efficiently. We then (2) show that popular attention-based models such as Transformers can be viewed as RNN variants. However, unlike traditional RNNs (e.g., LSTMs), these models cannot be updated efficiently with new tokens, an important property in sequence modelling. Tackling this, we (3) introduce a new efficient method of computing attention's \textit{many-to-many} RNN output based on the parallel prefix scan algorithm. Building on the new attention formulation, we (4) introduce \textbf{Aaren}, an attention-based module that can not only (i) be trained in parallel (like Transformers) but also (ii) be updated efficiently with new tokens, requiring only constant memory for inferences (like traditional RNNs). Empirically, we show Aarens achieve comparable performance to Transformers on $38$ datasets spread across four popular sequential problem settings: reinforcement learning, event forecasting, time series classification, and time series forecasting tasks while being more time- and memory-efficient.
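The many-to-one view can be illustrated with the standard streaming-softmax recurrence. A single query processes tokens one at a time while keeping only a running maximum score, a running softmax denominator, and a running weighted sum of values, so memory stays constant in sequence length. This is a minimal sketch of that general idea, not the Aaren module itself. The parallel prefix-scan formulation for many-to-many outputs is not shown, and the function names are hypothetical.

```python
import math


def attention_direct(q, keys, values):
    """Reference softmax attention for one query over all tokens at once."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]


def attention_rnn(q, keys, values):
    """Same output computed as a recurrence over tokens with O(1) state."""
    dim = len(values[0])
    m = -math.inf        # running maximum score (numerical stability)
    c = 0.0              # running softmax denominator
    a = [0.0] * dim      # running (unnormalized) weighted sum of values
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, s)
        scale = math.exp(m - m_new)   # rescales old state; exp(-inf) is 0.0
        w = math.exp(s - m_new)
        c = c * scale + w
        a = [ai * scale + w * vi for ai, vi in zip(a, v)]
        m = m_new
    return [ai / c for ai in a]
```

Adding a new token only requires one more loop iteration on the stored `(m, c, a)` state, which is the efficient-update property the abstract attributes to traditional RNNs.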

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.09721 [pdf, other]

DP-RuL: Differentially-Private Rule Learning for Clinical Decision Support Systems

Authors: Josephine Lamp, Lu Feng, David Evans

Abstract: Serious privacy concerns arise with the use of patient data in rule-based clinical decision support systems (CDSS). The goal of a privacy-preserving CDSS is to learn a population ruleset from individual clients' local rulesets, while protecting the potentially sensitive information contained in the rulesets. We present the first work focused on this problem and develop a framework for learning population rulesets with local differential privacy (LDP), suitable for use within a distributed CDSS and other distributed settings. Our rule discovery protocol uses a Monte-Carlo Tree Search (MCTS) method integrated with LDP to search a rule grammar in a structured way and find rule structures clients are likely to have. Randomized response queries are sent to clients to determine promising paths to search within the rule grammar. In addition, we introduce an adaptive budget allocation method which dynamically determines how much privacy loss budget to use at each query, resulting in better privacy-utility trade-offs. We evaluate our approach using three clinical datasets and find that we are able to learn population rulesets with high coverage (breadth of rules) and clinical utility even at low privacy loss budgets.
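Randomized response is the classic mechanism behind such client queries. The sketch below shows only the textbook binary version. Each client reports its true bit with probability $p = e^\varepsilon / (1 + e^\varepsilon)$, which satisfies $\varepsilon$-local differential privacy, and the server inverts $\mathbb{E}[\text{observed}] = (1 - p) + \pi(2p - 1)$ to estimate the true fraction $\pi$. This is not DP-RuL's actual query protocol; the MCTS search and the adaptive budget allocation are not modeled, and the function names are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Client side: report the true bit with probability e^eps / (1 + e^eps),
    otherwise report its flip. The likelihood ratio is e^eps, so eps-LDP holds."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, epsilon):
    """Server side: unbiased estimate of the true fraction of clients with bit 1."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p) + pi * (2p - 1); solve for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

For example, if 30% of 100,000 simulated clients hold a rule structure, the estimate from their noisy reports at $\varepsilon = 1$ lands close to 0.3, while no single report reveals much about its sender. Smaller $\varepsilon$ values push $2p - 1$ toward zero and inflate the estimator's variance, which is the privacy-utility trade-off that a budget allocation scheme has to manage.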

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07777 [pdf, other]

GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

Authors: Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Lin Feng

Abstract: Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often struggle to handle long-range dependencies, whereas Transformers are limited by their computational efficiency. Recent breakthroughs in state-space models (e.g., Mamba) have attracted significant attention due to their near-linear computational efficiency and superior performance, prompting our investigation into their potential for the SR problem. To this end, we propose the Gradient-guided Mamba for Spectral Reconstruction from RGB Images, dubbed GMSR-Net. GMSR-Net is a lightweight model characterized by a global receptive field and linear computational complexity. Its core comprises multiple stacked Gradient Mamba (GM) blocks, each featuring a tri-branch structure. In addition to benefiting from the efficient global feature representation of the Mamba block, we further introduce spatial gradient attention and spectral gradient attention to guide the reconstruction of spatial and spectral cues. GMSR-Net demonstrates a strong accuracy-efficiency trade-off, achieving state-of-the-art performance while markedly reducing the number of parameters and the computational burden. Compared to existing approaches, GMSR-Net reduces parameters and FLOPs by substantial margins of 10 times and 20 times, respectively. Code is available at https://github.com/wxy11-27/GMSR.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.01616 [pdf, other]

Generative Active Learning for the Search of Small-molecule Protein Binders

Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and molecules designed de novo by LambdaZero reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.19438 [pdf, other]

Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings

Authors: Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng

Abstract: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

Submitted 22 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.17883 [pdf, other]

Underwater Variable Zoom: Depth-Guided Perception Network for Underwater Image Enhancement

Authors: Zhixiong Huang, Xinying Wang, Chengpei Xu, **jiang Li, Lin Feng

Abstract: Underwater scenes intrinsically involve degradation problems owing to heterogeneous ocean elements. Prevailing underwater image enhancement (UIE) methods stick to straightforward feature modeling to learn the mapping function, which leads to limited visual gain as it lacks more explicit physical cues (e.g., depth). In this work, we investigate injecting the depth prior into the deep UIE model for more precise scene enhancement capability. To this end, we present a novel depth-guided perception UIE framework, dubbed underwater variable zoom (UVZ). Specifically, UVZ uses a two-stage pipeline. First, a depth estimation network is designed to generate critical depth maps, combined with an auxiliary supervision network introduced to suppress estimation differences during training. Second, UVZ parses near-far scenarios by harnessing the predicted depth maps, enabling local and non-local perception in different regions. Extensive experiments on five benchmark datasets demonstrate that UVZ achieves superior visual gain and delivers promising quantitative metrics. In addition, UVZ is shown to generalize well to some visual tasks, especially under unusual lighting conditions. The code, models and results are available at: https://github.com/WindySprint/UVZ.

Submitted 10 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.15557 [pdf, other]

Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction

Authors: Shili Sheng, Pian Yu, David Parker, Marta Kwiatkowska, Lu Feng

Abstract: Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that offers probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) for assessing the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
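As background, the standard adaptive conformal update (Gibbs and Candès, 2021) adjusts a working miscoverage level online, $\alpha_{t+1} = \alpha_t + \gamma(\alpha - \mathrm{err}_t)$, so the long-run miscoverage tracks the target $\alpha$ even when the data shift over time. The sketch below applies that rule to trajectory prediction. It assumes the nonconformity score is the Euclidean distance between predicted and observed agent positions; the class name and parameters are hypothetical, and the paper's exact score and shield construction are not shown.

```python
import math


class AdaptiveConformal:
    """Online ACP for a prediction-error radius (hypothetical sketch)."""

    def __init__(self, target_alpha=0.1, gamma=0.05):
        self.target_alpha = target_alpha  # desired long-run miscoverage
        self.gamma = gamma                # step size of the alpha update
        self.alpha_t = target_alpha       # current working miscoverage level
        self.scores = []                  # past nonconformity scores

    def radius(self):
        """Current safety radius around a predicted position."""
        if self.alpha_t >= 1:
            return -math.inf              # empty prediction set
        if not self.scores or self.alpha_t <= 0:
            return math.inf               # trivial set: cover everything
        ordered = sorted(self.scores)
        # Finite-sample (1 - alpha_t) empirical quantile.
        k = math.ceil((len(ordered) + 1) * (1 - self.alpha_t))
        return math.inf if k > len(ordered) else ordered[k - 1]

    def update(self, predicted, actual):
        """Check coverage for one step, adapt alpha_t, store the new score."""
        r = self.radius()
        score = math.dist(predicted, actual)
        err = 1.0 if score > r else 0.0
        self.alpha_t += self.gamma * (self.target_alpha - err)
        self.scores.append(score)
        return err
```

A planner could then treat every state within `radius()` of an agent's predicted position as unsafe. Because $\alpha_t$ stays in $[-\gamma, 1 + \gamma]$, the average miscoverage over $T$ steps deviates from the target by at most $(\max(\alpha_1, 1 - \alpha_1) + \gamma)/(\gamma T)$, whatever the error sequence.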

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.12674 [pdf, other]

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Authors: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens

Abstract: Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance in input data distribution, and the use of different communication devices and topologies (e.g., NVLink, PCIe, network cards) that connect multiple compute devices, coupled with the desire for flexible training configurations. Built on top of our prior work for single-GPU platforms, we address these challenges and enable multi-GPU performance modeling by incorporating (1) data-distribution-aware performance models for embedding table lookup, and (2) data movement prediction of communication collectives, into our upgraded performance modeling pipeline equipped with inter- and intra-rank synchronization for ML workloads trained on multi-GPU platforms. Beyond accurately predicting the per-iteration training time of DLRM models with random configurations with a geomean error of 5.21% on two multi-GPU platforms, our prediction pipeline generalizes well to other types of ML workloads, such as Transformer-based NLP models with a geomean error of 3.00%.
Moreover, even without actually running ML workloads like DLRMs on the hardware, it is capable of generating insights such as quickly selecting the fastest embedding table sharding configuration (with a success rate of 85%).
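The accuracy figures above are geometric-mean ("geomean") errors over many configurations. A minimal sketch of how such a summary and a prediction-only configuration choice could be computed, assuming the per-configuration error is the absolute relative error of predicted versus measured per-iteration time (the abstract does not give the exact definition). The function names and the sharding labels in the usage note are hypothetical.

```python
from statistics import geometric_mean


def relative_errors(predicted, measured):
    """Absolute relative error of each predicted time against its measurement."""
    return [abs(p - m) / m for p, m in zip(predicted, measured)]


def geomean_error(predicted, measured):
    """Geometric mean of the per-configuration relative errors.

    The geometric mean is only meaningful for positive errors, so an exactly
    correct prediction (zero error) would need separate handling.
    """
    return geometric_mean(relative_errors(predicted, measured))


def fastest_config(predicted_times):
    """Pick the configuration with the smallest predicted per-iteration time,
    without running any of them on hardware."""
    return min(predicted_times, key=predicted_times.get)
```

For example, `fastest_config({"row_wise": 12.5, "table_wise": 9.8, "column_wise": 11.0})` returns `"table_wise"`. Compared with an arithmetic mean, the geometric mean damps the influence of a few configurations with unusually large relative errors.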

Submitted 27 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 12 pages, 11 figures, 4 tables

arXiv:2404.06349 [pdf, other]

CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models

Authors: Yu Zhou, Xingyu Wu, Beicheng Huang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Causality reveals fundamental principles behind data distributions in real-world scenarios, and the capability of large language models (LLMs) to understand causality directly impacts their efficacy in explaining outputs, adapting to new evidence, and generating counterfactuals. With the proliferation of LLMs, the evaluation of this capacity is increasingly garnering attention. However, the absence of a comprehensive benchmark has left existing evaluation studies simplistic, undiversified, and homogeneous. To address these challenges, this paper proposes a comprehensive benchmark, namely CausalBench, to evaluate the causality understanding capabilities of LLMs. Originating from the causal research community, CausalBench encompasses three causal learning-related tasks, which facilitate a convenient comparison of LLMs' performance with classic causal learning algorithms. Meanwhile, causal networks of varying scales and densities are integrated in CausalBench, to explore the upper limits of LLMs' capabilities across task scenarios of varying difficulty. Notably, background knowledge and structured data are also incorporated into CausalBench to thoroughly unlock the underlying potential of LLMs for long-text comprehension and prior information utilization. Based on CausalBench, this paper evaluates nineteen leading LLMs and unveils insightful conclusions in diverse aspects. First, we present the strengths and weaknesses of LLMs and quantitatively explore the upper limits of their capabilities across various scenarios.
Meanwhile, we further discern the adaptability and abilities of LLMs to specific structural networks and complex chain of thought structures. Moreover, this paper quantitatively presents the differences across diverse information sources and uncovers the gap between LLMs' capabilities in causal understanding within textual contexts and numerical domains.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06290 [pdf, other]

Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models

Authors: Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan

Abstract: Large language models (LLMs) have gained widespread popularity and demonstrated exceptional performance not only in natural language processing (NLP) tasks but also in non-linguistic domains. Their potential as artificial general intelligence extends beyond NLP, showcasing promising capabilities in diverse optimization scenarios. Despite this rising trend, whether the integration of LLMs into these black-box optimization problems is genuinely beneficial remains unexplored. This paper endeavors to tackle this issue by offering deeper insights into the potential of LLMs in optimization tasks through a comprehensive investigation. Our evaluation covers both discrete and continuous optimization problems, aiming to assess the efficacy and distinctive characteristics that LLMs bring to the realm of optimization. Our findings reveal both the limitations and advantages of LLMs in optimization. On one hand, despite consuming the significant power required to run the model, LLMs exhibit subpar performance and lack desirable properties in pure numerical tasks, primarily due to a mismatch between the problem domain and their processing capabilities. On the other hand, although LLMs may not be ideal for traditional numerical optimization, their potential in broader optimization contexts remains promising. LLMs exhibit the ability to solve problems in non-numerical domains and can leverage heuristics from the prompt to enhance their performance.
To the best of our knowledge, this work presents the first systematic evaluation of LLMs for numerical optimization, offering a progressive, wide-coverage, and behavioral analysis. Our findings pave the way for a deeper understanding of LLMs' role in optimization and guide future applications of LLMs in diverse scenarios.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.20213 [pdf, other]

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

Authors: Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

Abstract: Generic large Vision-Language Models (VLMs) are developing rapidly but still perform poorly in the Remote Sensing (RS) domain, owing to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs. Existing Remote Sensing specific Vision Language Models (RSVLMs) still have considerable room for improvement, primarily owing to the lack of large-scale, high-quality RS vision-language datasets. We constructed HqDC-1.4M, a large-scale dataset of High-quality and Detailed Captions for RS images containing 1.4 million image-caption pairs, which not only enhances the RSVLM's understanding of RS images but also significantly improves the model's spatial perception abilities, such as localization and counting, thereby increasing the helpfulness of the RSVLM. Moreover, to address the inevitable "hallucination" problem in RSVLMs, we developed RSSA, the first dataset aimed at enhancing the Self-Awareness capability of RSVLMs. By incorporating a variety of unanswerable questions into typical RS visual question-answering tasks, RSSA effectively improves the truthfulness and reduces the hallucinations of the model's outputs, thereby enhancing the honesty of the RSVLM. Based on these datasets, we propose H2RSVLM, the Helpful and Honest Remote Sensing Vision Language Model. H2RSVLM achieves outstanding performance on multiple RS public datasets and is capable of recognizing and refusing to answer unanswerable questions, effectively mitigating incorrect generations.
We will release the code, data and model weights at https://github.com/opendatalab/H2RSVLM .

Submitted 29 March, 2024; originally announced March 2024.

Comments: Equal contribution: Chao Pang, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He

arXiv:2403.18379 [pdf, other]

IIP-Mixer:Intra-Inter Patch Mixing Architecture for Battery Remaining Useful Life Prediction

Authors: Guangzai Ye, Li Feng, Jianlan Guo, Yuqiang Chen

Abstract: Accurately estimating the Remaining Useful Life (RUL) of lithium-ion batteries is crucial for maintaining the safe and stable operation of rechargeable battery management systems. However, this task is often challenging due to the complex temporal dynamics involved. Recently, attention-based networks, such as Transformers and Informer, have become popular architectures in time series forecasting. Despite their effectiveness, these models with abundant parameters require substantial training time to unravel temporal patterns. To tackle these challenges, we propose a simple MLP-Mixer-based architecture named 'Intra-Inter Patch Mixer' (IIP-Mixer), which is based exclusively on multi-layer perceptrons (MLPs) and extracts information by mixing operations along both the intra-patch and inter-patch dimensions for battery RUL prediction. The proposed IIP-Mixer comprises parallel dual-head mixer layers: the intra-patch mixing MLP, which captures local temporal patterns in the short term, and the inter-patch mixing MLP, which captures global temporal patterns in the long term. Notably, to address the varying importance of features in RUL prediction, we introduce a weighted loss function in the MLP-Mixer-based architecture, the first time such an approach has been employed. Our experiments demonstrate that IIP-Mixer achieves competitive performance in battery RUL prediction, outperforming other popular time-series frameworks.
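
A dependency-free Python sketch of the dual-head mixing idea follows. It assumes a univariate series cut into non-overlapping patches, two-layer MLPs whose output width equals their input width, and that the two heads are summed; the abstract does not specify these details, and the identity weights in the demo stand in for learned parameters. `weighted_mse` shows one hypothetical form of a per-feature weighted loss.

```python
def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches (one row per patch)."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def linear(vec, weights, bias):
    """Compute W @ x + b for one vector; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def mlp(vec, w1, b1, w2, b2):
    """Two linear layers with a ReLU in between."""
    hidden = [max(0.0, h) for h in linear(vec, w1, b1)]
    return linear(hidden, w2, b2)


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def iip_mix(patches, intra_params, inter_params):
    """Parallel intra-patch and inter-patch mixing heads, summed element-wise."""
    # Intra-patch head: one shared MLP mixes the time steps inside each patch.
    intra = [mlp(p, *intra_params) for p in patches]
    # Inter-patch head: one shared MLP mixes across patches at each position.
    inter = transpose([mlp(col, *inter_params) for col in transpose(patches)])
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(intra, inter)]


def weighted_mse(pred, target, weights):
    """Mean squared error with one (hypothetical) importance weight per feature."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, weights))
    return total / sum(weights)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Demo with identity weights and positive inputs (so the ReLU is a no-op):
# each head returns its input unchanged, so the summed output doubles it.
patches = patchify([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], patch_len=3)  # 2 patches x 3 steps
intra_params = (eye(3), [0.0] * 3, eye(3), [0.0] * 3)  # maps 3 steps -> 3 steps
inter_params = (eye(2), [0.0] * 2, eye(2), [0.0] * 2)  # maps 2 patches -> 2 patches
mixed = iip_mix(patches, intra_params, inter_params)
print(mixed)  # [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]
```

The intra head's MLP width is tied to the patch length and the inter head's to the number of patches, which is why the two heads see local and global structure respectively.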

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.15098 [pdf, other]

UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction

Authors: Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, Alexandre Alahi

Abstract: Vehicle trajectory prediction has increasingly relied on data-driven solutions, but their ability to scale to different data domains and the impact of larger dataset sizes on their generalization remain under-explored. Although these questions can be studied by employing multiple datasets, doing so is challenging because of several discrepancies, e.g., in data formats, map resolution, and semantic annotation types. To address these challenges, we introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria, presenting new opportunities for the vehicle trajectory prediction field. In particular, using UniTraj, we conduct extensive experiments and find that model performance drops significantly when models are transferred to other datasets. However, enlarging data size and diversity can substantially improve performance, leading to a new state-of-the-art result on the nuScenes dataset. We provide insights into dataset characteristics to explain these findings. The code is available at https://github.com/vita-epfl/UniTraj

Submitted 27 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.14983 [pdf, other]

Reconstructing the evolution history of networked complex systems

Authors: Junya Wang, Yi-Jiao Zhang, Cong Xu, Jiaze Li, Jiachen Sun, Jiarong Xie, Ling Feng, Tianshou Zhou, Yanqing Hu

Abstract: The evolution processes of complex systems carry key information about the systems' functional properties. Applying machine learning algorithms, we demonstrate that the historical formation process of various networked complex systems can be extracted, including protein-protein interaction, ecology, and social network systems. The recovered evolution process is of great scientific value, for example in interpreting the evolution of protein-protein interaction networks, facilitating structure prediction, and, in particular, revealing key co-evolution features of network structures, such as preferential attachment, community structure, local clustering, and degree-degree correlation, that previous theories could not explain collectively. Intriguingly, we discover that for large networks, a machine learning model that performs only slightly better than random guessing on the pairwise order of links suffices for reliable restoration of the overall network formation process. This suggests that evolution history restoration is generally highly feasible on empirical networks.
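
Why weak pairwise accuracy can suffice on large networks can be seen in a toy experiment. The sketch below is not the authors' model: it assumes a hypothetical comparator that orders each pair of links correctly with a fixed probability, aggregates the pairwise judgments with a Borda-style count (our choice), and measures how well the aggregate score tracks the true order.

```python
import random


def newer_counts(n, accuracy, seed=0):
    """Borda-style scores from a noisy pairwise comparator.

    Links are indexed 0..n-1 in their true order of appearance. For every pair
    (i, j) with i < j, a hypothetical comparator correctly says "j is newer"
    with probability `accuracy` (0.5 means random guessing). A link's score is
    the number of comparisons in which it was judged the newer one.
    """
    rng = random.Random(seed)
    scores = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < accuracy:
                scores[j] += 1
            else:
                scores[i] += 1
    return scores


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def order_recovery(n, accuracy, seed=0):
    """Correlation between the true link order and the aggregated scores."""
    return pearson(list(range(n)), newer_counts(n, accuracy, seed))


for n in (20, 300):
    print(n, round(order_recovery(n, accuracy=0.6), 3))
```

With 60% pairwise accuracy, a link's expected score grows by 0.2 per position in the true order, so the signal spread grows linearly in n while the noise grows only like the square root of n; the correlation therefore tends to rise with network size. This is a property of the toy, not a reproduction of the paper's experiments.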

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.05785 [pdf, other]

doi 10.1145/3555758

Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring

Authors: Can Liu, Siying Hu, Li Feng, Mingming Fan

Abstract: Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience is disrupted when users must go back to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including composition, reviewing and editing. We conducted an experiment in which ten pairs of participants took on the roles of authors and typists to work on a text authoring task. By analysing the natural language patterns of both authors and typists, we identified new challenges and opportunities for the design of future dictation interfaces, including the ambiguity of human dictation, the differences between audio-only and screen-based interaction, and various forms of passive and active assistance that future systems could provide.

Submitted 8 March, 2024; originally announced March 2024.

Journal ref: Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 338 (November 2022), 33 pages

arXiv:2403.05102 [pdf, other]

Enhancing Texture Generation with High-Fidelity Using Advanced Texture Priors

Authors: Kuo Xu, Maoyu Wang, Muyu Wang, Lincong Feng, Tianhui Zhang, Xiaoli Liu

Abstract: Recent advancements in 2D generation technology have sparked widespread discussion on using 2D priors for 3D shape and texture content generation. However, these methods often overlook the effects of subsequent user operations, such as the texture aliasing and blurring that occur when the user acquires the 3D model and simplifies its structure. Traditional graphics methods partially alleviate this issue, but recent texture synthesis technologies fail to ensure consistency with the original model's appearance and cannot achieve high-fidelity restoration. Moreover, background noise frequently arises in high-resolution texture synthesis, limiting the practical application of these generation technologies. In this work, we propose a high-resolution and high-fidelity texture restoration technique that uses the rough texture as the initial input to enhance the consistency between the synthetic texture and the initial texture, thereby overcoming the aliasing and blurring caused by the user's structure-simplification operations. Additionally, we introduce a background noise smoothing technique based on a self-supervised scheme to address the noise problem in current high-resolution texture synthesis schemes. Our approach enables high-resolution texture synthesis, paving the way for high-definition and high-detail texture synthesis technology. Experiments demonstrate that our scheme outperforms currently known schemes in high-fidelity texture recovery under high-resolution conditions.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.02871 [pdf, other]

Quantum Mixed-State Self-Attention Network

Authors: Fu Chen, Qinglin Zhao, Li Feng, Chuangtao Chen, Yangbin Lin, Jianhong Lin

Abstract: The rapid advancement of quantum computing has increasingly highlighted its potential in machine learning, particularly for natural language processing (NLP) tasks. Quantum machine learning (QML) leverages the unique capabilities of quantum computing to offer novel perspectives and methodologies for complex data processing and pattern recognition challenges. This paper introduces a novel Quantum Mixed-State Attention Network (QMSAN), which integrates the principles of quantum computing with classical machine learning algorithms, especially self-attention networks, to improve the efficiency and effectiveness of handling NLP tasks. The QMSAN model employs a quantum attention mechanism based on mixed states, enabling efficient direct estimation of the similarity between queries and keys within the quantum domain and thus more effective acquisition of attention weights. Additionally, we propose an innovative quantum positional encoding scheme, implemented through fixed quantum gates within the quantum circuit, to enhance the model's accuracy. Experimental validation on various datasets demonstrates that the QMSAN model outperforms existing quantum and classical models in text classification, achieving significant performance improvements. The QMSAN model not only significantly reduces the number of parameters but also exceeds classical self-attention networks in performance, showcasing its strong capability in data representation and information extraction.
Furthermore, our study investigates the model's robustness in different quantum noise environments, showing that QMSAN is robust to low levels of noise.
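
For intuition, a mixed-state similarity can be simulated classically in tiny dimensions. The sketch below assumes the query-key similarity is the Hilbert-Schmidt overlap Tr(ρ_q ρ_k) of two density matrices, one natural reading of "similarity between queries and keys" for mixed states; QMSAN estimates its similarity on a quantum circuit, and its exact construction may differ. Only real amplitudes are handled.

```python
import math


def density_matrix(states, probs):
    """rho = sum_i p_i |psi_i><psi_i| for real state vectors (normalised here)."""
    dim = len(states[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for psi, p in zip(states, probs):
        norm = math.sqrt(sum(a * a for a in psi))
        v = [a / norm for a in psi]
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * v[i] * v[j]
    return rho


def overlap(rho, sigma):
    """Hilbert-Schmidt overlap Tr(rho @ sigma), a similarity in [0, 1]."""
    dim = len(rho)
    return sum(rho[i][k] * sigma[k][i] for i in range(dim) for k in range(dim))


def attention_weights(query, keys):
    """Softmax over the overlaps between one query state and several key states."""
    scores = [overlap(query, key) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


ket0 = density_matrix([[1.0, 0.0]], [1.0])  # pure |0>
ket1 = density_matrix([[0.0, 1.0]], [1.0])  # pure |1>
mixed = density_matrix([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])  # maximally mixed qubit
print(overlap(ket0, ket0), overlap(ket0, ket1), overlap(mixed, mixed))  # 1.0 0.0 0.5
print(attention_weights(ket0, [ket0, ket1, mixed]))
```

Identical pure states score 1, orthogonal ones score 0, and a mixed state scores in between, so the query attends most to the key that matches it.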

Submitted 8 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01757 [pdf, other]

How Multimodal Integration Boost the Performance of LLM for Optimization: Case Study on Capacitated Vehicle Routing Problems

Authors: Yuxiao Huang, Wenjie Zhang, Liang Feng, Xingyu Wu, Kay Chen Tan

Abstract: Recently, large language models (LLMs) have emerged as capable tools for addressing complex optimization challenges. Despite this recognition, a predominant limitation of existing LLM-based optimization methods is their difficulty in capturing the relationships among decision variables when relying exclusively on numerical text prompts, especially in high-dimensional problems. Keeping this in mind, we first propose to enhance optimization performance using a multimodal LLM capable of processing both textual and visual prompts, for deeper insight into the optimization problem being processed. This integration allows for a more comprehensive understanding of optimization problems, akin to human cognitive processes. We have developed a multimodal LLM-based optimization framework that simulates human problem-solving workflows, thereby offering a more nuanced and effective analysis. The efficacy of this method is evaluated through extensive empirical studies on a well-known combinatorial optimization problem, the capacitated vehicle routing problem. The results are compared against those of LLM-based optimization algorithms that rely solely on textual prompts, demonstrating the significant advantages of our multimodal approach.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures, 2 tables

arXiv:2402.05453 [pdf, other]

Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss

Authors: Zhenlong Liu, Lei Feng, Huiping Zhuang, Xiaofeng Cao, Hongxin Wei

Abstract: Machine learning models are susceptible to membership inference attacks (MIAs), which aim to infer whether a sample is in the training set. Existing work uses gradient ascent to enlarge the loss variance of training data, alleviating the privacy risk. However, optimizing in the reverse direction may cause the model parameters to oscillate near local minima, leading to instability and suboptimal performance. In this work, we propose a novel method, Convex-Concave Loss (CCL), which enables a high variance of the training loss distribution through gradient descent. Our method is motivated by the theoretical analysis that convex losses tend to decrease the loss variance during training. Thus, the key idea behind CCL is to reduce the convexity of loss functions with a concave term. Trained with CCL, neural networks produce losses with high variance for training data, reinforcing the defense against MIAs. Extensive experiments demonstrate the superiority of CCL, which achieves a state-of-the-art balance in the privacy-utility trade-off.
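
The following sketch only illustrates what "reducing convexity with a concave term" can mean for a per-sample loss written as a function of the true-class probability p. The concave term -alpha * (1 - p)**2 and alpha = 1 are hypothetical choices, not the paper's definition; the code checks numerically that the second derivative, positive for cross-entropy, turns negative near p = 1.

```python
import math


def cross_entropy(p):
    """Per-sample cross-entropy as a function of the true-class probability p."""
    return -math.log(p)


def convex_concave(p, alpha=1.0):
    """Cross-entropy plus a hypothetical concave term -alpha * (1 - p) ** 2."""
    return -math.log(p) - alpha * (1.0 - p) ** 2


def second_derivative(f, p, h=1e-4):
    """Central finite-difference estimate of f''(p)."""
    return (f(p + h) - 2.0 * f(p) + f(p - h)) / (h * h)


for p in (0.5, 0.9):
    # Analytically: CE'' = 1/p**2 > 0, while the sketch has 1/p**2 - 2*alpha.
    print(p, round(second_derivative(cross_entropy, p), 3),
          round(second_derivative(convex_concave, p), 3))
```

With alpha = 1 the derivative -1/p + 2(1 - p) stays negative on (0, 1) (its maximum, at p = 1/sqrt(2), is 2 - 2*sqrt(2) ≈ -0.83), so gradient descent still pushes the true-class probability up even though the loss is no longer convex everywhere.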

Submitted 18 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted by ICML 2024

arXiv:2402.04344 [pdf, other]

Does Confidence Calibration Help Conformal Prediction?

Authors: Huajun Xi, Jianguo Huang, Lei Feng, Hongxin Wei

Abstract: Conformal prediction, an emerging uncertainty quantification technique, constructs prediction sets that are guaranteed to contain the true label with high probability. Previous works usually employ temperature scaling to calibrate the classifier, assuming that confidence calibration benefits conformal prediction. In this work, we first show that post-hoc calibration methods surprisingly lead to larger prediction sets even as calibration improves, while over-confidence with small temperatures benefits conformal prediction performance instead. Theoretically, we prove that high confidence reduces the probability of appending a new class to the prediction set. Inspired by this analysis, we propose a novel method, $\textbf{Conformal Temperature Scaling}$ (ConfTS), which rectifies the objective through the gap between the threshold and the non-conformity score of the ground-truth label. In this way, the new objective of ConfTS optimizes the temperature value toward an optimal set that satisfies $\textit{marginal coverage}$. Experiments demonstrate that our method can effectively improve widely-used conformal prediction methods.
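
For reference, a minimal split conformal prediction pipeline with temperature scaling is sketched below, using the common non-conformity score 1 - p_y (an assumption; the abstract does not name the score function). `confts_gap` is only a hypothetical rendering of "the gap between the threshold and the non-conformity score of the ground-truth label", not the paper's exact loss. The calibration scores are made-up numbers, and the demo compares prediction sets at one fixed threshold; a full pipeline would recompute the threshold on temperature-scaled calibration scores.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; smaller temperatures give sharper outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Labels whose non-conformity score 1 - p_y does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= qhat]


def confts_gap(qhat, true_label_scores):
    """Hypothetical ConfTS-style objective: mean squared gap between the
    threshold and each example's ground-truth non-conformity score."""
    return sum((qhat - s) ** 2 for s in true_label_scores) / len(true_label_scores)


# Calibration scores 1 - p_true (made-up numbers for illustration).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
qhat = conformal_threshold(cal_scores, alpha=0.3)
logits = [2.0, 1.0, 0.0]
print(qhat, prediction_set(softmax(logits), qhat),
      prediction_set(softmax(logits, temperature=0.5), qhat))  # 0.8 [0, 1] [0]
```

At a fixed threshold, the sharper (lower-temperature) softmax pushes the runner-up label's score above the threshold, which is the mechanism behind smaller sets under over-confidence.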

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.01922 [pdf, other]

A General Framework for Learning from Weak Supervision

Authors: Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj

Abstract: Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computation using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, reducing the time complexity from the quadratic or factorial cost often required by existing solutions to linear. The problem of learning from arbitrary weak supervision is therefore converted into modeling that supervision as an NFA. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.
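
To make the automaton-plus-forward-backward idea concrete, here is a minimal sketch for one kind of weak supervision: a multiple-instance bag label stating that at least one instance is positive (our choice of example; the abstract does not specify the automata it uses). Given per-instance model probabilities, the forward pass yields the likelihood of the bag label and the backward pass yields per-instance posteriors, which an EM-style E-step could use as soft targets. Both passes are linear in the number of instances.

```python
def forward_backward_mil(probs):
    """Posterior P(y_i = 1 | bag has at least one positive) via forward-backward.

    The automaton has two states: 0 = "no positive seen yet", 1 = "at least one
    positive seen". Reading instance i with label y moves state s to s | y, and
    state 1 is accepting.
    """
    n = len(probs)
    # alpha[i][s]: probability of being in state s after the first i instances.
    alpha = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(n)]
    for i, p in enumerate(probs):
        for s in (0, 1):
            alpha[i + 1][s | 0] += alpha[i][s] * (1.0 - p)
            alpha[i + 1][s | 1] += alpha[i][s] * p
    # beta[i][s]: probability of ending in the accepting state from s at step i.
    beta = [[0.0, 0.0] for _ in range(n)] + [[0.0, 1.0]]
    for i in range(n - 1, -1, -1):
        p = probs[i]
        for s in (0, 1):
            beta[i][s] = (1.0 - p) * beta[i + 1][s | 0] + p * beta[i + 1][s | 1]
    z = alpha[n][1]  # likelihood of the bag-level label
    posteriors = [
        sum(alpha[i][s] * p * beta[i + 1][s | 1] for s in (0, 1)) / z
        for i, p in enumerate(probs)
    ]
    return z, posteriors


z, post = forward_backward_mil([0.2, 0.5, 0.1])
print(z, post)
```

For this particular constraint the result matches the closed form: the likelihood is 1 - (0.8)(0.5)(0.9) = 0.64 and each posterior is p_i / 0.64, which makes the sketch easy to check.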

Submitted 5 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 24 pages, 20 tables, 9 figures

arXiv:2402.01619 [pdf, other]

KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases

Authors: Jiajie Zhang, Shulin Cao, Linmei Hu, Ling Feng, Lei Hou, Juanzi Li

Abstract: Program induction (PI) has become a promising paradigm for using knowledge bases (KBs) to help large language models (LLMs) answer complex knowledge-intensive questions. Nonetheless, PI typically relies on a large number of parallel question-program pairs to make the LLM aware of the schema of the given KB, and is thus challenging for many low-resourced KBs that lack annotated data. To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. First, KB-Plugin adopts self-supervised learning to encode the detailed schema information of a given KB into a pluggable module, namely the schema plugin. Second, KB-Plugin uses abundant annotated data from a rich-resourced KB to train another pluggable module, namely the PI plugin, which helps the LLM extract question-relevant schema information from the schema plugin of any KB and use this information to induce programs over that KB. Experiments on five heterogeneous KBQA datasets show that KB-Plugin achieves better or comparable performance with a 25$\times$ smaller backbone LLM compared to SoTA PI methods for low-resourced KBs, and even approaches the performance of supervised methods. Our code and data are available at https://github.com/THU-KEG/KB-Plugin.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.13360 [pdf, other]

Debiased Sample Selection for Combating Noisy Labels

Authors: Qi Wei, Lei Feng, Haobo Wang, Bo An

Abstract: Learning with noisy labels aims to ensure model generalization given a label-corrupted training set. The sample selection strategy achieves promising performance by selecting a label-reliable subset for model training. In this paper, we empirically reveal that existing sample selection methods suffer from both data bias and training bias, which manifest in practice as imbalanced selected sets and accumulated errors, respectively. However, previous studies handled only the training bias. To address this limitation, we propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection. Specifically, to mitigate the training bias, we design a robust network architecture that integrates multiple experts. Compared with the prevailing double-branch network, our network achieves better selection and prediction performance by ensembling these experts, while training with fewer parameters. Meanwhile, to mitigate the data bias, we propose a mixed sampling strategy based on two weight-based data samplers. By training on the mixture of two class-discriminative mini-batches, the model mitigates the effect of the imbalanced training set while avoiding the sparse representations that sampling strategies easily cause. Extensive experiments and analyses demonstrate the effectiveness of ITEM. Our code is available at https://github.com/1998v7/ITEM.
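
The abstract does not specify the weights used by ITEM's two samplers. As an illustrative assumption only, the sketch below pairs a uniform sampler with an inverse-class-frequency sampler, a standard way to counter an imbalanced selected set; how ITEM actually weights and mixes its mini-batches is not reproduced.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Per-sample weights 1 / n_c, so every class carries the same total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def two_sampler_batches(labels, batch_size, seed=0):
    """Draw one uniform and one class-balanced batch of sample indices.

    Hypothetical stand-in for two weight-based samplers; a training loop could
    then train on a mixture (for example, mixup) of the two batches.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    uniform = rng.choices(indices, k=batch_size)
    balanced = rng.choices(indices, weights=class_balanced_weights(labels), k=batch_size)
    return uniform, balanced


labels = [0] * 90 + [1] * 10  # a 9:1 imbalanced selected set
uniform, balanced = two_sampler_batches(labels, batch_size=1000)
# Fraction of minority-class samples in each batch: about 0.1 vs. about 0.5.
print(sum(labels[i] for i in uniform) / 1000, sum(labels[i] for i in balanced) / 1000)
```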

Submitted 24 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.10034 [pdf, other]

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Authors: Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability to complex problems. On the one hand, EAs can provide an optimization framework for further enhancing LLMs under black-box settings, empowering LLMs with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EAs to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs.
The identified challenges and future directions offer guidance for researchers and practitioners to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence. We have created a GitHub repository to index the relevant papers: https://github.com/wuxingyu-ai/LLM4EC.

Submitted 29 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: evolutionary algorithm (EA), large language model (LLM), optimization problem, prompt engineering, algorithm generation, neural architecture search

arXiv:2401.08013 [pdf, other]

A Day-to-Day Dynamical Approach to the Most Likely User Equilibrium Problem

Authors: Jiayang Li, Qianni Wang, Liyang Feng, Jun Xie, Yu Marco Nie

Abstract: The lack of a unique user equilibrium (UE) route flow in traffic assignment has posed a significant challenge to many transportation applications. The maximum-entropy principle, which advocates for the consistent selection of the most likely solution as a representative, is often used to address the challenge. Built on a recently proposed day-to-day (DTD) discrete-time dynamical model called cumulative logit (CULO), this study provides a new behavioral underpinning for the maximum-entropy UE (MEUE) route flow. It has been proven that CULO can reach a UE state without presuming travelers are perfectly rational. Here, we further establish that CULO always converges to the MEUE route flow if (i) travelers have zero prior information about routes and thus are forced to give all routes an equal choice probability, or (ii) all travelers gather information from the same source such that the so-called general proportionality condition is satisfied. Thus, CULO may be used as a practical solution algorithm for the MEUE problem. To put this idea into practice, we propose to eliminate the route enumeration requirement of the original CULO model through an iterative route discovery scheme. We also examine the discrete-time versions of four popular continuous-time dynamical models and compare them to CULO. The analysis shows that the replicator dynamic is the only one that has the potential to reach the MEUE solution with some regularity. The analytical results are confirmed through numerical experiments.
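
The day-to-day mechanism can be illustrated on a two-route network. The update rule below is our reading of a cumulative-logit dynamic (logit choice over cumulative experienced route costs, with a fixed dispersion parameter theta); the paper's exact formulation, step sizes and route-discovery scheme may differ. With route costs 1 + f and 2 + f and a demand of 3, the unique UE splits the flow as (2, 1).

```python
import math


def cumulative_logit(demand, cost_fns, theta, days):
    """Day-to-day cumulative-logit route choice on parallel routes (a sketch).

    Each day the demand splits over routes with logit probabilities computed
    from the cumulative costs experienced so far; each route's cumulative cost
    then grows by that day's cost. Zero initial cumulative costs mean every
    route starts with the same choice probability (no prior information).
    """
    cumulative = [0.0] * len(cost_fns)
    flows = []
    for _ in range(days):
        m = min(cumulative)  # shift for numerical stability; logit is shift-invariant
        weights = [math.exp(-theta * (c - m)) for c in cumulative]
        total = sum(weights)
        flows = [demand * w / total for w in weights]
        costs = [fn(f) for fn, f in zip(cost_fns, flows)]
        cumulative = [c + k for c, k in zip(cumulative, costs)]
    return flows


# Two parallel routes with costs 1 + f and 2 + f and demand 3: the UE is (2, 1).
flows = cumulative_logit(3.0, [lambda f: 1.0 + f, lambda f: 2.0 + f], theta=0.5, days=100)
print([round(f, 6) for f in flows])  # [2.0, 1.0]
```

Linearizing this toy around the equilibrium gives an error-contraction factor of 1 - 4*theta/3, so it converges for 0 < theta < 1.5 and may oscillate for larger theta. Because the UE here is unique, the example shows convergence only, not the selection of the maximum-entropy solution.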

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.03426 [pdf, other]

On Leveraging Large Language Models for Enhancing Entity Resolution

Authors: Huahang Li, Longyu Feng, Shuangyin Li, Fei Hao, Chen Jason Zhang, Yuanfeng Song, Lei Chen

Abstract: Entity resolution, the task of identifying and consolidating records that pertain to the same real-world entity, plays a pivotal role in various sectors such as e-commerce, healthcare, and law enforcement. The emergence of Large Language Models (LLMs) like GPT-4 has introduced a new dimension to this task, leveraging their advanced linguistic capabilities. This paper explores the potential of LLMs in the entity resolution process, shedding light on both their advantages and the computational complexities associated with large-scale matching. We introduce strategies for the efficient utilization of LLMs, including the selection of an optimal set of matching questions, namely MQsSP, which is proven to be NP-hard. Our approach optimally chooses the most effective matching questions while keeping consumption within a given budget. Additionally, we propose a method to adjust the distribution of possible partitions after receiving responses from LLMs, with the goal of reducing the uncertainty of entity resolution. We evaluate the effectiveness of our approach using entropy as a metric, and our experimental results demonstrate the efficiency and effectiveness of our proposed methods, offering promising prospects for real-world applications.
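
The entropy metric and the idea of adjusting the partition distribution after an LLM answer can be sketched as follows. The sketch assumes a distribution over the five possible partitions of three records, an LLM that answers each pairwise "same entity?" question correctly with a fixed probability, and a plain Bayesian update; the paper's own adjustment method and its MQsSP question-selection step are not reproduced.

```python
import math


def entropy(dist):
    """Shannon entropy (bits) of a {partition: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def same_cluster(partition, a, b):
    return any(a in block and b in block for block in partition)


def update(dist, a, b, answer_match, accuracy):
    """Bayesian update after an LLM answers "do records a and b match?".

    `accuracy` is an assumed probability that the LLM's answer is correct.
    """
    posterior = {}
    for partition, p in dist.items():
        consistent = same_cluster(partition, a, b) == answer_match
        posterior[partition] = p * (accuracy if consistent else 1.0 - accuracy)
    z = sum(posterior.values())
    return {partition: p / z for partition, p in posterior.items()}


F = frozenset
PARTITIONS = [  # all five partitions of three records
    (F({"r1"}), F({"r2"}), F({"r3"})),
    (F({"r1", "r2"}), F({"r3"})),
    (F({"r1", "r3"}), F({"r2"})),
    (F({"r2", "r3"}), F({"r1"})),
    (F({"r1", "r2", "r3"}),),
]
prior = {part: 1 / len(PARTITIONS) for part in PARTITIONS}
post = update(prior, "r1", "r2", answer_match=True, accuracy=0.9)
print(round(entropy(prior), 3), round(entropy(post), 3))
```

A "match" answer concentrates probability on the two partitions that put r1 and r2 together, so the entropy drops below the uniform prior's log2(5) bits.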

Submitted 7 January, 2024; originally announced January 2024.

Comments: 12 pages, 6 figures, ICDE 2024

arXiv:2401.01563 [pdf, other]

Towards Multi-Objective High-Dimensional Feature Selection via Evolutionary Multitasking

Authors: Yinglan Feng, Liang Feng, Songbai Liu, Sam Kwong, Kay Chen Tan

Abstract: The Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has recently been applied successfully to high-dimensional feature selection (FS) problems. However, existing EMT-based FS methods suffer from several limitations: a single mode of multitask generation, the same generic evolutionary search for all tasks, reliance on implicit transfer mechanisms through solution encodings alone, and a single-objective transformation. These limitations result in inadequate knowledge acquisition, exploitation, and transfer. To address them, this paper develops a novel EMT framework for multiobjective high-dimensional feature selection problems, named MO-FSEMT. In particular, multiple auxiliary tasks are constructed by distinct formulation methods to provide diverse search spaces and information representations, and are then solved simultaneously with the original task through a multi-solver-based multitask optimization scheme. Each task has an independent population with a task-specific representation and is solved by a separate evolutionary solver with its own biases and search preferences. A task-specific knowledge transfer mechanism is designed to exploit the advantageous information of each task, enabling the discovery and effective transmission of high-quality solutions during the search. Comprehensive experimental results demonstrate that the MO-FSEMT framework achieves overall superior performance compared with state-of-the-art FS methods on 26 datasets. Moreover, ablation studies verify the contributions of the different components of the proposed MO-FSEMT.
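A generic building block behind multiobjective feature selection is keeping only the Pareto non-dominated feature subsets. The sketch below is not MO-FSEMT itself; it assumes two minimized objectives (classification error and the fraction of selected features), and the helper names and the masks and error values in the usage are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed objectives, both minimized: (error_rate, selected_feature_ratio).
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points: Sequence[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def objectives(mask: Sequence[int], error_rate: float) -> Objectives:
    """Map a binary feature mask and its externally measured error to objective values."""
    return (error_rate, sum(mask) / len(mask))


if __name__ == "__main__":
    # Hypothetical masks over 10 features with made-up validation errors.
    candidates = [
        objectives([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 0.10),
        objectives([1, 1, 0, 0, 1, 0, 0, 0, 0, 0], 0.12),
        objectives([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 0.30),
        objectives([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], 0.15),
    ]
    print(nondominated(candidates))  # the 4-feature subset is dominated by the 3-feature one
```

In an evolutionary solver this filter would be applied to each population after evaluation; the brute-force O(n²) check is kept for readability rather than speed.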

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00426 [pdf, other]

keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM

Authors: Chaojie Wang, Yishi Xu, Zhong Peng, Chenxi Zhang, Bo Chen, Xinrun Wang, Lei Feng, Bo An

Abstract: Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially question answering. However, when faced with problems beyond the scope of their knowledge, these LLMs tend to talk nonsense with a straight face; a potential solution is to incorporate an Information Retrieval (IR) module and generate responses based on the retrieved knowledge. In this paper, we present a novel framework that assists LLMs, such as ChatGPT, in retrieving question-related structured information from a knowledge graph, and demonstrate that Knowledge-based question answering (Keqing) can serve as a natural Chain-of-Thought (CoT) mentor, guiding the LLM to sequentially find the answer entities of a complex question through interpretable logical chains. Specifically, the Keqing workflow decomposes a complex question according to predefined templates, retrieves candidate entities from the knowledge graph, reasons out the answers to the sub-questions, and finally generates a response with reasoning paths, which greatly improves the reliability of the LLM's response. Experimental results on KBQA datasets show that Keqing achieves competitive performance and illustrates the logic behind the answer to each question.
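The decompose, retrieve, reason, and respond workflow can be illustrated with a toy, LLM-free sketch. Assumptions: a single hypothetical two-hop template matched by a regular expression stands in for Keqing's predefined templates, and a hand-written dictionary stands in for the knowledge graph; in the paper these steps are driven by an LLM, and none of the names below come from the Keqing code.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: (head entity, relation) -> tail entities.
KG: Dict[Tuple[str, str], List[str]] = {
    ("Inception", "director"): ["Christopher Nolan"],
    ("Christopher Nolan", "birthplace"): ["London"],
}

# Hypothetical template: "What is the <outer> of the <inner> of <entity>?"
TWO_HOP = re.compile(r"What is the (\w+) of the (\w+) of (.+)\?")


def decompose(question: str) -> Tuple[str, List[str]]:
    """Split a question into a topic entity and an ordered relation chain."""
    m = TWO_HOP.fullmatch(question)
    if m is None:
        raise ValueError("question does not match any template")
    outer, inner, entity = m.groups()
    return entity, [inner, outer]  # answer the inner sub-question first


def answer(question: str) -> Tuple[List[str], List[str]]:
    """Follow the relation chain hop by hop, recording each hop as a reasoning path."""
    entity, relations = decompose(question)
    frontier, paths = [entity], []
    for rel in relations:
        next_frontier = []
        for head in frontier:  # retrieve candidate entities for this sub-question
            for tail in KG.get((head, rel), []):
                paths.append(f"{head} --{rel}--> {tail}")
                next_frontier.append(tail)
        frontier = next_frontier
    return frontier, paths


if __name__ == "__main__":
    answers, paths = answer("What is the birthplace of the director of Inception?")
    print(answers)
    print("\n".join(paths))
```

Returning the traversed edges alongside the answers mirrors the abstract's point that the response comes with interpretable reasoning paths.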

Submitted 31 December, 2023; originally announced January 2024.

Comments: 12 pages, 6 figures

arXiv:2311.13811 [pdf]

Education distillation:getting student models to learn in shcools

Authors: Ling Feng, Danyang Li, Tianhao Wu, Xuliang Duan

Abstract: Knowledge distillation is one method of model compression, and existing knowledge distillation techniques focus on improving the distillation algorithm to enhance distillation efficiency. This paper introduces dynamic incremental learning into knowledge distillation and proposes a distillation strategy called education distillation. Specifically, fragmented student models divided from the complete student model are taken as lower-grade models. As the grade level rises, the fragmented student models deepen in conjunction with designed teaching reference layers, while learning and distilling from more teacher models. Moving from lower to higher grades, the fragmented student models are gradually integrated into a complete target student model, and the performance of the student models improves gradually from the lower to the higher grades. Education distillation strategies combined with distillation algorithms outperform single distillation algorithms on the public datasets CIFAR100, Caltech256, and Food-101.
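Education distillation is described as being combined with existing distillation algorithms. A common such algorithm is the temperature-scaled distillation loss of Hinton et al. (2015). The minimal sketch below implements that loss in pure Python and, as an assumption for the multi-teacher setting, averages the KL term over several teachers. It does not implement the paper's grade schedule or teaching reference layers, and the default `temperature` and `alpha` values are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * mean_k KL(teacher_k || student) + (1 - alpha) * CE(label, student)."""
    p_s = softmax(student_logits, temperature)
    kls = []
    for t_logits in teacher_logits:  # assumption: uniform average over teachers
        p_t = softmax(t_logits, temperature)
        kls.append(sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0))
    kl = sum(kls) / len(kls)
    # Hard-label cross-entropy uses the student's ordinary (T = 1) softmax.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    student = [1.0, 0.5, -0.5]
    teachers = [[2.0, 0.2, -1.0], [1.5, 0.8, -0.7]]
    print(kd_loss(student, teachers, label=0))
```

The T² factor keeps the gradient scale of the soft-target term roughly independent of the temperature, which is why it appears in the standard formulation.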

Submitted 26 November, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.10123 [pdf, other]

MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture

Authors: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu

Abstract: Generative models for 3D object synthesis have advanced significantly by incorporating prior knowledge distilled from 2D diffusion models. Nevertheless, existing 3D synthesis frameworks still suffer from multi-view geometric inconsistencies and slow generation. This can be attributed to two factors: first, the lack of abundant a priori geometric knowledge during optimization, and second, the entanglement between geometry and texture in conventional 3D generation methods. In response, we introduce MetaDreamer, a two-stage optimization approach that leverages rich 2D and 3D prior knowledge. In the first stage, we focus on optimizing the geometric representation to ensure the multi-view consistency and accuracy of 3D objects. In the second stage, we concentrate on fine-tuning the geometry and optimizing the texture, thereby achieving a more refined 3D object. By leveraging 2D and 3D prior knowledge in the two stages, respectively, we effectively mitigate the interdependence between geometry and texture. MetaDreamer establishes clear optimization objectives for each stage, resulting in significant time savings in the 3D generation process. Ultimately, MetaDreamer can generate high-quality 3D objects from textual prompts within 20 minutes, and to the best of our knowledge, it is the most efficient text-to-3D generation method. Furthermore, we introduce image control into the process, enhancing the controllability of 3D generation. Extensive empirical evidence confirms that our method is not only highly efficient but also achieves quality at the forefront of current state-of-the-art 3D generation techniques.

Submitted 16 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2306.17843, arXiv:2209.14988 by other authors

Showing 1–50 of 265 results for author: Feng, L