Showing 1–50 of 182 results for author: Du, W

Searching in archive cs.
  1. MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge

    Authors: Yuning Chen, Kang Yang, Zhiyu An, Brady Holder, Luke Paloutzian, Khaled Bali, Wan Du

    Abstract: The rapid decline in groundwater around the world poses a significant challenge to sustainable agriculture. To address this issue, agricultural managed aquifer recharge (Ag-MAR) is proposed to recharge the aquifer by artificially flooding agricultural lands using surface water. Ag-MAR requires a carefully selected flooding schedule to avoid affecting the oxygen absorption of crop roots. However, c…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  2. arXiv:2406.17245  [pdf, other]

    cs.LG cs.AI cs.CL

    Unlocking Continual Learning Abilities in Language Models

    Authors: Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

    Abstract: Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task informa…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: preprint, 19 pages

  3. arXiv:2406.16571  [pdf, other]

    math.OC cs.AI cs.LG eess.SY

    Differentiable Distributionally Robust Optimization Layers

    Authors: Xutao Ma, Chao Ning, Wenli Du

    Abstract: In recent years, there has been a growing research interest in decision-focused learning, which embeds optimization problems as a layer in learning pipelines and demonstrates a superior performance than the prediction-focused approach. However, for distributionally robust optimization (DRO), a popular paradigm for decision-making under uncertainty, it is still unknown how to embed it as a layer, i…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: In Forty-first International Conference on Machine Learning (2024)

  4. arXiv:2406.12747  [pdf, other]

    cs.LG cs.AI

    TSI-Bench: Benchmarking Time Series Imputation

    Authors: Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Fanxing Liu, Zepu Wang, Zina Ibrahim, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen

    Abstract: Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellen…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.11906  [pdf, other]

    q-bio.QM cs.AI

    NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

    Authors: Jingbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

    Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this im…

    Submitted 16 June, 2024; originally announced June 2024.

  6. arXiv:2406.11231  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG

    Enabling robots to follow abstract instructions and complete complex dynamic tasks

    Authors: Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Chris Lucas

    Abstract: Completing complex tasks in unpredictable settings like home kitchens challenges robotic systems. These challenges include interpreting high-level human commands, such as "make me a hot beverage" and performing actions like pouring a precise amount of water into a moving mug. To address these challenges, we present a novel framework that combines Large Language Models (LLMs), a curated Knowledge B…

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.06652  [pdf, other]

    cs.LG cs.AI

    Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

    Authors: Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou

    Abstract: Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures, and 6 tables

  8. arXiv:2405.17508  [pdf, other]

    cs.LG stat.ML

    Unveiling the Secrets: How Masking Strategies Shape Time Series Imputation

    Authors: Linglong Qian, Zina Ibrahim, Wenjie Du, Yiyuan Yang, Richard JB Dobson

    Abstract: In this study, we explore the impact of different masking strategies on time series imputation models. We evaluate the effects of pre-masking versus in-mini-batch masking, normalization timing, and the choice between augmenting and overlaying artificial missingness. Using three diverse datasets, we benchmark eleven imputation models with different missing rates. Our results demonstrate that maskin…

    Submitted 26 May, 2024; originally announced May 2024.

  9. arXiv:2405.15319  [pdf, other]

    cs.CL cs.AI

    Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

    Authors: Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu

    Abstract: LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical $\underline{\textit{O}}$bstacles: ($\textit{O}$1) lack of comprehen…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Preprint; The project link: https://llm-stacking.github.io/

  10. arXiv:2405.13401  [pdf, ps, other]

    cs.CR cs.CL

    TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

    Authors: Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu

    Abstract: Large language models (LLMs) have raised concerns about potential security threats despite performing significantly in Natural Language Processing (NLP). Backdoor attacks initially verified that LLM is doing substantial harm at all stages, but the cost and robustness have been criticized. Attacking LLMs is inherently risky in security review, while prohibitively expensive. Besides, the continuous…

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 4 tables

  11. arXiv:2404.10515  [pdf, other]

    cs.NE

    An Enhanced Differential Grouping Method for Large-Scale Overlapping Problems

    Authors: Maojiang Tian, Mingke Chen, Wei Du, Yang Tang, Yaochu Jin

    Abstract: Large-scale overlapping problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based cooperative coevolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlapping problems. However, current CC frameworks designed for overlapping problems r…

    Submitted 16 April, 2024; originally announced April 2024.

  12. arXiv:2403.15393  [pdf, other]

    cs.CL cs.LG cs.SI

    Detection of Opioid Users from Reddit Posts via an Attention-based Bidirectional Recurrent Neural Network

    Authors: Yuchen Wang, Zhengyu Fang, Wei Du, Shuai Xu, Rong Xu, Jing Li

    Abstract: The opioid epidemic, referring to the growing hospitalizations and deaths because of overdose of opioid usage and addiction, has become a severe health problem in the United States. Many strategies have been developed by the federal and local governments and health communities to combat this crisis. Among them, improving our understanding of the epidemic through better health surveillance is one o…

    Submitted 9 February, 2024; originally announced March 2024.

  13. arXiv:2403.07013  [pdf, other]

    q-bio.QM cs.LG q-bio.BM

    AdaNovo: Adaptive \emph{De Novo} Peptide Sequencing with Conditional Mutual Information

    Authors: Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li

    Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in \emph{de novo} peptide sequencing. Firstly, prior methods struggle to identify amino acids with…

    Submitted 15 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  14. arXiv:2403.03425  [pdf, other]

    cs.LG physics.chem-ph q-bio.BM

    Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

    Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo Wang, Xiaoyu Zhang, Weitao Du

    Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a…

    Submitted 5 March, 2024; originally announced March 2024.

  15. arXiv:2403.01192  [pdf, other]

    math.OC cs.LG cs.NE

    A Composite Decomposition Method for Large-Scale Global Optimization

    Authors: Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, Gary G. Yen

    Abstract: Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grouping stage significantly impact the performance of the optimization process. While the general separability grouping (GSG) method has overcome the limitation of previous differ…

    Submitted 8 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  16. arXiv:2403.00172  [pdf, other]

    eess.SY cs.AI cs.LG

    Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control

    Authors: Zhiyu An, Xianzhong Ding, Wan Du

    Abstract: Recent research has shown the potential of Model-based Reinforcement Learning (MBRL) to enhance energy efficiency of Heating, Ventilation, and Air Conditioning (HVAC) systems. However, existing methods rely on black-box thermal dynamics models and stochastic optimizers, lacking reliability guarantees and posing risks to occupant health. In this work, we overcome the reliability bottleneck by redes…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted for the 61st Design Automation Conference (DAC)

  17. arXiv:2402.18945  [pdf, other]

    cs.CR cs.AI cs.CL

    SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models

    Authors: Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Gongshen Liu

    Abstract: Pre-training has been a necessary phase for deploying pre-trained language models (PLMs) to achieve remarkable performance in downstream tasks. However, we empirically show that backdoor attacks exploit such a phase as a vulnerable entry point for task-agnostic. In this paper, we first propose $\mathtt{maxEntropy}$, an entropy-based poisoning filtering defense, to prove that existing task-agnostic…

    Submitted 24 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 18 pages, 19 figures, 13 tables

  18. arXiv:2402.16918  [pdf, other]

    cs.LG cs.CV

    m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers

    Authors: Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu

    Abstract: Modular neural architectures are gaining attention for their powerful generalization and efficient adaptation to new domains. However, training these models poses challenges due to optimization difficulties arising from intrinsic sparse connectivity. Leveraging knowledge from monolithic models through techniques like knowledge distillation can facilitate training and enable integration of diverse…

    Submitted 21 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  19. arXiv:2402.16061  [pdf, other]

    cs.CL

    How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study

    Authors: Tianjie Ju, Weiwei Sun, Wei Du, Xinwei Yuan, Zhaochun Ren, Gongshen Liu

    Abstract: Previous work has showcased the intriguing capability of large language models (LLMs) in retrieving facts and processing context knowledge. However, only limited research exists on the layer-wise capability of LLMs to encode knowledge, which challenges our understanding of their internal mechanisms. In this paper, we devote the first attempt to investigate the layer-wise capability of LLMs through…

    Submitted 4 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted at LREC-COLING 2024 (Long Paper)

  20. arXiv:2402.14600  [pdf, other]

    cs.AI

    Diffusion Model-Based Multiobjective Optimization for Gasoline Blending Scheduling

    Authors: Wenxuan Fang, Wei Du, Renchu He, Yang Tang, Yaochu Jin, Gary G. Yen

    Abstract: Gasoline blending scheduling uses resource allocation and operation sequencing to meet a refinery's production requirements. The presence of nonlinearity, integer constraints, and a large number of decision variables adds complexity to this problem, posing challenges for traditional and evolutionary algorithms. This paper introduces a novel multiobjective optimization approach driven by a diffusio…

    Submitted 4 February, 2024; originally announced February 2024.

  21. arXiv:2402.13419  [pdf, ps, other]

    cs.AI

    Reward Bound for Behavioral Guarantee of Model-based Planning Agents

    Authors: Zhiyu An, Xianzhong Ding, Wan Du

    Abstract: Recent years have seen an emerging interest in the trustworthiness of machine learning-based agents in the wild, especially in robotics, to provide safety assurance for the industry. Obtaining behavioral guarantees for these agents remains an important problem. In this work, we focus on guaranteeing a model-based planning agent reaches a goal state within a specific future time step. We show that…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: To be published in ICLR 24 tiny paper track

  22. arXiv:2402.12720  [pdf, other]

    cs.CR cs.AI

    Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and Beyond

    Authors: Fangqi Li, Haodong Zhao, Wei Du, Shilin Wang

    Abstract: To trace the copyright of deep neural networks, an owner can embed its identity information into its model as a watermark. The capacity of the watermark quantify the maximal volume of information that can be verified from the watermarked model. Current studies on capacity focus on the ownership verification accuracy under ordinary removal attacks and fail to capture the relationship between robust…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024

  23. arXiv:2402.11900  [pdf, other]

    cs.CL

    Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models

    Authors: Tianjie Ju, Yijin Chen, Xinwei Yuan, Zhuosheng Zhang, Wei Du, Yubin Zheng, Gongshen Liu

    Abstract: Recent work has showcased the powerful capability of large language models (LLMs) in recalling knowledge and reasoning. However, the reliability of LLMs in combining these two capabilities into reasoning through multi-hop facts has not been widely explored. This paper systematically investigates the possibilities for LLMs to utilize shortcuts based on direct connections between the initial and ter…

    Submitted 2 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (Long Paper. Main Conference)

  24. arXiv:2402.10760  [pdf, other]

    q-fin.ST cs.LG

    RAGIC: Risk-Aware Generative Adversarial Model for Stock Interval Construction

    Authors: Jingyi Gu, Wenlu Du, Guiling Wang

    Abstract: Efforts to predict stock market outcomes have yielded limited success due to the inherently stochastic nature of the market, influenced by numerous unpredictable factors. Many existing prediction approaches focus on single-point predictions, lacking the depth needed for effective decision-making and often overlooking market risk. To bridge this gap, we propose a novel model, RAGIC, which introduce…

    Submitted 16 February, 2024; originally announced February 2024.

  25. arXiv:2402.04059  [pdf, other]

    cs.LG cs.AI

    Deep Learning for Multivariate Time Series Imputation: A Survey

    Authors: Jun Wang, Wenjie Du, Wei Cao, Keli Zhang, Wenjia Wang, Yuxuan Liang, Qingsong Wen

    Abstract: The ubiquitous missing values cause the multivariate time series data to be partially observed, destroying the integrity of time series and hindering the effective time series data analysis. Recently deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 9 pages, 1 figure, 5 tables, 58 referred papers

  26. arXiv:2402.03781  [pdf, other]

    q-bio.QM cs.AI cs.LG

    MolTC: Towards Molecular Relational Modeling In Language Models

    Authors: Junfeng Fang, Shuai Zhang, Chang Wu, Zhengyi Yang, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang

    Abstract: Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods…

    Submitted 10 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  27. arXiv:2402.01204  [pdf, other]

    cs.LG cs.AI

    A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

    Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

    Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations f…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-data

  28. arXiv:2401.15122  [pdf, other]

    cs.LG cs.AI q-bio.BM q-bio.QM stat.ML

    A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics

    Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Nakul Rampal, Omar Yaghi, Christian Borgs, Anima Anandkumar, Hongyu Guo, Jennifer Chayes

    Abstract: In drug discovery, molecular dynamics (MD) simulation for protein-ligand binding provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. There has been a long history of improving the efficiency of MD simulations through better numerical methods and, more recently, by utilizing machine learning (ML) methods. Yet, challenges remain, s…

    Submitted 1 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  29. arXiv:2401.12975  [pdf, other]

    cs.CV cs.AI cs.CL

    HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

    Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: ICLR 2024. The first two authors contributed equally to this work

  30. arXiv:2401.10274  [pdf, ps, other]

    cs.NE cs.AI

    Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

    Authors: Wanting Zhang, Wei Du, Guo Yu, Renchu He, Wenli Du, Yaochu Jin

    Abstract: With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to be optimized by traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs…

    Submitted 9 January, 2024; originally announced January 2024.

  31. arXiv:2401.06786  [pdf, other]

    cs.DC cs.AI

    CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

    Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

    Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de fac…

    Submitted 9 November, 2023; originally announced January 2024.

  32. arXiv:2401.01801  [pdf, other]

    cs.LG cs.AI physics.comp-ph

    A quatum inspired neural network for geometric modeling

    Authors: Weitao Du, Shengchao Liu, Xuecang Zhang

    Abstract: By conceiving physical systems as 3D many-body point clouds, geometric graph neural networks (GNNs), such as SE(3)/E(3) equivalent GNNs, have showcased promising performance. In particular, their effective message-passing mechanics make them adept at modeling molecules and crystalline materials. However, current geometric GNNs only offer a mean-field approximation of the many-body system, encapsul…

    Submitted 28 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  33. arXiv:2401.01070  [pdf, other]

    cs.NE

    A Novel Dual-Stage Evolutionary Algorithm for Finding Robust Solutions

    Authors: Wei Du, Wenxuan Fang, Chen Liang, Yang Tang, Yaochu Jin

    Abstract: In robust optimization problems, the magnitude of perturbations is relatively small. Consequently, solutions within certain regions are less likely to represent the robust optima when perturbations are introduced. Hence, a more efficient search process would benefit from increased opportunities to explore promising regions where global optima or good local optima are situated. In this paper, we in…

    Submitted 2 January, 2024; originally announced January 2024.

  34. arXiv:2312.14033  [pdf, other]

    cs.CL

    T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

    Authors: Zehui Chen, Weihua Du, Wenwei Zhang, Kuikun Liu, Jiangning Liu, Miao Zheng, Jingming Zhuo, Songyang Zhang, Dahua Lin, Kai Chen, Feng Zhao

    Abstract: Large language models (LLM) have achieved remarkable performance on various NLP tasks and are augmented by tools for broader applications. Yet, how to evaluate and analyze the tool-utilization capability of LLMs is still under-explored. In contrast to previous works that evaluate models holistically, we comprehensively decompose the tool utilization into multiple sub-processes, including instructi…

    Submitted 14 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project: https://open-compass.github.io/T-Eval

  35. arXiv:2312.10163  [pdf, other]

    cs.CV cs.LG

    Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

    Authors: Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen

    Abstract: The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models (LLMs) in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaki…

    Submitted 15 December, 2023; originally announced December 2023.

  36. arXiv:2312.06684  [pdf, other]

    cs.AI

    Enhanced E-Commerce Attribute Extraction: Innovating with Decorative Relation Correction and LLAMA 2.0-Based Annotation

    Authors: Jianghong Zhou, Weizhi Du, Md Omar Faruk Rokon, Zhaodong Wang, Jiaxuan Xu, Isha Shah, Kuang-chih Lee, Musen Wen

    Abstract: The rapid proliferation of e-commerce platforms accentuates the need for advanced search and retrieval systems to foster a superior user experience. Central to this endeavor is the precise extraction of product attributes from customer queries, enabling refined search, comparison, and other crucial e-commerce functionalities. Unlike traditional Named Entity Recognition (NER) tasks, e-commerce quer…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 9 pages, 5 images

  37. arXiv:2312.03475  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Molecule Joint Auto-Encoding: Trajectory Pretraining with 2D and 3D Diffusion

    Authors: Weitao Du, Jiujiu Chen, Xuecang Zhang, Zhiming Ma, Shengchao Liu

    Abstract: Recently, artificial intelligence for drug discovery has raised increasing interest in both machine learning and chemistry domains. The fundamental building block for drug discovery is molecule geometry and thus, the molecule's geometrical representation is the main bottleneck to better utilize machine learning techniques for drug discovery. In this work, we propose a pretraining method for molecu…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  38. arXiv:2311.17629  [pdf, other]

    cs.CV

    Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images

    Authors: Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik

    Abstract: Object instances in remote sensing images often distribute with multi-orientations, varying scales, and dense distribution. These issues bring challenges to end-to-end oriented object detectors including multi-scale features alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder, which incorporates two te…

    Submitted 1 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 13 tables

  39. arXiv:2311.12833  [pdf, other]

    cs.DC cs.AI cs.CL

    HPC-GPT: Integrating Large Language Model for High-Performance Computing

    Authors: Xianzhong Ding, Le Chen, Murali Emani, Chunhua Liao, Pei-Hung Lin, Tristan Vanderbruggen, Zhen Xie, Alberto E. Cerpa, Wan Du

    Abstract: Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance in high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret the model responses. In response to this challenge, we propose HPC-GPT, a novel LLaM…

    Submitted 2 October, 2023; originally announced November 2023.

    Comments: 9 pages

  40. arXiv:2311.12264  [pdf, other]

    eess.SY cs.AI cs.LG

    Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations

    Authors: Sayak Mukherjee, Ramij R. Hossain, Sheik M. Mohiuddin, Yuan Liu, Wei Du, Veronica Adetola, Rohit A. Jinsiwale, Qiuhua Huang, Tianzhixi Yin, Ankit Singhal

    Abstract: Improving system-level resiliency of networked microgrids is an important aspect with increased population of inverter-based resources (IBRs). This paper (1) presents resilient control design in presence of adversarial cyber-events, and proposes a novel federated reinforcement learning (Fed-RL) approach to tackle (a) model complexities, unknown dynamical behaviors of IBR devices, (b) privacy issue…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 10 pages, 7 figures

  41. arXiv:2311.05945  [pdf, other]

    cs.RO

    Intersection-free Robot Manipulation with Soft-Rigid Coupled Incremental Potential Contact

    Authors: Wenxin Du, Siqiong Yao, Xinlei Wang, Yuhang Xu, Wenqiang Xu, Cewu Lu

    Abstract: This paper presents a novel simulation platform, ZeMa, designed for robotic manipulation tasks concerning soft objects. Such simulation ideally requires three properties: two-way soft-rigid coupling, intersection-free guarantees, and frictional contact modeling, with acceptable runtime suitable for deep and reinforcement learning tasks. Current simulators often satisfy only a subset of these needs…

    Submitted 10 November, 2023; originally announced November 2023.

  42. arXiv:2311.05843  [pdf, other]

    cs.RO

    TacIPC: Intersection- and Inversion-free FEM-based Elastomer Simulation For Optical Tactile Sensors

    Authors: Wenxin Du, Wenqiang Xu, Jieji Ren, Zhenjun Yu, Cewu Lu

    Abstract: Tactile perception stands as a critical sensory modality for human interaction with the environment. Among various tactile sensor techniques, optical sensor-based approaches have gained traction, notably for producing high-resolution tactile images. This work explores gel elastomer deformation simulation through a physics-based approach. While previous works in this direction usually adopt the exp…

    Submitted 9 November, 2023; originally announced November 2023.

  43. arXiv:2311.00953  [pdf, ps, other]

    cs.CL

    Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation

    Authors: Wanyu Du, Yangfeng Ji

    Abstract: The development of trustworthy conversational information-seeking systems relies on dialogue models that can generate faithful and accurate responses based on relevant knowledge texts. However, two main challenges hinder this task. Firstly, language models may generate hallucinations due to data biases present in their pretraining corpus. Secondly, knowledge texts often contain redundant and irrel…

    Submitted 1 November, 2023; originally announced November 2023.

  44. arXiv:2310.14509  [pdf, other]

    cs.LG cs.AI

    Iteratively Learn Diverse Strategies with State Distance Information

    Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu

    Abstract: In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation…

    Submitted 22 October, 2023; originally announced October 2023.

  45. arXiv:2309.12941  [pdf, other]

    cs.SE cs.AI

    Trusta: Reasoning about Assurance Cases with Formal Methods and Large Language Models

    Authors: Zezhong Chen, Yuxin Deng, Wenjie Du

    Abstract: Assurance cases can be used to argue for the safety of products in safety engineering. In safety-critical areas, the construction of assurance cases is indispensable. Trustworthiness Derivation Trees (TDTs) enhance assurance cases by incorporating formal methods, rendering it possible for automatic reasoning about assurance cases. We present Trustworthiness Derivation Tree Analyzer (Trusta), a des…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 38 pages

    ACM Class: D.2.1

  46. arXiv:2309.06055  [pdf, other]

    cs.CR

    Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review

    Authors: Pengzhou Cheng, Zongru Wu, Wei Du, Haodong Zhao, Wei Lu, Gongshen Liu

    Abstract: Applicating third-party data and models has become a new paradigm for language modeling in NLP, which also introduces some potential security vulnerabilities because attackers can manipulate the training process and data source. In this case, backdoor attacks can induce the model to exhibit expected behaviors through specific triggers and have little inferior influence on primitive tasks. Hence, i…

    Submitted 8 November, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 21 pages, 4 figures

  47. arXiv:2309.05619  [pdf, other]

    cs.CL

    Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP

    Authors: Wei Du, Laksh Advani, Yashmeet Gambhir, Daniel J Perry, Prashant Shiralkar, Zhengzheng Xing, Aaron Colak

    Abstract: Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to validate for a real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemb…

    Submitted 19 November, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Camera ready version for 2023 EMNLP (The Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM))

  48. arXiv:2309.02929  [pdf]

    cs.CE

    Reinforcement Learning Based Gasoline Blending Optimization: Achieving More Efficient Nonlinear Online Blending of Fuels

    Authors: Muyi Huang, Renchu He, Xin Dai, Xin Peng, Wenli Du, Feng Qian

    Abstract: The online optimization of gasoline blending benefits refinery economies. However, the nonlinear blending mechanism, the oil property fluctuations, and the blending model mismatch bring difficulties to the optimization. To solve the above issues, this paper proposes a novel online optimization method based on deep reinforcement learning algorithm (DRL). The Markov decision process (MDP) expression…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 30 pages, 13 figures

  49. arXiv:2309.00855  [pdf, other]

    cs.LG

    DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal

    Authors: Wei-Wei Du, Wei-Yao Wang, Wen-Chih Peng

    Abstract: The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation model…

    Submitted 14 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: Accepted by CIKM 2023

  50. arXiv:2308.10099  [pdf, other]

    cs.LG cs.SI

    Geometric instability of graph neural networks on large graphs

    Authors: Emily Morris, Haotian Shen, Weiling Du, Muhammad Hamza Sajjad, Borun Shi

    Abstract: We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are only applicable for small graphs and lack context in the graph domain. We propose a simple, efficient and graph-native Graph Gram Index (GGI) to measure such instability which is invariant to permutation, orthogonal transformation, translation and order of evaluation. This allows us to…

    Submitted 28 November, 2023; v1 submitted 19 August, 2023; originally announced August 2023.

    Journal ref: the Second Learning on Graphs Conference (LoG 2023)