Showing 1–50 of 1,799 results for author: Zhang, Q

Searching in archive cs.
  1. arXiv:2407.01300  [pdf, other]

    cs.CL cs.AI cs.LG

    Collaborative Performance Prediction for Large Language Models

    Authors: Qiyuan Zhang, Fuyuan Lyu, Xue Liu, Chen Ma

    Abstract: Comprehensively understanding and accurately predicting the performance of large language models across diverse downstream tasks has emerged as a pivotal challenge in NLP research. The pioneering scaling law on downstream works demonstrated intrinsic similarities within model families and utilized such similarities for performance prediction. However, they tend to overlook the similarities between…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00935  [pdf, other]

    cs.LG cs.CL

    Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining

    Authors: Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang

    Abstract: In recent years, the rise of generative self-supervised learning (SSL) paradigms has exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical co…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00643  [pdf, other]

    cs.NI cs.PF

    A Power-Consumption Analysis for Different IPoWDM Network Architectures with ZR/ZR+ and Long-Haul Muxponders

    Authors: Qiaolun Zhang, Annalisa Morea, Patricia Layec, Memedhe Ibrahimi, Francesco Musumeci, Massimo Tornatore

    Abstract: Operators are constantly faced with the need to increase optical-network capacity to accommodate rapid traffic growth while minimizing the cost-per-bit and power-per-bit. The drastic reduction of power consumption of IP routers and ZR/ZR+ pluggable transponders seen in the last years has renewed the interest in "opaque" optical-network architectures, where no optical bypassing is allowed. In this…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00502  [pdf, other]

    cs.LG cs.AI

    Deep Frequency Derivative Learning for Non-stationary Time Series Forecasting

    Authors: Wei Fan, Kun Yi, Hangting Ye, Zhiyuan Ning, Qi Zhang, Ning An

    Abstract: While most time series are non-stationary, it is inevitable for models to face the distribution shift issue in time series forecasting. Existing solutions manipulate statistical measures (usually mean and std.) to adjust time series distribution. However, these operations can be theoretically seen as the transformation towards zero frequency component of the spectrum which cannot reveal full distr…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  5. arXiv:2407.00482  [pdf, other]

    cs.LG cs.AI cs.CV cs.CY cs.IT

    Quantifying Spuriousness of Biased Datasets Using Partial Information Decomposition

    Authors: Barproda Halder, Faisal Hamman, Pasan Dissanayake, Qiuyi Zhang, Ilia Sucholutsky, Sanghamitra Dutta

    Abstract: Spurious patterns refer to a mathematical association between two or more variables in a dataset that are not causally related. However, this notion of spuriousness, which is usually introduced due to sampling biases in the dataset, has classically lacked a formal definition. To address this gap, this work presents the first information-theoretic formalization of spuriousness in a dataset (given a…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Workshop on Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models

  6. arXiv:2407.00379  [pdf, other]

    cs.AI cs.CL

    GraphArena: Benchmarking Large Language Models on Graph Computational Problems

    Authors: Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li

    Abstract: The "arms race" of Large Language Models (LLMs) demands novel, challenging, and diverse benchmarks to faithfully examine their progresses. We introduce GraphArena, a benchmarking tool designed to evaluate LLMs on graph computational problems using million-scale real-world graphs from diverse scenarios such as knowledge graphs, social networks, and molecular structures. GraphArena offers a suite of…

    Submitted 29 June, 2024; originally announced July 2024.

  7. arXiv:2407.00042  [pdf]

    q-bio.NC cs.SI eess.SY

    Module control of network analysis in psychopathology

    Authors: Chunyu Pan, Quan Zhang, Yue Zhu, Shengzhou Kong, Juan Liu, Changsheng Zhang, Fei Wang, Xizhe Zhang

    Abstract: The network approach to characterizing psychopathology departs from traditional latent categorical and dimensional approaches. Causal interplay among symptoms contributed to dynamic psychopathology system. Therefore, analyzing the symptom clusters is critical for understanding mental disorders. Furthermore, despite extensive research studying the topological features of symptom networks, the contr…

    Submitted 30 May, 2024; originally announced July 2024.

  8. arXiv:2407.00028  [pdf, other]

    q-bio.NC cs.LG stat.AP

    Harnessing XGBoost for Robust Biomarker Selection of Obsessive-Compulsive Disorder (OCD) from Adolescent Brain Cognitive Development (ABCD) data

    Authors: Xinyu Shen, Qimin Zhang, Huili Zheng, Weiwei Qi

    Abstract: This study evaluates the performance of various supervised machine learning models in analyzing highly correlated neural signaling data from the Adolescent Brain Cognitive Development (ABCD) Study, with a focus on predicting obsessive-compulsive disorder scales. We simulated a dataset to mimic the correlation structures commonly found in imaging data and evaluated logistic regression, elastic netw…

    Submitted 14 May, 2024; originally announced July 2024.

  9. arXiv:2406.19966  [pdf, other]

    cs.CL

    Simulating Financial Market via Large Language Model based Agents

    Authors: Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

    Abstract: Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose Agent-based Simulated Financial M…

    Submitted 28 June, 2024; originally announced June 2024.

  10. arXiv:2406.19579  [pdf, ps, other]

    math.OC cs.CR cs.LG

    Private Zeroth-Order Nonsmooth Nonconvex Optimization

    Authors: Qinzi Zhang, Hoang Tran, Ashok Cutkosky

    Abstract: We introduce a new zeroth-order algorithm for private stochastic optimization on nonconvex and nonsmooth objectives. Given a dataset of size $M$, our algorithm ensures $(α,αρ^2/2)$-Rényi differential privacy and finds a $(δ,ε)$-stationary point so long as $M=\tildeΩ\left(\frac{d}{δε^3} + \frac{d^{3/2}}{ρδε^2}\right)$. This matches the optimal complexity of its non-private zeroth-order analog. Nota…

    Submitted 27 June, 2024; originally announced June 2024.

  11. arXiv:2406.19251  [pdf, other]

    cs.CL cs.AI

    AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

    Authors: Jia Fu, Xiaoting Qin, Fangkai Yang, Lu Wang, Jue Zhang, Qingwei Lin, Yubo Chen, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

    Abstract: Recent advancements in Large Language Models have transformed ML/AI development, necessitating a reevaluation of AutoML principles for the Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates the hyper-parameter tuning as an online multi-armed bandit (MAB) problem…

    Submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18884  [pdf, other]

    cs.AI

    Sequential three-way group decision-making for double hierarchy hesitant fuzzy linguistic term set

    Authors: Nanfang Luo, Qinghua Zhang, Qin Xie, Yutai Wang, Longjun Yin, Guoyin Wang

    Abstract: Group decision-making (GDM) characterized by complexity and uncertainty is an essential part of various life scenarios. Most existing researches lack tools to fuse information quickly and interpret decision results for partially formed decisions. This limitation is particularly noticeable when there is a need to improve the efficiency of GDM. To address this issue, a novel multi-level sequential t…

    Submitted 27 June, 2024; originally announced June 2024.

  14. arXiv:2406.18118  [pdf, other]

    cs.CR cs.CL

    SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

    Authors: Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.18054  [pdf, other]

    eess.IV cs.CV

    Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation

    Authors: Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He

    Abstract: The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb…

    Submitted 26 June, 2024; originally announced June 2024.

  16. arXiv:2406.16531  [pdf, other]

    cs.CV

    GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

    Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

    Abstract: The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and location(IMDL). However, the lack of a large-scale data foundation makes IMDL task unattainable. In this paper, a local manipulation pipeline is designed…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Code page: https://github.com/chenyirui/GIM

  17. arXiv:2406.15568  [pdf, other]

    cs.LG

    Robust Reinforcement Learning from Corrupted Human Feedback

    Authors: Alexander Bukharin, Ilgee Hong, Haoming Jiang, Qingru Zhang, Zixuan Zhang, Tuo Zhao

    Abstract: Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, lack of training, etc, human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted p…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  18. arXiv:2406.15043  [pdf, other]

    cs.LG

    Discovering Common Information in Multi-view Data

    Authors: Qi Zhang, Mingfei Lu, Shujian Yu, Jingmin Xin, Badong Chen

    Abstract: We introduce an innovative and mathematically rigorous definition for computing common information from multi-view data, drawing inspiration from Gács-Körner common information in information theory. Leveraging this definition, we develop a novel supervised multi-view learning framework to capture both common and unique information. By explicitly minimizing a total correlation term, the extracted…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Manuscript accepted by Information Fusion (https://www.sciencedirect.com/science/article/pii/S1566253524001787). We have updated a few descriptions for clarity. Code is available at https://github.com/archy666/CUMI

  19. arXiv:2406.14756  [pdf, other]

    cs.AI

    SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions

    Authors: Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

    Abstract: We present SciDMT, an enhanced and expanded corpus for scientific mention detection, offering a significant advancement over existing related resources. SciDMT contains annotated scientific documents for datasets (D), methods (M), and tasks (T). The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated me…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: LREC/COLING 2024

    MSC Class: I.2.7

    Journal ref: LREC-COLING. (2024) 14407-14417

  20. arXiv:2406.13897  [pdf, other]

    cs.CV

    CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

    Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu

    Abstract: In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic…

    Submitted 30 May, 2024; originally announced June 2024.

    Comments: Project page: https://sites.google.com/view/clay-3dlm Video: https://youtu.be/YcKFp4U2Voo

  21. arXiv:2406.13551  [pdf, other]

    cs.CL cs.AI

    Mitigating Social Biases in Language Models through Unlearning

    Authors: Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak

    Abstract: Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs. Numerous approaches revolve around data pre-processing and fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. Consequently, there is a growing interest in machine unlearning techniques given their capacity to induce the forgetting of unde…

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.13372  [pdf, other]

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-conne…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  23. arXiv:2406.13113  [pdf, other]

    cs.CV cs.AI q-bio.NC

    CU-Net: a U-Net architecture for efficient brain-tumor segmentation on BraTS 2019 dataset

    Authors: Qimin Zhang, Weiwei Qi, Huili Zheng, Xinyu Shen

    Abstract: Accurately segmenting brain tumors from MRI scans is important for developing effective treatment plans and improving patient outcomes. This study introduces a new implementation of the Columbia-University-Net (CU-Net) architecture for brain tumor segmentation using the BraTS 2019 dataset. The CU-Net model has a symmetrical U-shaped structure and uses convolutional layers, max pooling, and upsampl…

    Submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.12799  [pdf, ps, other]

    cs.DS

    Sample-Based Matroid Prophet Inequalities

    Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

    Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way)…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at EC'24

  25. ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

    Authors: Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang

    Abstract: With the rapid development of depth sensor, more and more RGB-D videos could be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaborating of RGB-D and video information. In this paper, we first collect a new annotated RGB-D vi…

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (2024)

  26. arXiv:2406.12236  [pdf, other]

    eess.AS cs.SD eess.SP

    Binaural Selective Attention Model for Target Speaker Extraction

    Authors: Hanyu Meng, Qiquan Zhang, Xiangyu Zhang, Vidhyasaharan Sethu, Eliathamby Ambikairajah

    Abstract: The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing. In this paper, we present a binaural time-domain Target Speaker Extraction model based on the Filter-and-Sum Network (FaSNet). Inspired by human selective hearing, our proposed model introduces target speaker embedding into separators using a multi-head…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  27. arXiv:2406.12125  [pdf, other]

    cs.LG cs.CL

    Efficient Sequential Decision Making with Large Language Models

    Authors: Dingyang Chen, Qi Zhang, Yinglun Zhu

    Abstract: This paper focuses on extending the success of large language models (LLMs) to sequential decision making. Existing efforts either (i) re-train or finetune LLMs for decision making, or (ii) design prompts for pretrained LLMs. The former approach suffers from the computational burden of gradient updates, and the latter approach does not show promising results. In this paper, we propose a new approa…

    Submitted 17 June, 2024; originally announced June 2024.

  28. arXiv:2406.11274  [pdf, other]

    cs.CL

    Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

    Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 1 figure

  29. arXiv:2406.11192  [pdf, other]

    cs.CL

    Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition

    Authors: Yuming Yang, Wantong Zhao, Caishuang Huang, Junjie Ye, Xiao Wang, Huiyuan Zheng, Yang Nan, Yuran Wang, Xueying Xu, Kaixin Huang, Yunke Zhang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Open Named Entity Recognition (NER), which involves identifying arbitrary types of entities from arbitrary domains, remains challenging for Large Language Models (LLMs). Recent studies suggest that fine-tuning LLMs on extensive NER data can boost their performance. However, training directly on existing datasets faces issues due to inconsistent entity definitions and redundant data, limiting LLMs…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages. Project page: https://github.com/UmeanNever/B2NER

  30. arXiv:2406.11190  [pdf, other]

    cs.CL cs.AI

    Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

    Authors: Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

    Abstract: In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values, and provide accurate preference feedback based on these. Current AI feedback methods rely on powerful LLMs, carefully designed specific principles t…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures

  31. arXiv:2406.10977  [pdf, other]

    cs.CL cs.AI

    Toward Optimal LLM Alignments Using Two-Player Games

    Authors: Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu

    Abstract: The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Our code is released at https://github.com/ruizheng20/gpo

    MSC Class: 68

  32. arXiv:2406.10849  [pdf, other]

    math.OC cs.CC

    A parallel framework for graphical optimal transport

    Authors: Jiaojiao Fan, Isabel Haasler, Qinsheng Zhang, Johan Karlsson, Yongxin Chen

    Abstract: We study multi-marginal optimal transport (MOT) problems where the underlying cost has a graphical structure. These graphical multi-marginal optimal transport problems have found applications in several domains including traffic flow control and regression problems in the Wasserstein space. MOT problem can be approached through two aspects: a single big MOT problem, or coupled minor OT problems. I…

    Submitted 16 June, 2024; originally announced June 2024.

  33. arXiv:2406.10819  [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  34. arXiv:2406.10663  [pdf, other]

    cs.NE cs.HC

    Interpreting Multi-objective Evolutionary Algorithms via Sokoban Level Generation

    Authors: Qingquan Zhang, Yuchen Li, Yuhang Lin, Handing Wang, Jialin Liu

    Abstract: This paper presents an interactive platform to interpret multi-objective evolutionary algorithms. Sokoban level generation is selected as a showcase for its widespread use in procedural content generation. By balancing the emptiness and spatial diversity of Sokoban levels, we illustrate the improved two-archive algorithm, Two_Arch2, a well-known multi-objective evolutionary algorithm. Our web-base…

    Submitted 15 June, 2024; originally announced June 2024.

  35. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  36. arXiv:2406.09098  [pdf, other]

    cs.CL

    SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

    Authors: Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen

    Abstract: The burgeoning utilization of Large Language Models (LLMs) in scientific research necessitates advanced benchmarks capable of evaluating their understanding and application of scientific knowledge comprehensively. To address this need, we introduce the SciKnowEval benchmark, a novel framework that systematically evaluates LLMs across five progressive levels of scientific knowledge: studying extens…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 48 pages, 2 figures

  37. arXiv:2406.08426  [pdf, other]

    cs.CL cs.AI cs.DB

    Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

    Authors: Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

    Abstract: Generating accurate SQL according to natural language questions (text-to-SQL) is a long-standing challenge due to the complexities involved in user question understanding, database schema comprehension, and SQL generation. Conventional text-to-SQL systems, comprising human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have be…

    Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  38. arXiv:2406.07455  [pdf, other]

    cs.LG stat.ML

    Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis

    Authors: Qining Zhang, Honghao Wei, Lei Ying

    Abstract: In this paper, we study reinforcement learning from human feedback (RLHF) under an episodic Markov decision process with a general trajectory-wise reward model. We developed a model-free RLHF best policy identification algorithm, called $\mathsf{BSAD}$, without explicit reward model inference, which is a critical intermediate step in the contemporary RLHF paradigms for training large language mode…

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.06843  [pdf, other]

    cs.CV

    HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

    Authors: Jikai Wang, Qifan Zhang, Yu-Wei Chao, Bowen Wen, Xiaohu Guo, Yu Xiang

    Abstract: We introduce a data capture system and a new dataset named HO-Cap that can be used to study 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method to obtain annotations of shape and pose of hands and o…

    Submitted 16 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  40. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  41. arXiv:2406.04854  [pdf, other]

    cs.CL

    Uncertainty Aware Learning for Language Model Alignment

    Authors: Yikun Wang, Rui Zheng, Liang Ding, Qi Zhang, Dahua Lin, Dacheng Tao

    Abstract: As instruction-tuned large language models (LLMs) evolve, aligning pretrained foundation models presents increasing challenges. Existing alignment strategies, which typically leverage diverse and high-quality data sources, often overlook the intrinsic uncertainty of tasks, learning all data samples equally. This may lead to suboptimal data efficiency and model performance. In response, we propose…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  42. arXiv:2406.04151  [pdf, other]

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  43. arXiv:2406.03944  [pdf, other]

    cs.LG

    Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples

    Authors: Dake Bu, Wei Huang, Taiji Suzuki, Ji Cheng, Qingfu Zhang, Zhiqiang Xu, Hau-San Wong

    Abstract: Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples. While existing work successfully develops various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL: uncertainty-based and diversity-based, remains in its infancy. In this…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by the 41st International Conference on Machine Learning (ICML 2024)

  44. arXiv:2406.03159  [pdf, other]

    cs.NI cs.DC

    Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading

    Authors: Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao

    Abstract: Low-orbit mega-constellation networks, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, are a rapidly growing field. Each satellite collects TB-level data daily, including delay-sensitive data used for crucial tasks, such as military surveillance, natural disaster monitoring, and weather forecasting. According to NASA's sta…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  45. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  46. arXiv:2406.02370  [pdf, other]

    cs.RO

    Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

    Authors: Jiaxu Wang, Ziyi Zhang, Qiang Zhang, Jia Li, Jingkai Sun, Mingyuan Sun, Junhao He, Renjing Xu

    Abstract: Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rend…

    Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  47. arXiv:2406.02013  [pdf, other]

    cs.LG

    Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

    Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

    Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretica…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  48. arXiv:2406.01587  [pdf, other]

    cs.RO

    PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

    Authors: Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao

    Abstract: Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  49. arXiv:2406.01512  [pdf, other]

    cs.CL

    MAD: Multi-Alignment MEG-to-Text Decoding

    Authors: Yiqian Yang, Hyejeong Jo, Yiqun Duan, Qiang Zhang, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong

    Abstract: Deciphering language from brain activity is a crucial task in brain-computer interface (BCI) research. Non-invasive cerebral signaling techniques including electroencephalography (EEG) and magnetoencephalography (MEG) are becoming increasingly popular due to their safety and practicality, avoiding invasive electrode implantation. However, current works under-investigated three points: 1) a predomi…

    Submitted 3 June, 2024; originally announced June 2024.

  50. arXiv:2406.01460  [pdf, other]

    cs.CV cs.AI

    MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

    Authors: Yu Zhang, Qi Zhang, Zixuan Gong, Yiwei Shi, Yepeng Liu, Duoqian Miao, Yang Liu, Ke Liu, Kun Yi, Wei Fan, Liang Hu, Changwei Wang

    Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer s…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024