Showing 1–50 of 192 results for author: Su, W

Searching in archive cs.
  1. arXiv:2406.18048  [pdf, other]

    cs.CV

    ScanFormer: Referring Expression Comprehension by Iteratively Scanning

    Authors: Wei Su, Peihan Miao, Huanzhang Dou, Xi Li

    Abstract: Referring Expression Comprehension (REC) aims to localize the target objects specified by free-form natural language descriptions in images. While state-of-the-art methods achieve impressive performance, they perform a dense perception of images, which incorporates redundant visual regions unrelated to linguistic queries, leading to additional computational overhead. This inspires us to explore a…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024

  2. arXiv:2406.15313  [pdf, other]

    cs.IR cs.CL

    STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals

    Authors: Weihang Su, Yiran Hu, Anzhe Xie, Qingyao Ai, Zibing Que, Ning Zheng, Yun Liu, Weixing Shen, Yiqun Liu

    Abstract: Statute retrieval aims to find relevant statutory articles for specific queries. This process is the basis of a wide range of legal applications such as legal advice, automated judicial decisions, legal document drafting, etc. Existing statute retrieval benchmarks focus on formal and professional queries from sources like bar exams and legal case documents, thereby neglecting non-professional quer…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.14550  [pdf, other]

    cs.CL cs.AI

    GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

    Authors: Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first four authors contributed equally, 27 pages

  4. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Codes are open-sourced at https://github.com/bowen-upenn/llm_token_bias

  5. arXiv:2406.07543  [pdf, other]

    cs.CV

    Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    Authors: Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai

    Abstract: Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data. Despite these advances, there is no pre-training method that effectively exploits the interleaved image-text data, which is very prevalent on the Internet. Inspired by the recent success of compression learning in natural language processing, we propos…

    Submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.05372  [pdf, ps, other]

    stat.ML cs.LG

    Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

    Authors: Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su

    Abstract: Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018); Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfa…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  7. arXiv:2406.03341  [pdf, other]

    cs.LG cs.AI stat.AP stat.ME stat.ML

    Tackling GenAI Copyright Issues: Originality Estimation and Genericization

    Authors: Hiroaki Chiba-Okabe, Weijie J. Su

    Abstract: The rapid progress of generative AI technology has sparked significant copyright concerns, leading to numerous lawsuits filed against AI developers. While some studies explore methods to mitigate copyright risks by steering the outputs of generative models away from those resembling copyrighted data, little attention has been paid to the question of how much of a resemblance is undesirable; more o…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  8. arXiv:2406.01658  [pdf, other]

    cs.CV

    Proxy Denoising for Source-Free Domain Adaptation

    Authors: Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that ViL's…

    Submitted 3 June, 2024; originally announced June 2024.

  9. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  10. arXiv:2406.01359  [pdf, other]

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2406.00252  [pdf, other]

    cs.AI cs.CL cs.CV cs.MA

    Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

    Authors: Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick

    Abstract: Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  12. arXiv:2405.19524  [pdf, other]

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.16455  [pdf, other]

    stat.ML cs.LG stat.ME

    On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

    Authors: Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su

    Abstract: Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF) -- the predominant approach for aligning LLMs with human preferences through a reward model -- suffers from an inherent algorithmic bias due to its K…

    Submitted 26 May, 2024; originally announced May 2024.

  14. arXiv:2405.08920  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning

    Authors: Chendi Wang, Yuqing Zhu, Weijie J. Su, Yu-Xiang Wang

    Abstract: A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in…

    Submitted 16 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: To appear in ICML 2024

  15. arXiv:2405.03393  [pdf, other]

    cs.RO eess.SY

    On-site scale factor linearity calibration of MEMS triaxial gyroscopes

    Authors: Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven Weidong Su

    Abstract: The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environme…

    Submitted 10 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.00723  [pdf, other]

    eess.SP cs.AI cs.LG

    EEG_RL-Net: Enhancing EEG MI Classification through Reinforcement Learning-Optimised Graph Neural Networks

    Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven W. Su

    Abstract: Brain-Computer Interfaces (BCIs) rely on accurately decoding electroencephalography (EEG) motor imagery (MI) signals for effective device control. Graph Neural Networks (GNNs) outperform Convolutional Neural Networks (CNNs) in this regard, by leveraging the spatial relationships between EEG electrodes through adjacency matrices. The EEG_GLT-Net framework, featuring the state-of-the-art EEG_GLT adj…

    Submitted 26 April, 2024; originally announced May 2024.

  17. arXiv:2404.13964  [pdf, other]

    cs.LG econ.GN stat.ME

    An Economic Solution to Copyright Challenges of Generative AI

    Authors: Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su

    Abstract: Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their cont…

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.12347  [pdf, other]

    cs.CV cs.GR

    AniClipart: Clipart Animation with Text-to-Video Priors

    Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao

    Abstract: Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nev…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://aniclipart.github.io/

  19. arXiv:2404.11075  [pdf, other]

    cs.LG cs.AI eess.SP

    EEG_GLT-Net: Optimising EEG Graphs for Real-time Motor Imagery Signals Classification

    Authors: Htoo Wai Aung, Jiao Jiao Li, Yang An, Steven W. Su

    Abstract: Brain-Computer Interfaces connect the brain to external control devices, necessitating the accurate translation of brain signals such as from electroencephalography (EEG) into executable commands. Graph Neural Networks (GCN) have been increasingly applied for classifying EEG Motor Imagery signals, primarily because they incorporate the spatial relationships among EEG channels, resulting in improv…

    Submitted 17 April, 2024; originally announced April 2024.

  20. arXiv:2404.06772  [pdf, other]

    cs.RO

    Beyond Gait: Learning Knee Angle for Seamless Prosthesis Control in Multiple Scenarios

    Authors: Pengwei Wang, Yilong Chen, Wan Su, Jie Wang, Teng Ma, Haoyong Yu

    Abstract: Deep learning models have become a powerful tool in knee angle estimation for lower limb prostheses, owing to their adaptability across various gait phases and locomotion modes. Current methods utilize Multi-Layer Perceptrons (MLP), Long-Short Term Memory Networks (LSTM), and Convolutional Neural Networks (CNN), predominantly analyzing motion information from the thigh. Contrary to these approache…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures, This work has been submitted to the IEEE-RAL for possible publication

  21. arXiv:2404.06324  [pdf, other]

    cs.NI cs.AI cs.LG

    Dynamic D2D-Assisted Federated Learning over O-RAN: Performance Analysis, MAC Scheduler, and Asymmetric User Selection

    Authors: Payam Abdisarabshali, Kwang Taik Kim, Michael Langberg, Weifeng Su, Seyyedali Hosseinalipour

    Abstract: Existing studies on federated learning (FL) are mostly focused on system orchestration for static snapshots of the network and making static control decisions (e.g., spectrum allocation). However, real-world wireless networks are susceptible to temporal variations of wireless channel capacity and users' datasets. In this paper, we incorporate multi-granular system dynamics (MSDs) into FL, includin…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 120 pages, 13 figures

  22. arXiv:2404.01245  [pdf, other]

    math.ST cs.CL cs.CR cs.LG stat.ML

    A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

    Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

    Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical effi…

    Submitted 1 April, 2024; originally announced April 2024.

  23. arXiv:2403.18684  [pdf, other]

    cs.IR cs.CL

    Scaling Laws For Dense Retrieval

    Authors: Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu

    Abstract: Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-in…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at SIGIR 2024

  24. arXiv:2403.10081  [pdf, other]

    cs.CL cs.IR

    DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

    Authors: Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu

    Abstract: Dynamic retrieval augmented generation (RAG) paradigm actively decides when and what to retrieve during the text generation process of Large Language Models (LLMs). There are two key elements of this paradigm: identifying the optimal moment to activate the retrieval module (deciding when to retrieve) and crafting the appropriate query once retrieval is triggered (determining what to retrieve). How…

    Submitted 5 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  25. arXiv:2403.10068  [pdf, other]

    cs.CV cs.MA

    What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

    Authors: Wanfang Su, Lixing Chen, Yang Bai, Xi Lin, Gaolei Li, Zhe Qu, Pan Zhou

    Abstract: Multi-agent perception (MAP) allows autonomous systems to understand complex environments by interpreting data from multiple sources. This paper investigates intermediate collaboration for MAP with a specific focus on exploring "good" properties of collaborative view (i.e., post-collaboration feature) and its underlying relationship to individual views (i.e., pre-collaboration features), which wer…

    Submitted 15 March, 2024; originally announced March 2024.

  26. arXiv:2403.07601  [pdf, other]

    cs.CV

    Unified Source-Free Domain Adaptation

    Authors: Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: In the pursuit of transferring a source model to a target domain without access to the source training data, Source-Free Domain Adaptation (SFDA) has been extensively explored across various scenarios, including closed-set, open-set, partial-set, and generalized settings. Existing methods, focusing on specific scenarios, not only address only a subset of challenges but also necessitate prior knowl…

    Submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.06448  [pdf, other]

    cs.CL cs.AI

    Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

    Authors: Weihang Su, Changyue Wang, Qingyao Ai, Yiran Hu, Zhijing Wu, Yujia Zhou, Yiqun Liu

    Abstract: Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications, necessitating research into detecting and mitigating hallucinations of LLMs. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, whic…

    Submitted 10 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  28. arXiv:2403.05006  [pdf, ps, other]

    cs.LG cs.AI stat.ME stat.ML

    Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

    Authors: Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

    Abstract: Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how trad…

    Submitted 7 March, 2024; originally announced March 2024.

  29. arXiv:2403.01867  [pdf, other]

    cs.LO

    Deciding Separation Logic with Pointer Arithmetic and Inductive Definitions

    Authors: Wanyun Su, Zhilin Wu, Mihaela Sighireanu

    Abstract: Pointer arithmetic is widely used in low-level programs, e.g. memory allocators. The specification of such programs usually requires using pointer arithmetic inside inductive definitions to define the common data structures, e.g. heap lists in memory allocators. In this work, we investigate decision problems for SLAH, a separation logic fragment that allows pointer arithmetic inside inductive defi…

    Submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2403.00278  [pdf, other]

    cs.LG cs.CR math.OC math.ST stat.ML

    Shifted Interpolation for Differential Privacy

    Authors: Jinho Bok, Weijie Su, Jason M. Altschuler

    Abstract: Noisy gradient descent and its variants are the predominant algorithms for differentially private machine learning. It is a fundamental question to quantify their privacy leakage, yet tight characterizations remain open even in the foundational setting of convex losses. This paper improves over previous analyses by establishing (and refining) the "privacy amplification by iteration" phenomenon in…

    Submitted 12 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: 45 pages, ICML 2024. v2: added lower bounds (Appendix C.5)

  31. arXiv:2402.14762  [pdf, other]

    cs.CL cs.AI

    MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues

    Authors: Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, Zhanhui Zhou, Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang

    Abstract: The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To…

    Submitted 25 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: [ACL 2024] The first three authors contribute equally, 34 pages, repo at https://github.com/mtbench101/mt-bench-101

  32. arXiv:2402.14660  [pdf, other]

    cs.CL cs.AI

    ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

    Authors: Yanan Wu, Jie Liu, Xingyuan Bu, Jiaheng Liu, Zhanhui Zhou, Yuanxing Zhang, Chenchen Zhang, Zhiqi Bai, Haibin Chen, Tiezheng Ge, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: The benchmark dataset will be released soon

  33. arXiv:2402.07877  [pdf, other]

    cs.AI

    WildfireGPT: Tailored Large Language Model for Wildfire Analysis

    Authors: Yangxinyu Xie, Tanwi Mallick, Joshua David Bergerson, John K. Hutchison, Duane R. Verner, Jordan Branham, M. Ross Alexander, Robert B. Ross, Yan Feng, Leslie-Anne Levy, Weijie Su

    Abstract: The recent advancement of large language models (LLMs) represents a transformational capability at the frontier of artificial intelligence (AI) and machine learning (ML). However, LLMs are generalized models, trained on extensive text corpus, and often struggle to provide context-specific information, particularly in areas requiring specialized knowledge such as wildfire details within the broader…

    Submitted 12 February, 2024; originally announced February 2024.

  34. arXiv:2402.05146  [pdf, other]

    cs.LG cs.AI cs.RO

    Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving

    Authors: Wensheng Su, Zhenni Li, Minrui Xu, Jiawen Kang, Dusit Niyato, Shengli Xie

    Abstract: Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios. However, DRL models inevitably bring high memory consumption and computation, which hinders their wide deployment in resource-limited autonomous driving devices. Structured Pruning has been recognized as a useful method to compress and accelerate DRL models, but it is still challenging to estima…

    Submitted 7 February, 2024; originally announced February 2024.

  35. arXiv:2401.12751  [pdf, other]

    cs.CV

    PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction

    Authors: Wanjuan Su, Chen Zhang, Qingshan Xu, Wenbing Tao

    Abstract: Surface reconstruction has traditionally relied on the Multi-View Stereo (MVS)-based pipeline, which often suffers from noisy and incomplete geometry. This is due to that although MVS has been proven to be an effective way to recover the geometry of the scenes, especially for locally detailed areas with rich textures, it struggles to deal with areas with low texture and large variations of illumin…

    Submitted 23 January, 2024; originally announced January 2024.

  36. arXiv:2401.06951  [pdf, other]

    cs.CL cs.AI

    E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

    Authors: Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

    Abstract: Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support corresponding long-context windows, where the long-context training data (e.g., 32k) is needed, and high GPU training costs are assumed. To address the aforementioned issue…

    Submitted 22 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  37. arXiv:2401.01623  [pdf, other]

    cs.AI cs.CL

    Can AI Be as Creative as Humans?

    Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

    Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da…

    Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

  38. arXiv:2312.14389  [pdf, other]

    cs.CV

    StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors

    Authors: Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao

    Abstract: Creating fine-retouched portrait images is tedious and time-consuming even for professional artists. There exist automatic retouching methods, but they either suffer from over-smoothing artifacts or lack generalization ability. To address such issues, we present StyleRetoucher, a novel automatic portrait image retouching framework, leveraging StyleGAN's generation and generalization ability to imp…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 15 figures

  39. arXiv:2312.14238  [pdf, other]

    cs.CV

    InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

    Authors: Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

    Abstract: The exponential growth of large language models (LLMs) has opened up numerous possibilities for multimodal AGI systems. However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model…

    Submitted 15 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 25 pages, 5 figures, 28 tables

  40. arXiv:2312.11577  [pdf, other]

    cs.CV

    PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction

    Authors: Jianyao Xu, Qingshan Xu, Xinyao Liao, Wanjuan Su, Chen Zhang, Yew-Soon Ong, Wenbing Tao

    Abstract: Neural surfaces learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate the neural surfaces learning has received a lot of attention and remains an open problem. In this work, we propose a p…

    Submitted 18 December, 2023; originally announced December 2023.

  41. arXiv:2312.10661  [pdf, other]

    cs.IR cs.AI

    Wikiformer: Pre-training with Structured Information of Wikipedia for Ad-hoc Retrieval

    Authors: Weihang Su, Qingyao Ai, Xiangsheng Li, Jia Chen, Yiqun Liu, Xiaolong Wu, Shengluan Hou

    Abstract: With the development of deep learning and natural language processing techniques, pre-trained language models have been widely used to solve information retrieval (IR) problems. Benefiting from the pre-training and fine-tuning paradigm, these models achieve state-of-the-art performance. In previous works, plain texts in Wikipedia have been widely used in the pre-training stage. However, the rich s…

    Submitted 1 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

  42. arXiv:2312.05669  [pdf, other]

    cs.AI cs.IR

    Relevance Feedback with Brain Signals

    Authors: Ziyi Ye, Xiaohui Xie, Qingyao Ai, Yiqun Liu, Zhihong Wang, Weihang Su, Min Zhang

    Abstract: The Relevance Feedback (RF) process relies on accurate and real-time relevance estimation of feedback documents to improve retrieval performance. Since collecting explicit relevance annotations imposes an extra burden on the user, extensive studies have explored using pseudo-relevance signals and implicit feedback signals as substitutes. However, such signals are indirect indicators of relevance a…

    Submitted 9 December, 2023; originally announced December 2023.

  43. arXiv:2311.16510  [pdf, other]

    cs.CV

    Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

    Authors: Song Tang, Wenxin Su, Mao Ye, Xiatian Zhu

    Abstract: Source-Free Domain Adaptation (SFDA) aims to adapt a source model for a target domain, with only access to unlabeled target training data and the source model pre-trained on a supervised source domain. Relying on pseudo labeling and/or auxiliary supervision, conventional methods are inevitably error-prone. To mitigate this limitation, in this work we for the first time explore the potentials of of…

    Submitted 13 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024

  44. arXiv:2311.14619  [pdf, other]

    cs.GT cs.AI

    Eliciting Honest Information From Authors Using Sequential Review

    Authors: Yichi Zhang, Grant Schoenebeck, Weijie Su

    Abstract: In the setting of conference peer review, the conference aims to accept high-quality papers and reject low-quality papers based on noisy review scores. A recent work proposes the isotonic mechanism, which can elicit the ranking of paper qualities from an author with multiple submissions to help improve the conference's decisions. However, the isotonic mechanism relies on the assumption that the au…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 29 pages, 6 figures

  45. arXiv:2311.12205  [pdf, other]

    cs.CR cs.CY

    SDN-Based Dynamic Cybersecurity Framework of IEC-61850 Communications in Smart Grid

    Authors: Mansi Girdhar, Junho Hong, Wencong Su, Akila Herath, Chen-Ching Liu

    Abstract: In recent years, critical infrastructure and power grids have experienced a series of cyber-attacks, leading to temporary, widespread blackouts of considerable magnitude. Since most substations are unmanned and have limited physical security protection, cyber breaches into power grid substations present a risk. Nowadays, software-defined network (SDN), a popular virtual network technology based on…

    Submitted 7 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 5 pages, 6 figures, 1 table, conference paper, supported by DOE (CESER) program

  46. arXiv:2311.11539  [pdf]

    cs.AI

    A New Approach to Intuitionistic Fuzzy Decision Making Based on Projection Technology and Cosine Similarity Measure

    Authors: Jing Yang, Wei Su

    Abstract: For a multi-attribute decision making (MADM) problem, the information of alternatives under different attributes is given in the form of intuitionistic fuzzy number (IFN). Intuitionistic fuzzy set (IFS) plays an important role in dealing with uncertain and incomplete information. The similarity measure of intuitionistic fuzzy sets (IFSs) has always been a research hotspot. A new similarity measure…

    Submitted 20 November, 2023; originally announced November 2023.

  47. arXiv:2311.00333  [pdf, other]

    cs.IR

    Caseformer: Pre-training for Legal Case Retrieval Based on Inter-Case Distinctions

    Authors: Weihang Su, Qingyao Ai, Yueyue Wu, Yixiao Ma, Haitao Li, Yiqun Liu, Zhijing Wu, Min Zhang

    Abstract: Legal case retrieval aims to help legal workers find relevant cases related to their cases at hand, which is important for the guarantee of fairness and justice in legal judgments. While recent advances in neural retrieval methods have significantly improved the performance of open-domain retrieval tasks (e.g., Web search), their advantages have not been observed in legal case retrieval due to the…

    Submitted 2 January, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

  48. arXiv:2310.19973  [pdf, other]

    stat.ML cs.CR cs.LG math.ST stat.ME

    Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy

    Authors: Chendi Wang, Buxin Su, Jiayuan Ye, Reza Shokri, Weijie J. Su

    Abstract: Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy…

    Submitted 1 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  49. arXiv:2310.07997  [pdf, other]

    cs.CV cs.AI

    PG-NeuS: Robust and Efficient Point Guidance for Multi-View Neural Surface Reconstruction

    Authors: Chen Zhang, Wanjuan Su, Qingshan Xu, Wenbing Tao

    Abstract: Recently, learning multi-view neural surface reconstruction with the supervision of point clouds or depth maps has been a promising way. However, due to the underutilization of prior information, current methods still struggle with the challenges of limited accuracy and excessive time complexity. In addition, prior data perturbation is also an important but rarely considered issue. To address thes…

    Submitted 25 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  50. arXiv:2308.10355  [pdf, other]

    eess.AS cs.SD

    Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music

    Authors: Ching-Yu Chiu, Meinard Müller, Matthew E. P. Davies, Alvin Wen-Yu Su, Yi-Hsuan Yang

    Abstract: To model the periodicity of beats, state-of-the-art beat tracking systems use "post-processing trackers" (PPTs) that rely on several empirically determined global assumptions for tempo transition, which work well for music with a steady tempo. For expressive classical music, however, these assumptions can be too rigid. With two large datasets of Western classical piano music, namely the Aligned Sc…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (July 2023)