Showing 1–50 of 2,033 results for author: Chen, W

Searching in archive cs.
  1. arXiv:2407.01239  [pdf, other]

    cs.CV cs.AI

    SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism

    Authors: Ao Liang, Wenyu Chen, Jian Fang, Huaici Zhao

    Abstract: Single-stage point-based 3D object detectors have attracted widespread research interest due to their lightweight design and fast inference speed. However, they still face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC). In this paper, we propose SGCCNet to alleviate these two issues.…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 16 figures

  2. arXiv:2407.01027  [pdf, other]

    cs.CV

    Blind Inversion using Latent Diffusion Priors

    Authors: Weimin Bai, Siyi Chen, Wenzheng Chen, He Sun

    Abstract: Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind), limiting their applicability in practical settings where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion m…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.01014  [pdf, other]

    cs.CV

    An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations

    Authors: Weimin Bai, Yifei Wang, Wenzheng Chen, He Sun

    Abstract: Diffusion models excel in solving imaging inverse problems due to their ability to model complex image priors. However, their reliance on large, clean datasets for training limits their practical use where clean data is scarce. In this paper, we propose EMDiffusion, an expectation-maximization (EM) approach to train diffusion models from corrupted observations. Our method alternates between recons…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2407.01004  [pdf, other]

    cs.LG stat.ME

    CURLS: Causal Rule Learning for Subgroups with Significant Treatment Effect

    Authors: Jiehui Zhou, Linxiao Yang, Xingyu Liu, Xinyue Gu, Liang Sun, Wei Chen

    Abstract: In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strateg…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  5. arXiv:2407.00837  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Towards Robust Speech Representation Learning for Thousands of Languages

    Authors: William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 millio…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 20 pages

  6. arXiv:2407.00750  [pdf, other]

    cs.CR cs.IT

    Physical Layer Deception with Non-Orthogonal Multiplexing

    Authors: Wenwen Chen, Bin Han, Yao Zhu, Anke Schmeink, Giuseppe Caire, Hans D. Schotten

    Abstract: Physical layer security (PLS) is a promising technology to secure wireless communications by exploiting the physical properties of the wireless channel. However, the passive nature of PLS creates a significant imbalance between the effort required by eavesdroppers and legitimate users to secure data. To address this imbalance, in this article, we propose a novel framework of physical layer decepti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Submitted to IEEE Transactions on Wireless Communications

  7. arXiv:2407.00674  [pdf, other]

    cs.MA cs.GR cs.RO

    Emergent Crowd Grouping via Heuristic Self-Organization

    Authors: Xiao-Cheng Liao, Wei-Neng Chen, Xiang-Ling Chen, Yi Mei

    Abstract: Modeling crowds has many important applications in games and computer animation. Inspired by the emergent following effect in real-life crowd scenarios, in this work, we develop a method for implicitly grouping moving agents. We achieve this by analyzing local information around each agent and rotating its preferred velocity accordingly. Each agent could automatically form an implicit group with i…

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  9. arXiv:2407.00336  [pdf, other]

    cs.CR cs.LG

    Dual-view Aware Smart Contract Vulnerability Detection for Ethereum

    Authors: Jiacheng Yao, Maolin Wang, Wanqi Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan

    Abstract: The wide application of Ethereum technology has brought technological innovation to traditional industries. As one of Ethereum's core applications, smart contracts utilize diverse contract codes to meet various functional needs and have gained widespread use. However, the non-tamperability of smart contracts, coupled with vulnerabilities caused by natural flaws or human errors, has brought unprece…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

  10. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billi…

    Submitted 28 June, 2024; originally announced June 2024.

  11. arXiv:2406.19195  [pdf, other]

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we…

    Submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18540  [pdf, other]

    cs.CV cs.CR

    Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing

    Authors: Yunlong Zhao, Xiaoheng Deng, Yijing Liu, Xinjun Pei, Jiazhi Xia, Wei Chen

    Abstract: Model stealing (MS) involves querying and observing the output of a machine learning model to steal its capabilities. The quality of queried data is crucial, yet obtaining a large amount of real data for MS is often challenging. Recent works have reduced reliance on real data by using generative models. However, when high-dimensional query data is required, these methods are impractical due to the…

    Submitted 18 May, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  14. arXiv:2406.18538  [pdf, other]

    cs.CV cs.AI eess.IV

    VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

    Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai

    Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speech, and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth eff…

    Submitted 17 May, 2024; originally announced June 2024.

  15. arXiv:2406.18310  [pdf, other]

    cs.CV cs.LG eess.IV

    Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

    Authors: Wenting Chen, Jie Liu, Tommy W. S. Chow, Yixuan Yuan

    Abstract: Pathology images are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology images in a black-box manner, which can lead to untruthful biological details and mis…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TRANSACTIONS ON MEDICAL IMAGING (TMI)

  16. arXiv:2406.18082  [pdf, other]

    cs.CL cs.HC

    Octo-planner: On-device Language Model for Planner-Action Agents

    Authors: Wei Chen, Zhiyuan Li, Zhen Guo, Yikang Shen

    Abstract: AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two di…

    Submitted 26 June, 2024; originally announced June 2024.

  17. arXiv:2406.17588  [pdf, other]

    cs.CL

    LongIns: A Challenging Long-context Instruction-based Exam for LLMs

    Authors: Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

    Abstract: The long-context capabilities of large language models (LLMs) have been a hot topic in recent years. To evaluate the performance of LLMs in different scenarios, various assessment benchmarks have emerged. However, as most of these benchmarks focus on identifying key information to answer questions, which mainly requires the retrieval ability of LLMs, these benchmarks can partially represent the re…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  18. arXiv:2406.15319  [pdf, other]

    cs.CL cs.AI

    LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

    Authors: Ziyan Jiang, Xueguang Ma, Wenhu Chen

    Abstract: In the traditional RAG framework, the basic retrieval units are normally short. Common retrievers like DPR normally work with 100-word Wikipedia paragraphs. Such a design forces the retriever to search over a large corpus to find the 'needle' unit. In contrast, the readers only need to extract answers from the short retrieved units. Such an imbalanced 'heavy' retriever and 'light' reader design ca…

    Submitted 30 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Technical Report

  19. arXiv:2406.15252  [pdf, other]

    cs.CV cs.AI

    VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

    Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

    Abstract: Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores over generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-prov…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  20. arXiv:2406.14955  [pdf, other]

    cs.CL

    ICLEval: Evaluating In-Context Learning Ability of Large Language Models

    Authors: Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen

    Abstract: In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge,…

    Submitted 21 June, 2024; originally announced June 2024.

  21. arXiv:2406.14939  [pdf, other]

    cs.IT eess.SP

    RIS-aided MIMO Beamforming: Piece-Wise Near-field Channel Model

    Authors: Weijian Chen, Zai Yang, Zhiqiang Wei, Derrick Wing Kwan Ng, Michail Matthaiou

    Abstract: This paper proposes a joint active and passive beamforming design for reconfigurable intelligent surface (RIS)-aided wireless communication systems, adopting a piece-wise near-field channel model. While a traditional near-field channel model, applied without any approximations, offers higher modeling accuracy than a far-field model, it renders the system design more sensitive to channel estimation…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 28 pages

  22. arXiv:2406.14928  [pdf, other]

    cs.AI cs.CL cs.HC cs.MA cs.SI

    Autonomous Agents for Collaborative Task under Information Asymmetry

    Authors: Wei Liu, Chenxi Wang, Yifei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian

    Abstract: Large Language Model Multi-Agent Systems (LLM-MAS) have achieved great progress in solving complex tasks. They perform communication among agents within the system to collaboratively solve tasks, under the premise of shared information. However, when agents' communication is leveraged to enhance human cooperation, a new challenge arises due to information asymmetry, since each agent can only access…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures, 5 tables, Work in progress

  23. arXiv:2406.14753  [pdf, other]

    cs.LG stat.ME

    A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

    Authors: Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

    Abstract: We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish theoretical properties of our approach and derive an algorithm based on a specific instance of this approach. Our empirical results demonstrate the significant benefits of our approach.

    Submitted 20 June, 2024; originally announced June 2024.

  24. arXiv:2406.14129  [pdf, other]

    cs.CV cs.CL cs.MM

    Towards Event-oriented Long Video Understanding

    Authors: Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen

    Abstract: With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  25. arXiv:2406.14023  [pdf, other]

    cs.CL cs.AI

    Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

    Authors: Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng

    Abstract: As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Code and datasets are available at https://github.com/wen112358/ImplicitBiasPsychometricEvaluation

  26. arXiv:2406.13930  [pdf, other]

    cs.LG

    Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

    Authors: Wentse Chen, Shiyu Huang, Jeff Schneider

    Abstract: Multi-agent reinforcement learning (MARL) tasks often utilize a centralized training with decentralized execution (CTDE) framework. QMIX is a successful CTDE method that learns a credit assignment function to derive local value functions from a global value function, defining a deterministic local policy. However, QMIX is hindered by its poor exploration strategy. While maximum entropy reinforceme…

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.13923  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI…

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.13645  [pdf, other]

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  29. arXiv:2406.13629  [pdf, other]

    cs.CL cs.LG

    InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising

    Authors: Zhepei Wei, Wei-Lin Chen, Yu Meng

    Abstract: Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information to the retrieved contents, posing a significant challenge to the generation quality. Existing RAG methods typically address this challenge by directly predicting fina…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/weizhepei/InstructRAG

  30. arXiv:2406.13162  [pdf, other]

    cs.LG cs.AI q-bio.QM

    AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions

    Authors: Bohao Xu, Yanbo Wang, Wenyu Chen, Shimin Shan

    Abstract: Therapeutic antibodies have been extensively studied in drug discovery and development in the past decades. Antibodies are specialized protective proteins that bind to antigens in a lock-to-key manner. The binding strength/affinity between an antibody and a specific antigen is heavily determined by the complementarity-determining regions (CDRs) on the antibodies. Existing machine learning methods…

    Submitted 18 June, 2024; originally announced June 2024.

  31. arXiv:2406.12699  [pdf, other]

    cs.SD eess.AS eess.SP

    Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition

    Authors: Kuan-Chen Wang, You-Jin Li, Wei-Lun Chen, Yu-Wen Chen, Yi-Ching Wang, Ping-Cheng Yeh, Chao Zhang, Yu Tsao

    Abstract: Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the use of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introduces artifacts into the enhanced signals and harms ASR performance, particularly when SE and ASR are independently trained. Therefore, this study intro…

    Submitted 18 June, 2024; originally announced June 2024.

  32. arXiv:2406.12566  [pdf, other]

    cs.CL

    RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation

    Authors: Shuting Wang, Xin Yu, Mang Wang, Weipeng Chen, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-augmented generation (RAG) effectively addresses issues of static knowledge and hallucination in large language models. Existing studies mostly focus on question scenarios with clear user intents and concise answers. However, it is prevalent that users issue broad, open-ended queries with diverse sub-intents, for which they desire rich and long-form answers covering multiple relevant asp…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  33. arXiv:2406.12403  [pdf, other]

    cs.CL cs.AI

    PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

    Authors: Tao Fan, Yan Kang, Weijing Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein the client transmits perturbed promp…

    Submitted 18 June, 2024; originally announced June 2024.

  34. arXiv:2406.12397  [pdf, other]

    cs.CL

    Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models

    Authors: Jie Chen, Yupeng Zhang, Bingning Wang, Wayne Xin Zhao, Ji-Rong Wen, Weipeng Chen

    Abstract: Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs). Studies have shown that synthetic data can effectively improve the performance of LLMs on downstream benchmarks. However, despite its potential benefits, our analysis suggests that there may be inherent flaws in synthetic data. The uniform format of syn…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 15 pages

  35. arXiv:2406.12123  [pdf, other]

    cs.RO cs.AI cs.LG

    ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke

    Authors: Jingxi Xu, Runsheng Wang, Siqi Shang, Ava Chen, Lauren Winterbottom, To-Liang Hsu, Wenxi Chen, Khondoker Ahmed, Pedro Leandro La Rotta, Xinyue Zhu, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie

    Abstract: Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train i…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  36. arXiv:2406.11441  [pdf, other]

    cs.CV

    SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

    Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuo** Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

    Abstract: A large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, which makes efficiently segmenting a point cloud a challenging problem. Most existing research mainly focuses on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context.…

    Submitted 17 June, 2024; originally announced June 2024.

  37. arXiv:2406.11385  [pdf, other]

    cs.CL

    MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

    Authors: Yuyan Zhou, Liang Song, Bingning Wang, Weipeng Chen

    Abstract: The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, the current lack…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages

  38. arXiv:2406.11345  [pdf, other]

    cs.CL cs.AI

    Full-ECE: A Metric For Token-level Calibration on Large Language Models

    Authors: Han Liu, Yupeng Zhang, Bingning Wang, Weipeng Chen, Xiaolin Hu

    Abstract: Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications. Large Language Models (LLMs) have recently emerged as powerful tools, demonstrating exceptional performance in language tasks. However, traditional calibration metrics such as Expected Calibration Error (ECE) and classwise-ECE (cw-ECE…

    Submitted 17 June, 2024; originally announced June 2024.

  39. arXiv:2406.11317  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    GUICourse: From General Vision Language Models to Versatile GUI Agents

    Authors: Wentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Utilizing a Graphical User Interface (GUI) for human-computer interaction is essential for accessing a wide range of digital tools. Recent advancements in Vision Language Models (VLMs) highlight the compelling potential to develop versatile agents to help humans finish GUI navigation tasks. However, current VLMs are challenged in terms of fundamental abilities (OCR and grounding) and GUI knowledge (th…

    Submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.11256  [pdf, other]

    cs.CL

    Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

    Authors: Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng

    Abstract: Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales. However, previous methods simply merge all training tasks (e.g. creative writing, coding, and mathematics) and apply fixed sampling weights, without considering the importance of different tasks as the model training state changes. In this way, the most helpful data c…

    Submitted 17 June, 2024; originally announced June 2024.

  41. arXiv:2406.11251  [pdf, other]

    cs.IR

    Unifying Multimodal Retrieval via Document Screenshot Embedding

    Authors: Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

    Abstract: In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, prone to errors, and incurs information loss. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards docu…

    Submitted 17 June, 2024; originally announced June 2024.

  42. arXiv:2406.11245  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose an RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/RIS-RB-AoI-V2X-DRL.git

  43. arXiv:2406.11159  [pdf, other]

    cs.LG cs.DC

    Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework

    Authors: Siyuan Yu, Wei Chen, H. Vincent Poor

    Abstract: Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. However, the staggers and limited bandwidth may induce random computational/communication delays, thereby severely hindering the learning process. Therefore, how to accelerat…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  44. arXiv:2406.11132  [pdf, other]

    cs.CL cs.AI cs.LG

    RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents

    Authors: Weizhe Chen, Sven Koenig, Bistra Dilkina

    Abstract: In this past year, large language models (LLMs) have had remarkable success in domains outside the traditional natural language processing, and people are starting to explore the usage of LLMs in more general and close to application domains like code generation, travel planning, and robot controls. Connecting these LLMs with great capacity and external tools, people are building the so-called LLM…

    Submitted 16 June, 2024; originally announced June 2024.

  45. arXiv:2406.11069  [pdf, other]

    cs.CV cs.AI cs.CL

    WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

    Authors: Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

    Abstract: Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that collects human preferences to evaluate VLMs. We curated WV-Bench by selecting 500 high-quality samples from 8,000 user submissions in WV-Arena. WV-Bench uses GPT-4…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: link: https://hf.co/spaces/WildVision/vision-arena

  46. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  47. arXiv:2406.10462  [pdf, other]

    cs.CV

    CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

    Authors: Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

    Abstract: Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data qu…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages

  48. arXiv:2406.10083  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  49. arXiv:2406.09846  [pdf, ps, other]

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  50. arXiv:2406.09627  [pdf, other]

    cs.CV cs.AI eess.IV

    RobustSAM: Segment Anything Robustly on Degraded Images

    Authors: Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system. Nonetheless, its performance is challenged by images with degraded quality. Addressing this limitation, we propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality image…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024 (Highlight); Project Page: https://robustsam.github.io/