Skip to main content

Showing 1–50 of 3,135 results for author: Wu, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01219  [pdf, other

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuan**g Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01103  [pdf, other

    cs.RO

    FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding

    Authors: Wei-Bin Kou, Qingfeng Lin, Ming Tang, Shuai Wang, Guangxu Zhu, Yik-Chung Wu

    Abstract: Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for world-wide distributed autonomous driving (AD) vehicles (e.g., Tesla). Its inference model faces poor generalization issue due to inter-city domain-shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but suffers from slow convergence rate because o… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  3. Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

    Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

    Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winte… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00467  [pdf, other

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2407.00431  [pdf, other

    cs.CV

    Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones

    Authors: Qiangguo **, Jiapeng Huang, Changming Sun, Hui Cui, ** Xuan, Ran Su, Leyi Wei, Yu-Jie Wu, Chia-An Wu, Henry B. L. Duh, Yueh-Hsun Lu

    Abstract: The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address this issue, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Journal ref: MICCAI 2024

  6. arXiv:2407.00079  [pdf, other

    cs.DC cs.AI cs.AR

    Mooncake: Kimi's KVCache-centric Architecture for LLM Serving

    Authors: Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu

    Abstract: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which balances ma… ▽ More

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 21 pages, 11 figures

  7. arXiv:2406.19853  [pdf, other

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.19665  [pdf, other

    cs.CV

    PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation

    Authors: Zhang**g Yang, Dun Liu, Xin Wang, Zhe Li, Barathwaj Anandan, Yi Wu

    Abstract: Video instance segmentation requires detecting, segmenting, and tracking objects in videos, typically relying on costly video annotations. This paper introduces a method that eliminates video annotations by utilizing image datasets. The PM-VIS algorithm is adapted to handle both bounding box and instance-level pixel annotations dynamically. We introduce ImageNet-bbox to supplement missing categori… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: MIPR 2024

  9. arXiv:2406.19396  [pdf, other

    cs.CE

    SimLOB: Learning Representations of Limited Order Book for Financial Market Simulation

    Authors: Yuanzhe Li, Yue Wu, Peng Yang

    Abstract: Financial market simulation (FMS) serves as a promising tool for understanding market anomalies and the underlying trading behaviors. To ensure high-fidelity simulations, it is crucial to calibrate the FMS model for generating data closely resembling the observed market data. Previous efforts primarily focused on calibrating the mid-price data, leading to essential information loss of the market a… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  10. arXiv:2406.18941  [pdf, other

    cs.CV

    CLIP3D-AD: Extending CLIP for 3D Few-Shot Anomaly Detection with Multi-View Images Generation

    Authors: Zuo Zuo, Jiahao Dong, Yao Wu, Yanyun Qu, Zongze Wu

    Abstract: Few-shot anomaly detection methods can effectively address data collecting difficulty in industrial scenarios. Compared to 2D few-shot anomaly detection (2D-FSAD), 3D few-shot anomaly detection (3D-FSAD) is still an unexplored but essential task. In this paper, we propose CLIP3D-AD, an efficient 3D-FSAD method extended on CLIP. We successfully transfer strong generalization ability of CLIP into 3D… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  11. arXiv:2406.18543  [pdf, ps, other

    cs.CV

    A Set-based Approach for Feature Extraction of 3D CAD Models

    Authors: Peng Xu, Qi Gao, Ying-Jie Wu

    Abstract: Feature extraction is a critical technology to realize the automatic transmission of feature information throughout product life cycles. As CAD models primarily capture the 3D geometry of products, feature extraction heavily relies on geometric information. However, existing feature extraction methods often yield inaccurate outcomes due to the diverse interpretations of geometric information. This… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: 13 pages

  12. arXiv:2406.18534  [pdf, other

    cs.CL cs.LG

    Towards Compositionality in Concept Learning

    Authors: Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong

    Abstract: Concept-based interpretability methods offer a lens into the internals of foundation models by decomposing their embeddings into high-level concepts. These concept representations are most useful when they are compositional, meaning that the individual concepts compose to explain the full sample. We show that existing unsupervised concept extraction methods find concepts which are not compositiona… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. 26 pages, 10 figures

  13. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong **, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, **feng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  14. arXiv:2406.17309  [pdf, other

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

  15. arXiv:2406.17114  [pdf, other

    cs.LG cs.CR cs.GT

    Inception: Efficiently Computable Misinformation Attacks on Markov Games

    Authors: Jeremy McMahan, Young Wu, Yudong Chen, Xiao** Zhu, Qiaomin Xie

    Abstract: We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal wors… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to Reinforcement Learning Conference (RLC) 2024

  16. arXiv:2406.16864  [pdf, other

    cs.CV cs.AI cs.GR

    StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

    Authors: Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

    Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the e… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: HF Demo: hf.co/Stable-X, Video: https://www.youtube.com/watch?v=sylXTxG_U2U

  17. arXiv:2406.16776  [pdf, other

    cs.CV

    Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation

    Authors: Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao

    Abstract: Large-scale datasets with point-wise semantic and instance labels are crucial to 3D instance segmentation but also expensive. To leverage unlabeled data, previous semi-supervised 3D instance segmentation approaches have explored self-training frameworks, which rely on high-quality pseudo labels for consistency regularization. They intuitively utilize both instance and semantic pseudo labels in a j… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures

  18. arXiv:2406.16449  [pdf, other

    cs.CV

    Evaluating and Analyzing Relationship Hallucinations in LVLMs

    Authors: Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

    Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which is essential for visual comprehension. In this work, we introduce R-Benc… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML2024

  19. arXiv:2406.16271  [pdf, other

    cs.CV

    Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation

    Authors: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, MIng Li, Weixia Han, Wen Zheng

    Abstract: Assessment of the glomerular basement membrane (GBM) in transmission electron microscopy (TEM) is crucial for diagnosing chronic kidney disease (CKD). The lack of domain-independent automatic segmentation tools for the GBM necessitates an AI-based solution to automate the process. In this study, we introduce GBMSeg, a training-free framework designed to automatically segment the GBM in TEM images… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted for MICCAI2024

  20. arXiv:2406.16148  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, **g Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  21. arXiv:2406.16062  [pdf, other

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  22. arXiv:2406.15073  [pdf, other

    cs.AI cs.DB

    KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning

    Authors: Jiahan Chen, Shuhan Qi, Yifan Li, Zeyu Dong, Mingfeng Ding, Yulin Wu, Xuan Wang

    Abstract: Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to black-box prope… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  23. arXiv:2406.14884  [pdf, other

    cs.CL

    FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents

    Authors: Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li

    Abstract: LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. De… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  24. arXiv:2406.14096  [pdf, other

    cs.AI cs.LG

    Graph Neural Networks for Job Shop Scheduling Problems: A Survey

    Authors: Igor G. Smit, Jianan Zhou, Robbert Reijnen, Yaoxin Wu, Jian Chen, Cong Zhang, Zaharah Bukhsh, Wim Nuijten, Yingqian Zhang

    Abstract: Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, albeit lacking a systematic survey of the relevant literature. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  25. arXiv:2406.14088  [pdf, other

    cs.DC cs.AI cs.CL cs.LG

    ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

    Authors: Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages (15 pages with references), 13 figures

  26. Unifying Graph Convolution and Contrastive Learning in Collaborative Filtering

    Authors: Yihong Wu, Le Zhang, Fengran Mo, Tianyu Zhu, Weizhi Ma, Jian-Yun Nie

    Abstract: Graph-based models and contrastive learning have emerged as prominent methods in Collaborative Filtering (CF). While many existing models in CF incorporate these methods in their design, there seems to be a limited depth of analysis regarding the foundational principles behind them. This paper bridges graph convolution, a pivotal element of graph-based models, with contrastive learning through a t… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  27. arXiv:2406.13495  [pdf, other

    cs.CV

    DF40: Toward Next-Generation Deepfake Detection

    Authors: Zhiyuan Yan, Tai** Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu

    Abstract: We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass"… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.12757  [pdf, other

    cs.CV

    MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

    Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' n… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13pages,5figures

  29. arXiv:2406.12738  [pdf, other

    cs.CL cs.AI

    Large Language Model as a Universal Clinical Multi-task Decoder

    Authors: Yujiang Wu, Hongjian Song, Jiawen Zhang, Xumeng Wen, Shun Zheng, Jiang Bian

    Abstract: The development of effective machine learning methodologies for enhancing the efficiency and accuracy of clinical systems is crucial. Despite significant research efforts, managing a plethora of diversified clinical tasks and adapting to emerging new tasks remain significant challenges. This paper presents a novel paradigm that employs a pre-trained large language model as a universal clinical mul… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in progress

  30. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11589  [pdf, other

    cs.SE cs.AI cs.IR

    CoSQA+: Enhancing Code Search Dataset with Matching Code

    Authors: **g Gong, Yanghui Wu, Linxi Liang, Zibin Zheng, Yanlin Wang

    Abstract: Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering. Existing code search datasets are problematic: either using unrealistic queries, or with mismatched codes, and typically using one-to-one query-code pairing, which fails to reflect the reality that a query might have multiple valid code matches. T… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 11 pages, 4 figures, conference

    ACM Class: I.2.7; D.2.3

  32. arXiv:2406.11389  [pdf, other

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  33. arXiv:2406.11213  [pdf, other

    cs.SE

    A Survey of AIOps for Failure Management in the Era of Large Language Models

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S. Yu, Ying Li

    Abstract: As software systems grow increasingly intricate, Artificial Intelligence for IT Operations (AIOps) methods have been widely used in software system failure management to ensure the high availability and reliability of large-scale distributed software systems. However, these methods still face several challenges, such as lack of cross-platform generality and cross-task flexibility. Fortunately, rec… ▽ More

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 35 pages

  34. arXiv:2406.10911  [pdf, other

    cs.SD eess.AS

    SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

    Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin **

    Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  35. arXiv:2406.10845  [pdf, other

    cs.CV

    LAIP: Learning Local Alignment from Image-Phrase Modeling for Text-based Person Search

    Authors: Haiguang Wang, Yu Wu, Mengxia Wu, Cao Min, Min Zhang

    Abstract: Text-based person search aims at retrieving images of a particular person based on a given textual description. A common solution for this task is to directly match the entire images and texts, i.e., global alignment, which fails to deal with discerning specific details that discriminate against appearance-similar people. As a result, some works shift their attention towards local alignment. One g… ▽ More

    Submitted 23 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  36. arXiv:2406.10462  [pdf, other

    cs.CV

    CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

    Authors: Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

    Abstract: Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data qu… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages

  37. arXiv:2406.10289  [pdf, other

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  38. arXiv:2406.09781  [pdf, other

    cs.CV

    GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding

    Authors: Yiqi Wu, Xiaodan Hu, Ziming Fu, Siling Zhou, Jiangong Li

    Abstract: Animal ethology is an crucial aspect of animal research, and animal behavior labeling is the foundation for studying animal behavior. This process typically involves labeling video clips with behavioral semantic tags, a task that is complex, subjective, and multimodal. With the rapid development of multimodal large language models(LLMs), new application have emerged for animal behavior understandi… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  39. arXiv:2406.09631  [pdf, other

    cs.RO

    Optimal Convex Cover as Collision-free Space Approximation for Trajectory Generation

    Authors: Yuwei Wu, Igor Spasojevic, Pratik Chaudhari, Vijay Kumar

    Abstract: We propose an online iterative algorithm to find a suitable convex cover to under-approximate the free space for autonomous navigation to delineate Safe Flight Corridors (SFC). The convex cover consists of a set of polytopes such that the union of the polytopes represents obstacle-free space, allowing us to find trajectories for robots that lie within the convex cover. In order to find the SFC tha… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  40. arXiv:2406.09485  [pdf, other

    cs.SE

    Integrated Modeling, Verification, and Code Generation for Unmanned Aerial Systems

    Authors: Jianyu Zhang, Long Zhang, Yixuan Wu, Linru Ma, Feng Yang

    Abstract: Unmanned Aerial Systems (UAS) are currently widely used in safety-critical fields such as industrial production, military operations, and disaster relief. Due to the diversity and complexity of application scenarios, UAS have become increasingly intricate. The challenge of designing and implementing highly reliable UAS while effectively controlling development costs and enhancing efficiency is a p… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  41. arXiv:2406.09481  [pdf, other

    cs.CV cs.LG

    ELF-UA: Efficient Label-Free User Adaptation in Gaze Estimation

    Authors: Yong Wu, Yang Wang, Sanqing Qu, Zhijun Li, Guang Chen

    Abstract: We consider the problem of user-adaptive 3D gaze estimation. The performance of person-independent gaze estimation is limited due to interpersonal anatomical differences. Our goal is to provide a personalized gaze estimation model specifically adapted to a target user. Previous work on user-adaptive gaze estimation requires some labeled images of the target person data to fine-tune the model at te… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IJCAI'24

  42. arXiv:2406.09363  [pdf, ps, other

    cs.AI cs.GT cs.LG

    ElicitationGPT: Text Elicitation Mechanisms via Language Models

    Authors: Yifan Wu, Jason Hartline

    Abstract: Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information and the training of machine learning models. This paper develops mechanisms for scoring elicited text against ground truth text using domain-knowledge-free queries to a large language model (specifically ChatGPT) and empir… ▽ More

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  43. arXiv:2406.09295  [pdf, other

    cs.CL cs.CV

    AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

    Authors: Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

    Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically des… ▽ More

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  44. arXiv:2406.09071  [pdf

    cs.LG

    FlamePINN-1D: Physics-informed neural networks to solve forward and inverse problems of 1D laminar flames

    Authors: Jiahao Wu, Su Zhang, Yuxin Wu, Guihua Zhang, Xin Li, Hai Zhang

    Abstract: Given the existence of various forward and inverse problems in combustion studies and applications that necessitate distinct methods for resolution, a framework to solve them in a unified way is critically needed. A promising approach is the integration of machine learning methods with governing equations of combustion systems, which exhibits superior generality and few-shot learning ability compa… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  45. arXiv:2406.08905  [pdf, other

    cs.SD eess.AS

    SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

    Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin **

    Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th… ▽ More

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  46. arXiv:2406.08761  [pdf, other

    cs.SD eess.AS

    VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

    Authors: Yifeng Yu, Jiatong Shi, Yuning Wu, Shinji Watanabe

    Abstract: Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of supervised learning methods. In response to this challenge, this paper introduces a novel approach to enhance the quality of SVS by leveraging unlabeled data from pr… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures

  47. arXiv:2406.08491  [pdf, other

    quant-ph cs.DC

    FPGA-based Distributed Union-Find Decoder for Surface Codes

    Authors: Namitha Liyanage, Yue Wu, Siona Tagare, Lin Zhong

    Abstract: A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation,… ▽ More

    Submitted 20 March, 2024; originally announced June 2024.

    Comments: The article extends the work in arXiv:2301.08419, which also appeared in https://ieeexplore.ieee.org/document/10313800

  48. arXiv:2406.08416  [pdf, other

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin **

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis(SVS), achieving higher levels of melody… ▽ More

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  49. arXiv:2406.08068  [pdf, other

    cs.CL

    Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

    Authors: Hao Yang, Yanyan Zhao, Yang Wu, Shilong Wang, Tian Zheng, Hongbo Zhang, Wanxiang Che, Bing Qin

    Abstract: Compared to traditional sentiment analysis, which only considers text, multimodal sentiment analysis needs to consider emotional signals from multimodal sources simultaneously and is therefore more consistent with the way how humans process sentiment in real-world scenarios. It involves processing emotional information from various sources such as natural language, images, videos, audio, physiolog… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.08037  [pdf, other

    cs.CV

    Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    Authors: Xiangyang Yang, Dan Zeng, Xucheng Wang, You Wu, Hengzhou Ye, Qijun Zhao, Shuiwang Li

    Abstract: Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypassing transformer blocks for efficient visual tracking. The rationale behind ABTrack is ro… ▽ More

    Submitted 1 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.