Showing 1–50 of 427 results for author: Yuan, Z

Searching in archive cs.

  1. arXiv:2406.18849  [pdf, other]

    cs.CV

    Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

    Authors: Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

    Abstract: Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.18051  [pdf, other]

    cs.CV

    ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

    Authors: Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

    Abstract: Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to dr…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17115  [pdf, other]

    cs.CV cs.AI

    Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

    Authors: Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

    Abstract: Despite the rapid progress and outstanding performance of Large Vision-Language Models (LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs tend to generate responses that are inconsistent with the corresponding visual inputs. To evaluate the degree of hallucination in LVLMs, previous works have proposed a series of benchmarks featuring different types of tas…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.15339  [pdf, other]

    cs.CV cs.AI cs.MM

    Image Conductor: Precision Control for Interactive Video Synthesis

    Authors: Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan

    Abstract: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

  5. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  6. arXiv:2406.14194  [pdf, other]

    cs.CV cs.AI

    VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

    Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

    Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and nar…

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.09913  [pdf, other]

    cs.CV

    OpenECAD: An Efficient Visual Language Model for Computer-Aided Design

    Authors: Zhe Yuan, Jianqi Shi, Yanhong Huang

    Abstract: Computer-aided design (CAD) tools are utilized in the manufacturing industry for modeling everything from cups to spacecraft. These programs are complex to use and typically require years of training and experience to master. Structured and well-constrained 2D sketches and 3D constructions are crucial components of CAD modeling. A well-executed CAD model can be seamlessly integrated into the manuf…

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.09838  [pdf, other]

    cs.CV cs.AI

    Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

    Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the…

    Submitted 14 June, 2024; originally announced June 2024.

  9. arXiv:2406.09399  [pdf, other]

    cs.CV

    OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

    Authors: Junke Wang, Yi Jiang, Zehuan Yuan, Binyue Peng, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Tokenizer, serving as a translator to map the intricate visual data into a compact latent space, lies at the core of visual generative models. Based on the finding that existing tokenizers are tailored to image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization. OmniTokenizer is designed with a spatial-temporal decoupled archite…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.08552  [pdf, other]

    cs.CV

    DiTFastAttn: Attention Compression for Diffusion Transformer Models

    Authors: Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression method to alleviate DiT's computational bottleneck. We identify three key redundancies in the attention computation during DiT inference: 1. spatial redundancy, where many attention heads focus on…

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.08426  [pdf, other]

    cs.CL cs.AI cs.DB

    Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

    Authors: Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

    Abstract: Generating accurate SQL according to natural language questions (text-to-SQL) is a long-standing challenge due to the complexities involved in user question understanding, database schema comprehension, and SQL generation. Conventional text-to-SQL systems, comprising human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have be…

    Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.08285  [pdf, other]

    cs.CV

    A New Class Biorthogonal Spline Wavelet for Image Edge Detection

    Authors: Dujuan Zhou, Zizhao Yuan

    Abstract: Spline wavelets have shown favorable characteristics for localizing in both time and frequency. In this paper, we propose a new biorthogonal cubic special spline wavelet (BCSSW), based on the Cohen-Daubechies-Feauveau wavelet construction method and the cubic special spline algorithm. BCSSW has better properties in compact support, symmetry, and frequency domain characteristics. However, current m…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.07937  [pdf, other]

    cs.CV cs.RO

    IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes

    Authors: Fengtian Lang, Ruiye Ming, Zikang Yuan, Xin Yang

    Abstract: In this work, we propose a fast and robust Image Feature Triangle Descriptor (IFTD) based on the STD method, aimed at improving the efficiency and accuracy of place recognition in driving scenarios. We extract keypoints from BEV projection image of point cloud and construct these keypoints into triangle descriptors. By matching these feature triangles, we achieved precise place recognition and cal…

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.07272  [pdf, other]

    cs.IT

    Integrated Near Field Sensing and Communications Using Unitary Approximate Message Passing Based Matrix Factorization

    Authors: Zhengdao Yuan, Qinghua Guo, Yonina C. Eldar, Yonghui Li

    Abstract: Due to the utilization of large antenna arrays at base stations (BSs) and the operations of wireless communications in high frequency bands, mobile terminals often find themselves in the near-field of the array aperture. In this work, we address the signal processing challenges of integrated near-field localization and communication in uplink transmission of an integrated sensing and communication…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 13 pages, 10 figures. arXiv admin note: text overlap with arXiv:2208.00422

  15. arXiv:2406.06525  [pdf, other]

    cs.CV

    Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

    Authors: Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

    Abstract: We introduce LlamaGen, a new family of image generation models that apply original "next-token prediction" paradigm of large language models to visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaling properly. We reexamine design spa…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Codes and models: https://github.com/FoundationVision/LlamaGen

  16. arXiv:2406.05972  [pdf, other]

    cs.AI cs.CY cs.HC cs.LG econ.TH

    Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

    Authors: Jingru Jia, Zehua Yuan, Junhao Pan, Paul McNamara, Deming Chen

    Abstract: When making decisions under uncertainty, individuals often deviate from rational behavior, which can be evaluated across three dimensions: risk preference, probability weighting, and loss aversion. Given the widespread use of large language models (LLMs) in decision-making processes, it is crucial to assess whether their behavior aligns with human norms and ethical expectations or exhibits potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Jingru Jia and Zehua Yuan has equal contribution

  17. arXiv:2406.04647  [pdf, other]

    cs.CV

    UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

    Authors: Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun

    Abstract: With the advancement of collaborative perception, the role of aerial-ground collaborative perception, a crucial component, is becoming increasingly important. The demand for collaborative perception across different perspectives to construct more comprehensive perceptual information is growing. However, challenges arise due to the disparities in the field of view between cross-domain agents and th…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2406.03805  [pdf, other]

    cs.CR

    AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

    Authors: Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou

    Abstract: Jailbreak attacks in large language models (LLMs) entail inducing the models to generate content that breaches ethical and legal norm through the use of malicious prompts, posing a substantial threat to LLM security. Current strategies for jailbreak attack and defense often focus on optimizing locally within specific algorithmic frameworks, resulting in ineffective optimization and limited scalabi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 32 pages, 2 figures

  19. arXiv:2406.01103  [pdf, other]

    cs.AI cs.HC cs.LG

    Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

    Authors: Chen Zhang, Qiang He, Zhou Yuan, Elvis S. Liu, Hong Wang, Jian Zhao, Yang Wang

    Abstract: Deep Reinforcement Learning (DRL) agents have demonstrated impressive success in a wide range of game genres. However, existing research primarily focuses on optimizing DRL competence rather than addressing the challenge of prolonged player interaction. In this paper, we propose a practical DRL agent system for fighting games named Shūkai, which has been successfully deployed to Naruto Mobile, a p…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accept at ICML 2024

  20. arXiv:2405.18734  [pdf, other]

    cs.CV cs.RO

    PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram

    Authors: Sifan Zhou, Zhihang Yuan, Dawei Yang, Xubin Wen, Xing Hu, Yuguang Shi, Ziyu Zhao, Xiaobo Lu

    Abstract: Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along heigh…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures

  21. arXiv:2405.17849  [pdf, other]

    cs.LG cs.AI cs.CL

    I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models

    Authors: Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou

    Abstract: Post-training quantization (PTQ) serves as a potent technique to accelerate the inference of large language models (LLMs). Nonetheless, existing works still necessitate a considerable number of floating-point (FP) operations during inference, including additional quantization and de-quantization, as well as non-linear operators such as RMSNorm and Softmax. This limitation hinders the deployment of…

    Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  22. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  23. arXiv:2405.15239  [pdf, other]

    cs.CV

    Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation

    Authors: Li Zhang, Yuankun Yang, Ziyang Xie, Zhiyuan Yuan, Jianfeng Feng, Xiatian Zhu, Yu-Gang Jiang

    Abstract: Understanding the hidden mechanisms behind human's visual perception is a fundamental quest in neuroscience, underpins a wide variety of critical applications, e.g. clinical diagnosis. To that end, investigating into the neural responses of human mind activities, such as functional Magnetic Resonance Imaging (fMRI), has been a significant research vehicle. However, analyzing fMRI signals is challe…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 25 pages, 16 figures, project page: https://brain-3d.github.io/

  24. arXiv:2405.14279  [pdf, other]

    cs.GT econ.TH

    Optimized Cost Per Click in Online Advertising: A Theoretical Analysis

    Authors: Kaichen Zhang, Zixuan Yuan, Hui Xiong

    Abstract: In recent years, Optimized Cost Per Click (OCPC) and Optimized Cost Per Mille (OCPM) have emerged as the most widely adopted pricing models in the online advertising industry. However, the existing literature has yet to identify the specific conditions under which these models outperform traditional pricing models like Cost Per Click (CPC) and Cost Per Action (CPA). To fill the gap, this paper bui…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGKDD2024 Research Track

  25. arXiv:2405.10102  [pdf, other]

    cs.NE cs.AI cs.LG eess.AS

    A novel Reservoir Architecture for Periodic Time Series Prediction

    Authors: Zhongju Yuan, Geraint Wiggins, Dick Botteldooren

    Abstract: This paper introduces a novel approach to predicting periodic time series using reservoir computing. The model is tailored to deliver precise forecasts of rhythms, a crucial aspect for tasks such as generating musical rhythm. Leveraging reservoir computing, our proposed method is ultimately oriented towards predicting human perception of rhythm. Our network accurately predicts rhythmic signals wit…

    Submitted 16 May, 2024; originally announced May 2024.

  26. arXiv:2405.06590  [pdf, other]

    physics.ao-ph cs.LG

    Decomposing weather forecasting into advection and convection with neural networks

    Authors: Mengxuan Chen, Ziqi Yuan, Jinxiao Zhang, Runmin Dong, Haohuan Fu

    Abstract: Operational weather forecasting models have advanced for decades on both the explicit numerical solvers and the empirical physical parameterization schemes. However, the involved high computational costs and uncertainties in these existing schemes are requiring potential improvements through alternative machine learning methods. Previous works use a unified model to learn the dynamics and physics…

    Submitted 10 May, 2024; originally announced May 2024.

  27. arXiv:2405.06219  [pdf, other]

    cs.LG cs.CL

    SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

    Authors: Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

    Abstract: Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks like book understanding and generating lengthy novels. However, the key-value (KV) cache required for LLMs consumes substantial memory as context length increasing, becoming the bottleneck for deployment. In this paper, we present a strategy called SKVQ, which stands for sliding-window KV cache quantizat…

    Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  28. arXiv:2405.04285  [pdf, other]

    cs.AI eess.SP

    On the Foundations of Earth and Climate Foundation Models

    Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

    Abstract: Foundation models have enormous potential in advancing Earth and climate sciences, however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an en…

    Submitted 7 May, 2024; originally announced May 2024.

  29. arXiv:2404.19748  [pdf, other]

    cs.CV cs.AI

    Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning

    Authors: Zhipeng Yuan, Nasamu Musa, Katarzyna Dybal, Matthew Back, Daniel Leybourne, Po Yang

    Abstract: Every year, plant parasitic nematodes, one of the major groups of plant pathogens, cause a significant loss of crops worldwide. To mitigate crop yield losses caused by nematodes, an efficient nematode monitoring method is essential for plant and crop disease management. In other respects, efficient nematode detection contributes to medical research and drug discovery, as nematodes are model organi…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: The 26th IEEE International Conference on Computational Science and Engineering (CSE-2023)

  30. arXiv:2404.17845  [pdf, other]

    cs.CV

    Instance-free Text to Point Cloud Localization with Relative Position Awareness

    Authors: Lichao Wang, Zhihao Yuan, Jinke Ren, Shuguang Cui, Zhen Li

    Abstract: Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration. It seeks to localize a position from a city-scale point cloud scene based on a few natural language instructions. In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relati…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, conference

  31. arXiv:2404.14294  [pdf, other]

    cs.CL cs.AI

    A Survey on Efficient Inference for Large Language Models

    Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This p…

    Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  32. arXiv:2404.13013  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

    Authors: Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi

    Abstract: We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Beyond holistic image understanding, Groma is adept at region-level tasks such as region captioning and visual grounding. Such capabilities are built upon a localized visual tokenization mechanism, where an image input is decomposed into regions of interest and subsequently encode…

    Submitted 19 April, 2024; originally announced April 2024.

  33. arXiv:2404.12489  [pdf, other]

    cs.CL

    Grammatical Error Correction for Code-Switched Sentences by Learners of English

    Authors: Kelvin Wey Han Chan, Christopher Bryant, Li Nguyen, Andrew Caines, Zheng Yuan

    Abstract: Code-switching (CSW) is a common phenomenon among multilingual speakers where multiple languages are used in a single discourse or utterance. Mixed language utterances may still contain grammatical errors however, yet most existing Grammar Error Correction (GEC) systems have been trained on monolingual data and not developed with CSW in mind. In this work, we conduct the first exploration into the…

    Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  34. arXiv:2404.12041  [pdf, other]

    cs.CL cs.AI

    Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation

    Authors: Siya Qi, Yulan He, Zheng Yuan

    Abstract: Hallucination in Natural Language Generation (NLG) is like the elephant in the room, obvious but often overlooked until recent achievements significantly improved the fluency and grammaticality of generated text. As the capabilities of text generation models have improved, researchers have begun to pay more attention to the phenomenon of hallucination. Despite significant progress in this field in…

    Submitted 15 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 16 pages, 2 figures

  35. arXiv:2404.10440  [pdf, other]

    cs.CL eess.AS

    Language Proficiency and F0 Entrainment: A Study of L2 English Imitation in Italian, French, and Slovak Speakers

    Authors: Zheng Yuan, Štefan Beňuš, Alessandro D'Ausilio

    Abstract: This study explores F0 entrainment in second language (L2) English speech imitation during an Alternating Reading Task (ART). Participants with Italian, French, and Slovak native languages imitated English utterances, and their F0 entrainment was quantified using the Dynamic Time Warping (DTW) distance between the parameterized F0 contours of the imitated utterances and those of the model utteranc…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted at Speech Prosody 2024

  36. arXiv:2404.10324  [pdf]

    cs.LG cs.CE eess.SY

    Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks

    Authors: Zhiyu Zhang, Chenkaixiang Lu, Wenchong Tian, Zhenliang Liao, Zhiguo Yuan

    Abstract: Physics-based models are computationally time-consuming and infeasible for real-time scenarios of urban drainage networks, and a surrogate model is needed to accelerate the online predictive modelling. Fully-connected neural networks (NNs) are potential surrogate models, but may suffer from low interpretability and efficiency in fitting complex targets. Owing to the state-of-the-art modelling powe…

    Submitted 16 April, 2024; originally announced April 2024.

  37. arXiv:2404.09201  [pdf, other]

    cs.IT

    Joint Near Field Uplink Communication and Localization Using Message Passing-Based Sparse Bayesian Learning

    Authors: Fei Liu, Zhengdao Yuan, Qinghua Guo, Yuanyuan Zhang, Zhongyong Wang, J. Andrew Zhang

    Abstract: This work deals with the problem of uplink communication and localization in an integrated sensing and communication system, where users are in the near field (NF) of antenna aperture due to the use of high carrier frequency and large antenna arrays at base stations. We formulate joint NF signal detection and localization as a problem of recovering signals with a sparse pattern. To solve the probl…

    Submitted 14 April, 2024; originally announced April 2024.

  38. arXiv:2404.07495  [pdf, other]

    cs.CV

    PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds

    Authors: Weisheng Xu, Sifan Zhou, Zhihang Yuan

    Abstract: LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving. It aims to obtain accurate 3D BBox from the search area based on similarity or motion. However, existing 3D SOT methods usually follow the point-based pipeline, where the sampling operation inevitably leads to redundant or lost information, resulting in unexpected performance. To address these is…

    Submitted 11 April, 2024; originally announced April 2024.

  39. arXiv:2404.05363  [pdf, other]

    cs.LG

    A parameter-free clustering algorithm for missing datasets

    Authors: Qi Li, Xianjun Zeng, Shuliang Wang, Wenhao Zhu, Shijie Ruan, Zhimeng Yuan

    Abstract: Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the Real-world. Existing clustering algorithms for missing datasets first impute the missing values and then perform clustering. However, both the imputation and clustering processes require input parameters. Too many input parameters inevitably increase the difficulty of obtaining accurate clusteri…

    Submitted 8 April, 2024; originally announced April 2024.

  40. arXiv:2404.03818  [pdf, other]

    cs.CL

    PRobELM: Plausibility Ranking Evaluation for Language Models

    Authors: Zhangdie Yuan, Chenxi Whitehouse, Eric Chamoun, Rami Aly, Andreas Vlachos

    Abstract: This paper introduces PRobELM (Plausibility Ranking Evaluation for Language Models), a benchmark designed to assess language models' ability to discern more plausible from less plausible scenarios through their parametric knowledge. While benchmarks such as TruthfulQA emphasise factual accuracy or truthfulness, and others such as COPA explore plausible scenarios without explicitly incorporating wo…

    Submitted 4 April, 2024; originally announced April 2024.

  41. arXiv:2404.03144  [pdf, other]

    cs.CV

    Diverse and Tailored Image Generation for Zero-shot Multi-label Classification

    Authors: Kaixin Zhang, Zhixiang Yuan, Tao Huang

    Abstract: Recently, zero-shot multi-label classification has garnered considerable attention for its capacity to operate predictions on unseen labels without human annotations. Nevertheless, prevailing approaches often use seen classes as imperfect proxies for unseen ones, resulting in suboptimal performance. Drawing inspiration from the success of text-to-image generation models in producing realistic imag…

    Submitted 3 April, 2024; originally announced April 2024.

  42. arXiv:2404.02905  [pdf, other]

    cs.CV cs.AI

    Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

    Authors: Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

    Abstract: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: V…

    Submitted 10 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Demo website: https://var.vision/

  43. arXiv:2404.02710  [pdf, other]

    cs.CL eess.AS

    ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation

    Authors: Zheng Yuan, Dorina de Jong, Štefan Beňuš, Noël Nguyen, Ruitao Feng, Róbert Sabo, Luciano Fadiga, Alessandro D'Ausilio

    Abstract: We introduce the Alternating Reading Task (ART) Corpus, a collection of dyadic sentence reading for studying the entrainment and imitation behaviour in speech communication. The ART corpus features three experimental conditions - solo reading, alternating reading, and deliberate imitation - as well as three sub-corpora encompassing French-, Italian-, and Slovak-accented English. This design allows…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 2 figures, 7 tables, accepted at LREC-COLING 2024 conference

  44. arXiv:2403.18866  [pdf, other]

    cs.SI cs.LG

    Graph Bayesian Optimization for Multiplex Influence Maximization

    Authors: Zirui Yuan, Minglai Shao, Zhiqian Chen

    Abstract: Influence maximization (IM) is the problem of identifying a limited number of initial influential users within a social network to maximize the number of influenced users. However, previous research has mostly focused on individual information propagation, neglecting the simultaneous and interactive dissemination of multiple information items. In reality, when users encounter a piece of informatio…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 2024

  45. arXiv:2403.18469  [pdf, other]

    cs.CV cs.AI

    Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds

    Authors: Zhimin Yuan, Wankang Zeng, Yanfei Su, Weiquan Liu, Ming Cheng, Yulan Guo, Cheng Wang

    Abstract: 3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between doma…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  46. arXiv:2403.17710  [pdf, other]

    cs.CR cs.AI

    Optimization-based Prompt Injection Attack to LLM-as-a-Judge

    Authors: Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a nov…

    Submitted 26 March, 2024; originally announced March 2024.

  47. arXiv:2403.17367  [pdf, other]

    cs.RO

    RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment

    Authors: Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe Xu

    Abstract: Combining the mobility of legged robots with the manipulation skills of arms has the potential to significantly expand the operational range and enhance the capabilities of robotic systems in performing various mobile manipulation tasks. Existing approaches are confined to imprecise six degrees of freedom (DoF) manipulation and possess a limited arm workspace. In this paper, we propose a novel fra…

    Submitted 13 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  48. arXiv:2403.17307  [pdf, other]

    cs.CL cs.IT

    HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification

    Authors: He Zhu, Junran Wu, Ruomei Liu, Yue Hou, Ze Yuan, Shangzhe Li, Yicheng Pan, Ke Xu

    Abstract: Existing self-supervised methods in natural language processing (NLP), especially hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning, extremely relying on human-designed augmentation rules to generate contrastive samples, which can potentially corrupt or distort the original information. In this paper, we tend to investigate the feasibility of a contrastiv…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024

  49. arXiv:2403.13583  [pdf, other]

    cs.SE cs.CL cs.LG

    CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing

    Authors: Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

    Abstract: Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CoCoST framework, which e…

    Submitted 1 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  50. arXiv:2403.13248  [pdf, other]

    cs.CV

    Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

    Authors: Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun

    Abstract: Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority bein…

    Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.