Showing 1–50 of 400 results for author: Shi, X

Searching in archive cs.
  1. arXiv:2407.00499

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  2. arXiv:2407.00389

    cs.CV

    Query-Efficient Hard-Label Black-Box Attack against Vision Transformers

    Authors: Chao Zhou, Xiaowen Shi, Yuan-Gen Wang

    Abstract: Recent studies have revealed that vision transformers (ViTs) face similar security risks from adversarial attacks as deep convolutional neural networks (CNNs). However, directly applying attack methodology on CNNs to ViTs has been demonstrated to be ineffective since the ViTs typically work on patch-wise encoding. This article explores the vulnerability of ViTs against adversarial attacks under a… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2407.00371

    cs.LG cs.AI

    Axiomatization of Gradient Smoothing in Neural Networks

    Authors: Linjiang Zhou, Xiaochuan Shi, Chao Ma, Zepeng Wang

    Abstract: Gradients play a pivotal role in neural networks explanation. The inherent high dimensionality and structural complexity of neural networks result in the original gradients containing a significant amount of noise. While several approaches were proposed to reduce noise with smoothing, there is little discussion of the rationale behind smoothing gradients in neural networks. In this work, we propos… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.19651

    cs.DB cs.AI

    CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion

    Authors: Xianzhi Zeng, Zhuoyan Wu, Xinjing Hu, Xuanhua Shi, Shixuan Sun, Shuhao Zhang

    Abstract: Approximate K Nearest Neighbor (AKNN) algorithms play a pivotal role in various AI applications, including information retrieval, computer vision, and natural language processing. Although numerous AKNN algorithms and benchmarks have been developed recently to evaluate their effectiveness, the dynamic nature of real-world data presents significant challenges that existing benchmarks fail to addres… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.19485

    eess.IV cs.CV

    GAPNet: Granularity Attention Network with Anatomy-Prior-Constraint for Carotid Artery Segmentation

    Authors: Lin Zhang, Chenggang Lu, Xin-yang Shi, Caifeng Shan, Jiong Zhang, Da Chen, Laurent D. Cohen

    Abstract: Atherosclerosis is a chronic, progressive disease that primarily affects the arterial walls. It is one of the major causes of cardiovascular disease. Magnetic Resonance (MR) black-blood vessel wall imaging (BB-VWI) offers crucial insights into vascular disease diagnosis by clearly visualizing vascular structures. However, the complex anatomy of the neck poses challenges in distinguishing the carot… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.12182

    cs.CL cs.AI

    Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models

    Authors: Lulu Zhao, Weihao Zeng, Xiaofeng Shi, Hua Zhou, Donglin Hao, Yonghua Lin

    Abstract: Recently, both closed-source LLMs and open-source communities have made significant strides, outperforming humans in various general domains. However, their performance in specific professional fields such as medicine, especially within the open-source community, remains suboptimal due to the complexity of medical knowledge. We propose Aquila-Med, a bilingual medical LLM based on Aquila, addressin… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.12009

    cs.CL

    FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure

    Authors: Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Bin Ke, Jie Yang

    Abstract: Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.08644

    eess.SP cs.AI cs.SD eess.AS

    Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

    Authors: Jihwan Lee, Aditya Kommineni, Tiantian Feng, Kleanthis Avramidis, Xuan Shi, Sudarsana Kadiri, Shrikanth Narayanan

    Abstract: Speech decoding from EEG signals is a challenging task, where brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, where no intermediate acoustic feature processing step is required. The propo… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted to Interspeech2024

  9. arXiv:2406.03718

    cs.CR cs.AI cs.CL

    Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning

    Authors: Xiaohu Du, Ming Wen, Jiahao Zhu, Zifan Xie, Bin Ji, Huijun Liu, Xuanhua Shi, Hai Jin

    Abstract: Code Pre-trained Models (CodePTMs) based vulnerability detection have achieved promising results over recent years. However, these models struggle to generalize as they typically learn superficial mapping from source code to labels instead of understanding the root causes of code vulnerabilities, resulting in poor performance in real-world scenarios beyond the training instances. To tackle this ch… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  10. arXiv:2405.19740

    cs.CL cs.AI cs.CY

    PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

    Authors: Jiatong Li, Renjun Hu, Kunzhe Huang, Yan Zhuang, Qi Liu, Mengxiao Zhu, Xing Shi, Wei Lin

    Abstract: Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through k… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 12 figures, 10 tables

  11. arXiv:2405.18991

    cs.CV cs.CL cs.MM

    EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

    Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

    Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the producti… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures

  12. arXiv:2405.17051

    cs.LG cs.AI

    BeamVQ: Aligning Space-Time Forecasting Model via Self-training on Physics-aware Metrics

    Authors: Hao Wu, Xingjian Shi, Ziyue Huang, Penghao Zhao, Wei Xiong, Jinbao Xue, Yangyu Tao, Xiaomeng Huang, Weiyan Wang

    Abstract: Data-driven deep learning has emerged as the new paradigm to model complex physical space-time systems. These data-driven methods learn patterns by optimizing statistical metrics and tend to overlook the adherence to physical laws, unlike traditional model-driven numerical methods. Thus, they often generate predictions that are not physically realistic. On the other hand, by sampling a large amoun… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.14616

    cs.LG cs.AI

    TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

    Authors: Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

    Abstract: Time series forecasting is widely used in extensive applications, such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations in a novel view of multiscale-mixing, wh… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  14. arXiv:2405.13548

    cs.SE cs.CL

    ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing

    Authors: Wei Zhang, Xianfu Cheng, Yi Zhang, Jian Yang, Hongcheng Guo, Zhoujun Li, Xiaolin Yin, Xiangyuan Guan, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: Log parsing, a vital task for interpreting the vast and complex data produced within software architectures faces significant challenges in the transition from academic benchmarks to the industrial domain. Existing log parsers, while highly effective on standardized public datasets, struggle to maintain performance and efficiency when confronted with the sheer scale and diversity of real-world ind… ▽ More

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  15. arXiv:2405.10630

    cs.CL cs.AI

    Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

    Authors: Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang

    Abstract: This paper surveys and organizes research works on medical dialog systems, which is an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has to date remained noticeably absent. As a result, an overview of the categories, methods, and evaluation of medical dial… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  16. arXiv:2405.07582

    cs.CV

    FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching Reversal

    Authors: Fengchuang Xing, Xiaowen Shi, Yuan-Gen Wang, Chunsheng Yang

    Abstract: Unveiling the real appearance of retouched faces to prevent malicious users from deceptive advertising and economic fraud has been an increasing concern in the era of digital economics. This article makes the first attempt to investigate the face retouching reversal (FRR) problem. We first collect an FRR dataset, named deepFRR, which contains 50,000 StyleGAN-generated high-resolution (1024*1024) f… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  17. arXiv:2405.07176

    cs.IT eess.SP

    Capacity Maximization for Base Station with Hybrid Fixed and Movable Antennas

    Authors: Xiaoming Shi, Xiaodan Shao, Rui Zhang

    Abstract: Six-dimensional movable antenna (6DMA) is an effective solution for enhancing wireless network capacity through the adjustment of both 3D positions and 3D rotations of distributed antennas/antenna surfaces. Although freely positioning/rotating 6DMA surfaces offers the greatest flexibility and thus highest capacity improvement, its implementation may be challenging in practice due to the drastic ar… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  18. arXiv:2404.17983

    cs.SD cs.CL eess.AS

    TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality

    Authors: Tiantian Feng, Xuan Shi, Rahul Gupta, Shrikanth S. Narayanan

    Abstract: Automatic Speech Understanding (ASU) aims at human-like speech interpretation, providing nuanced intent, emotion, sentiment, and content understanding from speech and language (text) content conveyed in speech. Typically, training a robust ASU model relies heavily on acquiring large-scale, high-quality speech and associated transcriptions. However, it is often challenging to collect or use speech… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  19. arXiv:2404.15772

    cs.LG

    Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting

    Authors: Aobo Liang, Xingguo Jiang, Yan Sun, Xiaohou Shi, Ke Li

    Abstract: Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. Over the past few years, deep learning models especially Transformers have achieved advanced performance in LTSF tasks. However, LTSF faces inherent challenges such as long-term dependencies capturing and sparse semantic characteristics. Recently, a new state space model (SSM) named Mamba is proposed… ▽ More

    Submitted 26 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: New Mamba-based architecture. All experiments rerun

  20. arXiv:2404.12135

    cs.MA cs.CR cs.DC

    mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

    Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhoujin Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI… ▽ More

    Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  21. A Phone-based Distributed Ambient Temperature Measurement System with An Efficient Label-free Automated Training Strategy

    Authors: Dayin Chen, Xiaodan Shi, Haoran Zhang, Xuan Song, Dongxiao Zhang, Yuntian Chen, Jinyue Yan

    Abstract: Enhancing the energy efficiency of buildings significantly relies on monitoring indoor ambient temperature. The potential limitations of conventional temperature measurement techniques, together with the omnipresence of smartphones, have redirected researchers' attention towards the exploration of phone-based ambient temperature estimation methods. However, existing phone-based methods face challen… ▽ More

    Submitted 17 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Journal ref: IEEE Transactions on Mobile Computing, 13 May 2024, 1-13

  22. arXiv:2404.07032

    cs.CV

    An Evidential-enhanced Tri-Branch Consistency Learning Method for Semi-supervised Medical Image Segmentation

    Authors: Zhenxi Zhang, Heng Zhou, Xiaoran Shi, Ran Ran, Chunna Tian, Feng Zhou

    Abstract: Semi-supervised segmentation presents a promising approach for large-scale medical image analysis, effectively reducing annotation burdens while achieving comparable performance. This methodology holds substantial potential for streamlining the segmentation process and enhancing its feasibility within clinical settings for translational investigations. While cross-supervised training, based on dis… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  23. arXiv:2404.01260

    cs.CV cs.AI cs.LG

    Bridging Remote Sensors with Multisensor Geospatial Foundation Models

    Authors: Boran Han, Shuai Zhang, Xingjian Shi, Markus Reichstein

    Abstract: In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images.… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR

  24. arXiv:2403.16792

    cs.CL cs.SE

    Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

    Authors: Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Hai Jin, Xuanhua Shi

    Abstract: Large Language Models (LLMs) have shown remarkable progress in automated code generation. Yet, LLM-generated code may contain errors in API usage, class, data structure, or missing project-specific information. As much of this project-specific context cannot fit into the prompts of LLMs, we must find ways to allow the model to explore the project-level code context. We present CoCoGen, a new code… ▽ More

    Submitted 10 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  25. arXiv:2403.15679

    cs.CV cs.MM

    DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

    Authors: Hao Yan, Zhihui Ke, Xiaobo Zhou, Tie Qiu, Xidong Shi, Dadong Jiang

    Abstract: Implicit neural representations for video (NeRV) have recently become a novel way for high-quality video representation. However, existing works employ a single network to represent the entire video, which implicitly confuse static and dynamic information. This leads to an inability to effectively compress the redundant static information and lack the explicitly modeling of global temporal-coheren… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Project page at https://haoyan14.github.io/DS-NeRV

  26. arXiv:2403.14302

    cs.NE cs.CV cs.LG

    SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

    Authors: Xinyu Shi, Zecheng Hao, Zhaofei Yu

    Abstract: The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures propo… ▽ More

    Submitted 28 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: To be published in the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  27. arXiv:2403.13745

    cs.CV

    Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

    Authors: Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We introduce MOTIA Mastering Video Outpainting Through Input-Specific Adaptation, a diffusion-based pipeline that leverages both the intrinsic data-spec… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Code will be available at https://github.com/G-U-N/Be-Your-Outpainter

  28. arXiv:2403.12910

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/

  29. arXiv:2403.12506

    cs.IT eess.SP

    Sparse Estimation for XL-MIMO with Unified LoS/NLoS Representation

    Authors: Xu Shi, Xuehan Wang, Jingbo Tan, Jintao Wang

    Abstract: Extremely large-scale antenna array (ELAA) is promising as one of the key ingredients for the sixth generation (6G) of wireless communications. The electromagnetic propagation of spherical wavefronts introduces an additional distance-dependent dimension beyond conventional beamspace. In this paper, we first present one concise closed-form channel formulation for extremely large-scale multiple-inpu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: ICC 2024

  30. arXiv:2403.12381

    cs.CE

    Explainable AutoML (xAutoML) with adaptive modeling for yield enhancement in semiconductor smart manufacturing

    Authors: Weihong Zhai, Xiupeng Shi, Yiik Diew Wong, Qing Han, Lisheng Chen

    Abstract: Enhancing yield is recognized as a paramount driver to reducing production costs in semiconductor smart manufacturing. However, optimizing and ensuring high yield rates is a highly complex and technical challenge, especially while maintaining reliable yield diagnosis and prognosis, and this shall require understanding all the confounding factors in a complex condition. This study proposes a domain… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  31. arXiv:2403.08420

    cs.CV

    Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models

    Authors: Wensheng Liang, Ruiyan Zhuang, Xianwei Shi, Shuai Li, Zhicheng Wang, Xiaoguang Ma

    Abstract: Industrial managements, including quality control, cost and safety optimization, etc., heavily rely on high quality industrial human action recognitions (IHARs) which were hard to be implemented in large-scale industrial scenes due to their high costs and poor real-time performance. In this paper, we proposed a large-scale foundation model(LSFM)-based IHAR method, wherein various LSFMs and lightwe… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  32. arXiv:2403.05606

    cs.LG cs.AI cs.CL cs.CV

    A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data

    Authors: Yifan Wu, Yang Liu, Yue Yang, Michael S. Yao, Wenli Yang, Xuehui Shi, Lihong Yang, Dongjun Li, Yueming Liu, James C. Gee, Xuan Yang, Wenbin Wei, Shi Gu

    Abstract: Diagnosing rare diseases presents a common challenge in clinical practice, necessitating the expertise of specialists for accurate identification. The advent of machine learning offers a promising solution, while the development of such technologies is hindered by the scarcity of data on rare conditions and the demand for models that are both interpretable and trustworthy in a clinical context. In… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  33. arXiv:2403.02202

    cs.HC

    Exploring Interactive Color Palettes for Abstraction-Driven Exploratory Image Colorization

    Authors: Xinyu Shi, Mingyu Liu, Ziqi Zhou, Ali Neshati, Ryan Rossi, Jian Zhao

    Abstract: Color design is essential in areas such as product, graphic, and fashion design. However, current tools like Photoshop, with their concrete-driven color manipulation approach, often stumble during early ideation, favoring polished end results over initial exploration. We introduced Mondrian as a test-bed for abstraction-driven approach using interactive color palettes for image colorization. Throu… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by CHI 2024

  34. arXiv:2403.02199

    cs.HC

    Piet: Facilitating Color Authoring for Motion Graphics Video

    Authors: Xinyu Shi, Yinghou Wang, Yun Wang, Jian Zhao

    Abstract: Motion graphic (MG) videos are effective and compelling for presenting complex concepts through animated visuals; and colors are important to convey desired emotions, maintain visual continuity, and signal narrative transitions. However, current video color authoring workflows are fragmented, lacking contextual previews, hindering rapid theme adjustments, and not aligning with progressive authorin… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by CHI 2024

  35. arXiv:2402.18158

    cs.CL cs.AI

    Evaluating Quantized Large Language Models

    Authors: Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the requirements of both high efficiency and performance across diverse scenarios, a comprehensive evaluation of quantized LLMs is essential to guide the selection o… ▽ More

    Submitted 6 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  36. arXiv:2402.14259

    cs.CL cs.AI cs.LG

    Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

    Authors: Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Huaxiu Yao, Yue Zhang, Ren Wang, Kaidi Xu, Xiaoshuang Shi

    Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the pri… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 18 pages

  37. arXiv:2402.13516

    cs.LG cs.AI cs.CL

    ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

    Authors: Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun

    Abstract: Activation sparsity refers to the existence of considerable weakly-contributed elements among activation outputs. As a prevalent property of the models using the ReLU activation function, activation sparsity has been proven a promising paradigm to boost model inference efficiency. Nevertheless, most large language models (LLMs) adopt activation functions without intrinsic activation sparsity (e.g.… ▽ More

    Submitted 27 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 19 pages, 4 figures, 9 tables

    ACM Class: I.2.7

  38. arXiv:2402.11453

    cs.CL

    MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

    Authors: Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

    Abstract: Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed… ▽ More

    Submitted 19 March, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  39. arXiv:2402.11241

    cs.CV cs.AI

    DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model

    Authors: Yu Feng, Xing Shi, Mengli Cheng, Yun Xiong

    Abstract: As the task of 2D-to-3D reconstruction has gained significant attention in various real-world scenarios, it becomes crucial to be able to generate high-quality point clouds. Despite the recent success of deep learning models in generating point clouds, there are still challenges in producing high-fidelity results due to the disparities between images and point clouds. While vision transformers (Vi… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  40. arXiv:2402.11139

    cs.LG cs.AI

    LiGNN: Graph Neural Networks at LinkedIn

    Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

    Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on developing and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embedd… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

  41. arXiv:2402.10738

    cs.CL

    Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning

    Authors: Yinpeng Liu, Jiawei Liu, Xiang Shi, Qikai Cheng, Yong Huang, Wei Lu

    Abstract: Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affect the performance of large language models (LLMs). However, most of the current approaches of ordering require high computational costs to introduce the priori knowledge. In this paper, inspired by the human learning process, we propose a simple but effective demonstration ordering method… ▽ More

    Submitted 16 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  42. arXiv:2402.03009

    cs.CL cs.AI

    UniMem: Towards a Unified View of Long-Context Large Language Models

    Authors: Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce U… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

  43. arXiv:2402.02334  [pdf, other]

    cs.LG cs.AI

    Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

    Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin

    Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetic…

    Submitted 19 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 11 pages, 8 figures, to be published to AAAI2024

    ACM Class: I.2.4

  44. arXiv:2402.00769  [pdf, other]

    cs.CV cs.LG

    AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

    Authors: Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Video diffusion models have been gaining increasing attention for their ability to produce videos that are both coherent and of high fidelity. However, the iterative denoising process makes it computationally intensive and time-consuming, thus limiting its applications. Inspired by the Consistency Model (CM) that distills pretrained image diffusion models to accelerate the sampling with minimal steps…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Project Page: https://animatelcm.github.io/

  45. arXiv:2402.00411  [pdf, other]

    cs.NE cs.AI cs.CV

    LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model

    Authors: Zecheng Hao, Xinyu Shi, Zhiyu Pan, Yujia Liu, Zhaofei Yu, Tiejun Huang

    Abstract: Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more biological-inspired and energy-efficient manner. However, despite previous efforts to optimize the learning gradients and model structure of SNNs through various methods, SNNs still lag behind ANNs in terms of pe…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 15 pages, 2 figures

  46. arXiv:2401.15977  [pdf, other]

    cs.CV

    Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

    Authors: Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

    Abstract: We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling. For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the ref…

    Submitted 31 January, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Project page: https://xiaoyushi97.github.io/Motion-I2V/

  47. arXiv:2401.15649  [pdf, other]

    cs.CV

    CPDM: Content-Preserving Diffusion Model for Underwater Image Enhancement

    Authors: Xiaowen Shi, Yuan-Gen Wang

    Abstract: Underwater image enhancement (UIE) is challenging since image degradation in aquatic environments is complicated and changing over time. Existing mainstream methods rely on either physical-model or data-driven, suffering from performance bottlenecks due to changes in imaging conditions or training instability. In this article, we make the first attempt to adapt the diffusion model to the UIE task…

    Submitted 28 January, 2024; originally announced January 2024.

  48. arXiv:2401.13260  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, Asr Error Detection, and Asr Error Correction

    Authors: Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda

    Abstract: The prevalent approach in speech emotion recognition (SER) involves integrating both audio and textual information to comprehensively identify the speaker's emotion, with the text generally obtained through automatic speech recognition (ASR). An essential issue of this approach is that ASR errors from the text modality can worsen the performance of SER. Previous studies have proposed using an auxi…

    Submitted 28 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  49. arXiv:2401.07655  [pdf, other]

    cs.SE cs.AI cs.LG

    MLAD: A Unified Model for Multi-system Log Anomaly Detection

    Authors: Runqiang Zang, Hongcheng Guo, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: In spite of the rapid advancements in unsupervised log anomaly detection techniques, the current mainstream models still necessitate specific training for individual system datasets, resulting in costly procedures and limited scalability due to dataset size, thereby leading to performance bottlenecks. Furthermore, numerous models lack cognitive reasoning capabilities, posing challenges in direct t…

    Submitted 15 January, 2024; originally announced January 2024.

  50. arXiv:2401.07339  [pdf, other]

    cs.SE

    CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges

    Authors: Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin

    Abstract: Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code repositories (named repo) with complex dependencies and extensive documentation. To fill this gap, our research pivots towards evaluating LLMs in a more realistic settin…

    Submitted 14 January, 2024; originally announced January 2024.