
Showing 1–50 of 279 results for author: Gu, X

Searching in archive cs.
  1. arXiv:2407.01004  [pdf, other]

    cs.LG stat.ME

    CURLS: Causal Rule Learning for Subgroups with Significant Treatment Effect

    Authors: Jiehui Zhou, Linxiao Yang, Xingyu Liu, Xinyue Gu, Liang Sun, Wei Chen

    Abstract: In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strateg…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  2. arXiv:2406.17679  [pdf, other]

    cs.CV

    Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation

    Authors: Xuming Zhang, Naoto Yokoya, Xingfa Gu, Qingjiu Tian, Lorenzo Bruzzone

    Abstract: Hyperspectral image (HSI) classification has recently reached its performance bottleneck. Multimodal data fusion is emerging as a promising approach to overcome this bottleneck by providing rich complementary information from the supplementary modality (X-modality). However, achieving comprehensive cross-modal interaction and fusion that can be generalized across different sensing modalities is ch…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.16864  [pdf, other]

    cs.CV cs.AI cs.GR

    StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

    Authors: Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han

    Abstract: This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the e…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: HF Demo: hf.co/Stable-X, Video: https://www.youtube.com/watch?v=sylXTxG_U2U

  4. arXiv:2406.16714  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

    Authors: Jiale Cheng, Yida Lu, Xiaotao Gu, Pei Ke, Xiao Liu, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

    Abstract: Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM, :, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.06451  [pdf, other]

    cs.HC cs.AI cs.CY

    Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course

    Authors: Aadarsh Padiyath, Xinying Hou, Amy Pang, Diego Viramontes Vargas, Xingjian Gu, Tamara Nelson-Fromm, Zihan Wu, Mark Guzdial, Barbara Ericson

    Abstract: The capability of large language models (LLMs) to generate, debug, and explain code has sparked the interest of researchers and educators in undergraduate programming, with many anticipating their transformative potential in programming education. However, decisions about why and how to use LLMs in programming education may involve more than just the assessment of an LLM's technical capabilities.…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to the ACM Conference on International Computing Education Research V.1 (ICER '24 Vol. 1)

  7. arXiv:2406.05704  [pdf, other]

    cs.CV

    Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

    Authors: Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

    Abstract: Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, negle…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  8. arXiv:2406.04938  [pdf, other]

    cs.LG cs.AI

    SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training

    Authors: Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao

    Abstract: Graph Neural Networks (GNNs) have superior capability in learning graph data. Full-graph GNN training generally has high accuracy, however, it suffers from large peak memory usage and encounters the Out-of-Memory problem when handling large graphs. To address this memory problem, a popular solution is mini-batch GNN training. However, mini-batch GNN training increases the training variance and sac…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.04115  [pdf, other]

    cs.CV cs.GR

    Global Parameterization-based Texture Space Optimization

    Authors: Wei Chen, Yuxue Ren, Na Lei, Zhongxuan Luo, Xianfeng Gu

    Abstract: Texture mapping is a common technology in the area of computer graphics, it maps the 3D surface space onto the 2D texture space. However, the loose texture space will reduce the efficiency of data storage and GPU memory addressing in the rendering process. Many of the existing methods focus on repacking given textures, but they still suffer from high computational cost and hardly produce a wholly…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint submitted to Comput. Math. Math. Phys

  10. arXiv:2406.02495  [pdf, other]

    cs.CV

    GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

    Authors: Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang

    Abstract: Combining the signed distance function (SDF) and differentiable volume rendering has emerged as a powerful paradigm for surface reconstruction from multi-view images without 3D supervision. However, current methods are impeded by requiring long-time per-scene optimizations and cannot generalize to new scenes. In this paper, we present GenS, an end-to-end generalizable neural surface reconstruction…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2023 Accepted

  11. arXiv:2406.01903  [pdf, ps, other]

    cs.IT

    Reverse PAC Codes: Look-ahead List Decoding

    Authors: Xinyi Gu, Mohammad Rowshan, Jinhong Yuan

    Abstract: Convolutional precoding in polarization-adjusted convolutional (PAC) codes is a recently introduced variant of polar codes. It has demonstrated an effective reduction in the number of minimum weight codewords (a.k.a error coefficient) of polar codes. This reduction has the potential to significantly improve the error correction performance. From a codeword formation perspective, this reduction has…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: To appear in the proceedings of ISIT'24. It contains 6 pages, 3 figures, and 1 table

  12. arXiv:2405.20848  [pdf, other]

    cs.SE cs.AI cs.LG

    SLIM: a Scalable Light-weight Root Cause Analysis for Imbalanced Data in Microservice

    Authors: Rui Ren, Jingbang Yang, Linxiao Yang, Xinyue Gu, Liang Sun

    Abstract: The newly deployed service -- one kind of change service, could lead to a new type of minority fault. Existing state-of-the-art methods for fault localization rarely consider the imbalanced fault classification in change service. This paper proposes a novel method that utilizes decision rule sets to deal with highly imbalanced data by optimizing the F1 score subject to cardinality constraints. The…

    Submitted 31 May, 2024; originally announced May 2024.

  13. arXiv:2405.17044  [pdf, other]

    cs.AI cs.CL cs.DL cs.LG

    Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models

    Authors: Xuemei Gu, Mario Krenn

    Abstract: Advanced artificial intelligence (AI) systems with access to millions of research papers could inspire new research ideas that may not be conceived by humans alone. However, how interesting are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, a system that uses an evolving knowledge graph built from more than 58 million scientific papers to generate perso…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages; 5 figures

  14. arXiv:2405.15627  [pdf, other]

    physics.class-ph cs.CE

    The Scattering Matrix-Based Characteristic Mode for Structure amidst Arbitrary Background: Theory, Benchmark and Applications

    Authors: Chenbo Shi, Jin Pan, Xin Gu, Shichen Liang, Le Zuo

    Abstract: This paper presents a novel approach for computing substructure characteristic modes. This method leverages electromagnetic scattering matrices and spherical wave expansion to directly decompose electromagnetic fields. Unlike conventional methods that rely on the impedance matrix generated by the method of moments (MoM), our technique simplifies the problem into a small-scale ordinary eigenvalue p…

    Submitted 24 May, 2024; originally announced May 2024.

  15. arXiv:2405.07547  [pdf, other]

    cs.IT eess.SP

    Channel Coding Toward 6G: Technical Overview and Outlook

    Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

    Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Th…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

  16. arXiv:2405.04520  [pdf, other]

    cs.CL cs.LG cs.SE

    NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

    Authors: Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have manifested strong ability to generate codes for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithm and data science, insufficiently satisfying challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeB…

    Submitted 7 May, 2024; originally announced May 2024.

  17. arXiv:2405.02843  [pdf, other]

    cs.CV

    Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration

    Authors: Xiaole Tang, Xin Hu, Xiang Gu, Jian Sun

    Abstract: Deep learning-based image restoration methods generally struggle with faithfully preserving the structures of the original image. In this work, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models image restoration as an optimal transport (OT) problem for both unpaired and paired settings, introducing the transport residual as a unique degradation-specific cue fo…

    Submitted 10 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  18. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models(LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language process tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 4 June, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  19. arXiv:2404.17275  [pdf, other]

    cs.CV cs.LG

    Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

    Authors: Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

    Abstract: The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $α$-Power Maximization (ARPM), for PDA where the source domain contains private classes absent in target domain. In ARPM, we propos…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: To appear in IJCV

  20. arXiv:2404.09507  [pdf, other]

    cs.CV

    Clothes-Changing Person Re-Identification with Feasibility-Aware Intermediary Matching

    Authors: Jiahe Zhao, Ruibing Hou, Hong Chang, Xinqian Gu, Bingpeng Ma, Shiguang Shan, Xilin Chen

    Abstract: Current clothes-changing person re-identification (re-id) approaches usually perform retrieval based on clothes-irrelevant features, while neglecting the potential of clothes-relevant features. However, we observe that relying solely on clothes-irrelevant features for clothes-changing re-id is limited, since they often lack adequate identity information and suffer from large intra-class variations…

    Submitted 15 April, 2024; originally announced April 2024.

  21. arXiv:2404.08947  [pdf, other]

    cs.SE

    Zero-Shot Code Representation Learning via Prompt Tuning

    Authors: Nan Cui, Xiaodong Gu, Beijun Shen

    Abstract: Learning code representations has been the core prerequisite of many software engineering tasks such as code clone detection and code generation. State-of-the-art program representation techniques mainly utilize pre-trained language models (PLMs) such as CodeBERT. A Transformer encoder is firstly pre-trained on a large-scale code corpus to acquire general knowledge about source code. The pre-train…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2204.08360

  22. arXiv:2404.05695  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer

    Authors: Xinyang Gu, Yen-Jen Wang, Jianyu Chen

    Abstract: Humanoid-Gym is an easy-to-use reinforcement learning (RL) framework based on Nvidia Isaac Gym, designed to train locomotion skills for humanoid robots, emphasizing zero-shot transfer from simulation to the real-world environment. Humanoid-Gym also integrates a sim-to-sim framework from Isaac Gym to Mujoco that allows users to verify the trained policies in different physical simulations to ensure…

    Submitted 18 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Journal ref: ICRA 2024 Workshop on Agile Robotics

  23. arXiv:2404.00320  [pdf, other]

    cs.AI

    Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives

    Authors: Xingrui Gu, Zhixuan Wang, Irisa Jin, Zekun Wu

    Abstract: This research tackles the challenge of integrating heterogeneous data for specific behavior recognition within the domain of Pain Recognition, presenting a novel methodology that harmonizes statistical correlations with a human-centered approach. By leveraging a diverse range of deep learning architectures, we highlight the adaptability and efficacy of our approach in improving model performance a…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Under reviewed by ACII 2024

  24. arXiv:2403.16439  [pdf, other]

    cs.RO cs.CV cs.LG

    Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

    Authors: Xunjiang Gu, Guanyu Song, Igor Gilitschenski, Marco Pavone, Boris Ivanovic

    Abstract: High-definition (HD) maps have played an integral role in the development of modern autonomous vehicle (AV) stacks, albeit with high associated labeling and maintenance costs. As a result, many recent works have proposed methods for estimating HD maps online from sensor data, enabling AVs to operate outside of previously-mapped regions. However, current online map estimation approaches are develop…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 14 figures, 6 tables. CVPR 2024

  25. arXiv:2403.16048  [pdf, other]

    cs.CV

    Edit3K: Universal Representation Learning for Video Editing Components

    Authors: Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, Yufei Wang, Tiejian Luo, Sijie Zhu

    Abstract: This paper focuses on understanding the predominant video creation pipeline, i.e., compositional video editing with six main types of editing components, including video effects, animation, transition, filter, sticker, and text. In contrast to existing visual representation learning of visual materials (i.e., images/videos), we aim to learn visual representations of editing actions/components that…

    Submitted 24 March, 2024; originally announced March 2024.

  26. arXiv:2403.15559  [pdf, other]

    cs.CV cs.AI

    An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models

    Authors: Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framewor…

    Submitted 22 March, 2024; originally announced March 2024.

  27. arXiv:2403.12010  [pdf, other]

    cs.CV cs.AI cs.GR

    VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

    Authors: Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training,…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: aigc3d.github.io/VideoMV/

  28. arXiv:2403.07463  [pdf, other]

    cs.CR cs.CV

    Backdoor Attack with Mode Mixture Latent Modification

    Authors: Hongwei Zhang, Xiaoyin Xu, Dongsheng An, Xianfeng Gu, Min Zhang

    Abstract: Backdoor attacks become a significant security concern for deep neural networks in recent years. An image classification model can be compromised if malicious backdoors are injected into it. This corruption will cause the model to function normally on clean images but predict a specific target label when triggers are present. Previous research can be categorized into two genres: poisoning a portio…

    Submitted 12 March, 2024; originally announced March 2024.

  29. arXiv:2403.06033  [pdf, other]

    cs.LG cs.CY

    Predicting Depression and Anxiety: A Multi-Layer Perceptron for Analyzing the Mental Health Impact of COVID-19

    Authors: David Fong, Tianshu Chu, Matthew Heflin, Xiaosi Gu, Oshani Seneviratne

    Abstract: We introduce a multi-layer perceptron (MLP) called the COVID-19 Depression and Anxiety Predictor (CoDAP) to predict mental health trends, particularly anxiety and depression, during the COVID-19 pandemic. Our method utilizes a comprehensive dataset, which tracked mental health symptoms weekly over ten weeks during the initial COVID-19 wave (April to June 2020) in a diverse cohort of U.S. adults. T…

    Submitted 9 March, 2024; originally announced March 2024.

  30. arXiv:2403.05121  [pdf, other]

    cs.CV

    CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

    Authors: Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the…

    Submitted 8 March, 2024; originally announced March 2024.

  31. arXiv:2403.00834  [pdf, other]

    cs.HC cs.AI cs.GR quant-ph

    Virtual Reality for Understanding Artificial-Intelligence-driven Scientific Discovery with an Application in Quantum Optics

    Authors: Philipp Schmidt, Sören Arlt, Carlos Ruiz-Gonzalez, Xuemei Gu, Carla Rodríguez, Mario Krenn

    Abstract: Generative Artificial Intelligence (AI) models can propose solutions to scientific problems beyond human capability. To truly make conceptual contributions, researchers need to be capable of understanding the AI-generated structures and extracting the underlying concepts and ideas. When algorithms provide little explanatory reasoning alongside the output, scientists have to reverse-engineer the fu…

    Submitted 20 February, 2024; originally announced March 2024.

    Comments: 12 pages, 6 figures, comments welcome

  32. arXiv:2402.18540  [pdf, other]

    cs.LG cs.AI cs.CL

    Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

    Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

    Abstract: Public LLMs such as the Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extens…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 20 pages

  33. arXiv:2402.08640  [pdf, other]

    cs.DL cs.AI cs.LG

    Forecasting high-impact research topics via machine learning on evolving knowledge graphs

    Authors: Xuemei Gu, Mario Krenn

    Abstract: The exponential growth in scientific publications poses a severe challenge for human researchers. It forces attention to more narrow sub-fields, which makes it challenging to discover new impactful research ideas and collaborations outside one's own field. While there are ways to predict a scientific paper's future citation counts, they need the research to be finished and the paper written, usual…

    Submitted 3 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures, Comments welcome!

  34. arXiv:2402.08567  [pdf, other]

    cs.CL cs.CR cs.CV cs.LG cs.MA

    Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

    Authors: Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin

    Abstract: A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jail…

    Submitted 3 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  35. arXiv:2401.06461  [pdf, other]

    cs.SE cs.AI cs.CL

    Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

    Authors: Yuling Shi, Hongyu Zhang, Chengcheng Wan, Xiaodong Gu

    Abstract: Large language models have catalyzed an unprecedented wave in code generation. While achieving significant advances, they blur the distinctions between machine- and human-authored source code, causing integrity and authenticity issues of software artifacts. Previous methods such as DetectGPT have proven effective in discerning machine-generated texts, but they do not identify and harness the uniqu…

    Submitted 23 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: code available at https://github.com/YerbaPage/DetectCodeGPT

  36. arXiv:2401.01578  [pdf, other]

    cs.CV

    Context-Guided Spatio-Temporal Video Grounding

    Authors: Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang

    Abstract: Spatio-temporal video grounding (or STVG) task aims at locating a spatio-temporal tube for a specific instance given a text query. Despite advancements, current methods easily suffer the distractors or heavy object appearance variations in videos due to insufficient object information from the text, leading to degradation. Addressing this, we propose a novel framework, context-guided STVG (CG-STVG…

    Submitted 3 January, 2024; originally announced January 2024.

  37. arXiv:2312.14125  [pdf, other]

    cs.CV cs.AI

    VideoPoet: A Large Language Model for Zero-Shot Video Generation

    Authors: Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, et al. (6 additional authors not shown)

    Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and tas…

    Submitted 4 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: To appear at ICML 2024; Project page: http://sites.research.google/videopoet/

  38. arXiv:2312.09237  [pdf, other]

    cs.CV

    Pixel Aligned Language Models

    Authors: Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid

    Abstract: Large language models have achieved great success in recent years, so as their variants in vision. Existing vision-language models can describe images in natural languages, answer visual-related questions, or perform complex reasoning about the image. However, it is yet unclear how localization tasks, such as word grounding or referring localization, can be performed using large language models. I…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Project page: https://jerryxu.net/PixelLLM

  39. arXiv:2312.07661  [pdf, other]

    cs.CV cs.CL cs.LG cs.MM

    CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

    Authors: Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li

    Abstract: Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to…

    Submitted 7 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: To appear in CVPR 2024. Project page: https://torrvision.com/clip_as_rnn/

  40. arXiv:2312.06662  [pdf, other]

    cs.CV cs.AI cs.LG

    Photorealistic Video Generation with Diffusion Models

    Authors: Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama

    Abstract: We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach has two key design decisions. First, we use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities. Second, for memory and training efficiency, we use a window attention architecture tailored for…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project website https://walt-video-diffusion.github.io/

  41. arXiv:2312.05921  [pdf, other]

    cs.AI

    Dig-CSI: A Distributed and Generative Model Assisted CSI Feedback Training Framework

    Authors: Zhilin Du, Haozhen Li, Zhenyu Liu, Shilong Fan, Xinyu Gu, Lin Zhang

    Abstract: The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called…

    Submitted 10 December, 2023; originally announced December 2023.

  42. arXiv:2312.01639  [pdf, other]

    cs.SE

    On the Effectiveness of Large Language Models in Domain-Specific Code Generation

    Authors: Yalan Lin, Meng Chen, Yuhan Hu, Hongyu Zhang, Chengcheng Wan, Zhao Wei, Yong Xu, Juhong Wang, Xiaodong Gu

    Abstract: Large language models (LLMs) such as ChatGPT have shown remarkable capabilities in code generation. Despite the great achievement, they rely on enormous training data to acquire a broad spectrum of open-domain knowledge. Besides, their evaluation revolves around open-domain benchmarks like HumanEval, which primarily consist of programming contests. Therefore, it is hard to fully characterize the i…

    Submitted 8 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Preprint submitted to ACM Transactions on Software Engineering and Methodology

  43. arXiv:2311.17428  [pdf, other]

    cs.CV cs.AI

    SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation

    Authors: Qi Liu, Xinchen Liu, Kun Liu, Xiaoyan Gu, Wu Liu

    Abstract: Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential contributions of sparse IoT sensor signals, which can be crucial for achieving accurate recognition, have not been fully explored. To make up for…

    Submitted 29 November, 2023; originally announced November 2023.

  44. arXiv:2311.16918  [pdf, other]

    cs.CV cs.AI

    RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

    Authors: Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

    Abstract: Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to…

    Submitted 24 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://aigc3d.github.io/richdreamer/

  45. arXiv:2311.05770  [pdf, other]

    cs.CV

    PolyMaX: General Dense Prediction with Mask Transformer

    Authors: Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

    Abstract: Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has remained popular due to the prevalence of fully convolutional networks. However, on the recent frontier of segmentation task, the community has been…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  46. arXiv:2311.04150   

    cs.HC

    What Makes a Fantastic Passenger-Car Driver in Urban Contexts?

    Authors: Yueteng Yu, Zhijie Yi, Xinyu Yang, Mengdi Chu, Junrong Lu, Xiang Chang, Yiyao Liu, Jingli Qin, Ye Jin, Jialin Song, Xingrui Gu, Jirui Yuan, Guyue Zhou, Jiangtao Gong

    Abstract: The accurate evaluation of the quality of driving behavior is crucial for optimizing and implementing autonomous driving technology in practice. However, there is no comprehensive understanding of good driving behaviors currently. In this paper, we sought to understand driving behaviors from the perspectives of both drivers and passengers. We invited 10 expert drivers and 14 novice drivers to comp…

    Submitted 12 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Part of the content of the paper will be modified. One of the authors has recommended its withdrawal due to personal reasons

  47. arXiv:2311.01226  [pdf, other]

    cs.CV

    Optimal Transport-Guided Conditional Score-Based Diffusion Models

    Authors: Xiang Gu, Liwei Yang, Jian Sun, Zongben Xu

    Abstract: Conditional score-based diffusion model (SBDM) is for conditional generation of target data with paired data as condition, and has achieved great success in image translation. However, it requires the paired data as condition, and there would be insufficient paired data provided in real-world applications. To tackle the applications with partially paired or even unpaired dataset, we propose a nove…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted in NeurIPS 2023

  48. arXiv:2310.18992  [pdf, other]

    cs.CL cs.AI

    Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders

    Authors: Qianren Mao, Shaobo Zhao, Jiarui Li, Xiaolei Gu, Shizhu He, Bo Li, Jianxin Li

    Abstract: Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised document extractive summarization. However, the traditional two-step paradigm of pre-training and sentence-ranking, creates a gap due to differing optimization objectives. To address this issue, we argue that utilizing pre-trained embeddings derived from a process specifically designed to optimiz…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted by the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

  49. arXiv:2310.15662  [pdf, other]

    cs.LG

    Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting

    Authors: Linxiao Yang, Rui Ren, Xinyue Gu, Liang Sun

    Abstract: Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting in holiday, or under extreme weather conditions. As high-stakes decision-making usually fol…

    Submitted 24 October, 2023; originally announced October 2023.

  50. arXiv:2310.14423  [pdf, other]

    cs.LG

    A Quadratic Synchronization Rule for Distributed Deep Learning

    Authors: Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

    Abstract: In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models. Local gradient methods, such as Local SGD, address this issue by allowing workers to compute locally for $H$ steps without synchronizing with others, hence reducing communication frequency. While…

    Submitted 12 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Camera-ready version for ICLR'24