Skip to main content

Showing 1–50 of 509 results for author: Zheng, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.16853  [pdf, other

    cs.LG cond-mat.mtrl-sci cs.AI q-bio.BM

    GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

    Authors: Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang

    Abstract: Molecular modeling, a central topic in quantum mechanics, aims to accurately calculate the properties and simulate the behaviors of molecular systems. The molecular model is governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. While numerous deep learning approaches have been developed to learn molecular represent… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 25 pages, 13 tables, l figure; ICML 2024 camera ready version

  2. arXiv:2406.16578  [pdf, other

    cs.RO cs.AI

    QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

    Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin **

    Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-ma… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under review

  3. arXiv:2406.15303  [pdf, other

    cs.CV

    ADR: Attention Diversification Regularization for Mitigating Overfitting in Multiple Instance Learning based Whole Slide Image Classification

    Authors: Yunlong Zhang, Zhongyi Shui, Yunxuan Sun, Honglin Li, **gxiong Li, Chenglu Zhu, Sunyi Zheng, Lin Yang

    Abstract: Multiple Instance Learning (MIL) has demonstrated effectiveness in analyzing whole slide images (WSIs), yet it often encounters overfitting challenges in real-world applications. This paper reveals the correlation between MIL's performance and the entropy of attention values. Based on this observation, we propose Attention Diversity Regularization (ADR), a simple but effective technique aimed at p… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zi** Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.12738  [pdf, other

    cs.CL cs.AI

    Large Language Model as a Universal Clinical Multi-task Decoder

    Authors: Yujiang Wu, Hongjian Song, Jiawen Zhang, Xumeng Wen, Shun Zheng, Jiang Bian

    Abstract: The development of effective machine learning methodologies for enhancing the efficiency and accuracy of clinical systems is crucial. Despite significant research efforts, managing a plethora of diversified clinical tasks and adapting to emerging new tasks remain significant challenges. This paper presents a novel paradigm that employs a pre-trained large language model as a universal clinical mul… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in progress

  6. FlexCare: Leveraging Cross-Task Synergy for Flexible Multimodal Healthcare Prediction

    Authors: Muhao Xu, Zhenfeng Zhu, Youru Li, Shuai Zheng, Yawei Zhao, Kunlun He, Yao Zhao

    Abstract: Multimodal electronic health record (EHR) data can offer a holistic assessment of a patient's health status, supporting various predictive healthcare tasks. Recently, several studies have embraced the multitask learning approach in the healthcare domain, exploiting the inherent correlations among clinical tasks to predict multiple outcomes simultaneously. However, existing methods necessitate samp… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024 (Research Track)

  7. arXiv:2406.11274  [pdf, other

    cs.CL

    Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

    Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 1 figure

  8. arXiv:2406.11169   

    eess.AS cs.SD

    Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

    Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

    Abstract: Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: We update this paper to an earlier paper

  9. arXiv:2406.10985  [pdf, other

    cs.CL

    Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

    Authors: Weiyao Luo, Suncong Zheng, Heming Xia, Weikang Wang, Yan Lei, Tianyu Liu, Shuang Chen, Zhifang Sui

    Abstract: Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer a performance degradation when modeling long-term contexts due to they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.10948  [pdf

    cs.LG cs.AI

    Incorporating uncertainty quantification into travel mode choice modeling: a Bayesian neural network (BNN) approach and an uncertainty-guided active survey framework

    Authors: Shuwen Zheng, Zhou Fang, Liang Zhao

    Abstract: Existing deep learning approaches for travel mode choice modeling fail to inform modelers about their prediction uncertainty. Even when facing scenarios that are out of the distribution of training data, which implies high prediction uncertainty, these approaches still provide deterministic answers, potentially leading to misguidance. To address this limitation, this study introduces the concept o… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.10724  [pdf, other

    eess.IV cs.CV cs.LG

    Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

    Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman , et al. (4 additional authors not shown)

    Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in 38th Annual Small Satellite Conference

  12. arXiv:2406.07661  [pdf, other

    cs.CV cs.RO

    ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones

    Authors: Anurag Ghosh, Robert Tamburo, Shen Zheng, Juan R. Alvarez-Padilla, Hailiang Zhu, Michael Cardei, Nicholas Dunn, Christoph Mertz, Srinivasa G. Narasimhan

    Abstract: Perceiving and navigating through work zones is challenging and under-explored, even with major strides in self-driving research. An important reason is the lack of open datasets for develo** new algorithms to address this long-tailed scenario. We propose the ROADWork dataset to learn how to recognize, observe and analyze and drive through work zones. We find that state-of-the-art foundation mod… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.06542  [pdf, other

    cs.AR cs.LG

    vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

    Authors: Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang

    Abstract: IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). However, the memory of MCU is usually 2-3 orders of magnitude smaller than mobile devices, which makes it challenging to map DNNs onto MCUs. Previous work separates memory management and kernel implementation for MCU and relies on coarse-graine… ▽ More

    Submitted 1 May, 2024; originally announced June 2024.

  14. arXiv:2406.05647  [pdf, other

    eess.SP cs.ET

    Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS

    Authors: Ruiqi Liu, Shuang Zheng, Qingqing Wu, Yifan Jiang, Nan Zhang, Yuanwei Liu, Marco Di Renzo, and George C. Alexandropoulos

    Abstract: Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable to increase the communication data rates as well as the cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research inv… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, submitted to an IEEE Magazine

  15. arXiv:2406.04520  [pdf, other

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  16. arXiv:2406.02983  [pdf, other

    cs.RO cs.AI

    FREA: Feasibility-Guided Generation of Safety-Critical Scenarios with Reasonable Adversariality

    Authors: Keyu Chen, Yuheng Lei, Hao Cheng, Haoran Wu, Wenchao Sun, Sifa Zheng

    Abstract: Generating safety-critical scenarios, which are essential yet difficult to collect at scale, offers an effective method to evaluate the robustness of autonomous vehicles (AVs). Existing methods focus on optimizing adversariality while preserving the naturalness of scenarios, aiming to achieve a balance through data-driven approaches. However, without an appropriate upper bound for adversariality,… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 19 pages. Under review

  17. arXiv:2406.01205  [pdf, other

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  18. arXiv:2406.00356  [pdf, other

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  19. arXiv:2405.19775  [pdf, other

    cs.CV

    Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network

    Authors: Sizhe Zheng, Pan Gao, Peng Zhou, Jie Qin

    Abstract: Style transfer aims to render an image with the artistic features of a style image, while maintaining the original structure. Various methods have been put forward for this task, but some challenges still exist. For instance, it is difficult for CNN-based methods to handle global information and long-range dependencies between input images, for which transformer-based methods have been proposed. A… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 11 pages, 11 figures, to be published in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2024)

  20. arXiv:2405.19620  [pdf, other

    cs.CV

    SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

    Authors: Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, Sifa Zheng

    Abstract: The well-established modular autonomous driving system is decoupled into different standalone tasks, e.g. perception, prediction and planning, suffering from information loss and error accumulation across modules. In contrast, end-to-end paradigms unify multi-tasks into a fully differentiable framework, allowing for optimization in a planning-oriented spirit. Despite the great potential of end-to-… ▽ More

    Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  21. arXiv:2405.18253  [pdf, other

    cs.LG cs.GT

    Truthful Dataset Valuation by Pointwise Mutual Information

    Authors: Shuran Zheng, Yongchan Kwon, Xuan Qi, James Zou

    Abstract: A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to poten… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  22. arXiv:2405.17719  [pdf, other

    cs.CV

    EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

    Authors: Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin **

    Abstract: Egocentric video-language pretraining is a crucial paradigm to advance the learning of egocentric hand-object interactions (EgoHOI). Despite the great success on existing testbeds, these benchmarks focus more on closed-set visual concepts or limited scenarios. Due to the occurrence of diverse EgoHOIs in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminis… ▽ More

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/xuboshen/EgoNCEpp

  23. arXiv:2405.16606  [pdf, other

    cs.SI

    Link Prediction on Textual Edge Graphs

    Authors: Chen Ling, Zhuofeng Li, Yuntong Hu, Zheng Zhang, Zhongyuan Liu, Shuang Zheng, Liang Zhao

    Abstract: Textual-edge Graphs (TEGs), characterized by rich text annotations on edges, are increasingly significant in network science due to their ability to capture rich contextual information among entities. Existing works have proposed various edge-aware graph neural networks (GNNs) or let language models directly make predictions. However, they often fall short of fully capturing the contextualized sem… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  24. arXiv:2405.14866  [pdf, other

    cs.CV

    Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

    Authors: Hanzhang Tu, Ruizhi Shao, Xue Dong, Shunyuan Zheng, Hao Zhang, Lili Chen, Meili Wang, Wenyu Li, Siyan Ma, Sheng** Zhang, Boyao Zhou, Yebin Liu

    Abstract: In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Paper accepted by SIGGRAPH 2024. Project page: http://118.178.32.38/c/Tele-Aloha/

  25. arXiv:2405.09055  [pdf, other

    cs.CL

    A safety realignment framework via subspace-oriented model fusion for large language models

    Authors: Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He

    Abstract: The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there's a risk of catastrophic forgetting during saf… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  26. arXiv:2405.06959  [pdf, other

    cs.RO

    AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenoty** and Pose Estimation

    Authors: Xingxu Li, Nan Ma, Yiheng Han, Shun Yang, Siyi Zheng

    Abstract: To address the limitations inherent to conventional automated harvesting robots specifically their suboptimal success rates and risk of crop damage, we design a novel bot named AHPPEBot which is capable of autonomous harvesting based on crop phenoty** and pose estimation. Specifically, In phenoty**, the detection, association, and maturity estimation of tomato trusses and individual fruits are… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA),7 pages, 3 figures

  27. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  28. arXiv:2405.01312  [pdf, other

    cs.DB cs.CR

    Privacy-Enhanced Database Synthesis for Benchmark Publishing

    Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

    Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating syn… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  29. arXiv:2405.00957  [pdf, other

    cs.LG cs.AI cs.SI

    IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

    Authors: Shenghe Zheng, Hongzhi Wang, Xianglong Liu

    Abstract: Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with their core idea about aggregating neighborhood information and learning from labels. However, the prevailing challenges in most graph datasets are twofold of Insufficient High-Quality Labels and Lack of Neighborhoods, resulting in weak GNNs. Existing data augmentation methods designed to address these two issues often t… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 18 pages

  30. arXiv:2405.00796  [pdf, other

    cs.SE

    Does Using Bazel Help Speed Up Continuous Integration Builds?

    Authors: Shenyu Zheng, Bram Adams, Ahmed E. Hassan

    Abstract: A long continuous integration (CI) build forces developers to wait for CI feedback before starting subsequent development activities, leading to time wasted. In addition to a variety of build scheduling and test selection heuristics studied in the past, new artifact-based build technologies like Bazel have built-in support for advanced performance optimizations such as parallel build and increment… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  31. arXiv:2404.19429  [pdf, other

    cs.DC cs.LG

    Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlap**

    Authors: Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang

    Abstract: The Mixture-of-Expert (MoE) technique plays a crucial role in expanding the size of DNN model parameters. However, it faces the challenge of extended all-to-all communication latency during the training process. Existing methods attempt to mitigate this issue by overlap** all-to-all with expert computation. Yet, these methods frequently fall short of achieving sufficient overlap, consequently re… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 11 pages, 16 figures. Published in MLSys'24

  32. arXiv:2404.13816  [pdf, other

    cs.CV

    Neural Radiance Field in Autonomous Driving: A Survey

    Authors: Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li

    Abstract: Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuou… ▽ More

    Submitted 26 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  33. arXiv:2404.13408  [pdf, other

    cs.CV

    AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation

    Authors: Yang Yang, Shunyi Zheng

    Abstract: The advancement of deep learning has driven notable progress in remote sensing semantic segmentation. Attention mechanisms, while enabling global modeling and utilizing contextual information, face challenges of high computational costs and require window-based operations that weaken capturing long-range dependencies, hindering their effectiveness for remote sensing image processing. In this lette… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  34. arXiv:2404.07941  [pdf, other

    cs.NE cs.AI cs.LG

    SiGNN: A Spike-induced Graph Neural Network for Dynamic Graph Representation Learning

    Authors: Dong Chen, Shuai Zheng, Muhao Xu, Zhenfeng Zhu, Yao Zhao

    Abstract: In the domain of dynamic graph representation learning (DGRL), the efficient and comprehensive capture of temporal evolution within real-world networks is crucial. Spiking Neural Networks (SNNs), known as their temporal dynamics and low-power characteristic, offer an efficient solution for temporal processing in DGRL task. However, owing to the spike-based information encoding mechanism of SNNs, e… ▽ More

    Submitted 11 March, 2024; originally announced April 2024.

  35. arXiv:2404.07503  [pdf, other

    cs.CL

    Best Practices and Lessons Learned on Synthetic Data for Language Models

    Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, **meng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

    Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challeng… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  36. arXiv:2404.00551  [pdf, other

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  37. arXiv:2403.17155  [pdf, other

    cs.CL cs.CR

    Task-Agnostic Detector for Insertion-Based Backdoor Attacks

    Authors: Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, Chao Chen

    Abstract: Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering ta… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Findings of NAACL 2024

  38. arXiv:2403.12712  [pdf, other

    cs.CV cs.LG

    Addressing Source Scale Bias via Image War** for Domain Adaptation

    Authors: Shen Zheng, Anurag Ghosh, Srinivasa G. Narasimhan

    Abstract: In visual recognition, scale bias is a key challenge due to the imbalance of object and image size distribution inherent in real scene datasets. Conventional solutions involve injecting scale invariance priors, oversampling the dataset at different scales during training, or adjusting scale at inference. While these strategies mitigate scale bias to some extent, their ability to adapt across diver… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  39. arXiv:2403.10778  [pdf, other

    cs.CV

    HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection

    Authors: Shibiao Xu, ShuChen Zheng, Wenhao Xu, Rongtao Xu, Changwei Wang, Jiguang Zhang, Xiaoqiang Teng, Ao Li, Li Guo

    Abstract: Infrared small object detection is an important computer vision task involving the recognition and localization of tiny objects in infrared images, which usually contain only a few pixels. However, it encounters difficulties due to the diminutive size of the objects and the generally complex backgrounds in infrared images. In this paper, we propose a deep learning method, HCF-Net, that significant… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  40. arXiv:2403.10051  [pdf, other

    cs.DB

    Accelerating Regular Path Queries over Graph Database with Processing-in-Memory

    Authors: Ruoyan Ma, Shengan Zheng, Guifeng Wang, ** Pu, Yifan Hua, Wentao Wang, Linpeng Huang

    Abstract: Regular path queries (RPQs) in graph databases are bottlenecked by the memory wall. Emerging processing-in-memory (PIM) technologies offer a promising solution to dispatch and execute path matching tasks in parallel within PIM modules. We present Moctopus, a PIM-based data management system for graph databases that supports efficient batch RPQs and graph updates. Moctopus employs a PIM-friendly dy… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  41. arXiv:2403.09072  [pdf, other

    cs.CV cs.AI cs.CL

    UniCode: Learning a Unified Codebook for Multimodal Large Language Models

    Authors: Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu

    Abstract: In this paper, we propose \textbf{UniCode}, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals. This innovation addresses a critical limitation in existing MLLMs: their reliance on a text-only codebook, which restricts MLLM's ability to generate images and texts i… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 14 pages, 2 figures, 11 tables

  42. arXiv:2403.08650  [pdf, other

    cs.CV

    Data Augmentation in Human-Centric Vision

    Authors: Wentao Jiang, Yige Zhang, Shaozhong Zheng, Si Liu, Shuicheng Yan

    Abstract: This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks, a first of its kind in the field. It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection, addressing the significant challenges posed by overfitting and limited training data in these domains. Our work categorizes… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  43. arXiv:2403.05874  [pdf, other

    cs.CV cs.RO

    SPAFormer: Sequential 3D Part Assembly with Transformers

    Authors: Boshen Xu, Sipeng Zheng, Qin **

    Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy… ▽ More

    Submitted 3 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/xuboshen/SPAFormer

  44. POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

    Authors: Boshen Xu, Sipeng Zheng, Qin **

    Abstract: We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view. However, current methods struggle to replicate this ability of view adaptation from third-person to first-person. Although some approaches attempt to learn view-agnostic representation from large-scale video datasets, they ignore the relationships among multiple third-person views… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MM 2023. Project page: https://xuboshen.github.io/

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023). Association for Computing Machinery, New York, NY, USA, 2807-2816

  45. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  46. arXiv:2403.05131  [pdf, other

    cs.AI cs.CV

    Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

    Authors: Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, **gyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang

    Abstract: The evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discu… ▽ More

    Submitted 7 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: First complete survey on Text-to-Video Generation, 44 pages, 20 figures

  47. arXiv:2402.18252  [pdf, other

    cs.CL cs.AI

    Towards Generalist Prompting for Large Language Models by Mental Models

    Authors: Haoxiang Guan, Jiyan He, Shuxin Zheng, En-Hong Chen, Weiming Zhang, Nenghai Yu

    Abstract: Large language models (LLMs) have demonstrated impressive performance on many tasks. However, to achieve optimal performance, specially designed prompting methods are still needed. These methods either rely on task-specific few-shot examples that require a certain level of domain knowledge, or are designed to be simple but only perform well on a few types of tasks. In this work, we attempt to intr… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  48. arXiv:2402.16181  [pdf, other

    cs.LG cs.AI

    How Can LLM Guide RL? A Value-Based Approach

    Authors: Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin **, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang

    Abstract: Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. However, RL algorithms may require extensive trial-and-error interactions to collect useful feedback for improvement. On the other hand, recent developments in large language models (LLMs) have showcased impressive capabilities in language… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  49. arXiv:2402.15872  [pdf, ps, other

    cs.GT

    Differentially Private Bayesian Persuasion

    Authors: Yuqi Pan, Zhiwei Steven Wu, Haifeng Xu, Shuran Zheng

    Abstract: The tension between persuasion and privacy preservation is common in real-world settings. Online platforms should protect the privacy of web users whose data they collect, even as they seek to disclose information about these data to selling advertising spaces. Similarly, hospitals may share patient data to attract research investments with the obligation to preserve patients' privacy. To deal wit… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  50. arXiv:2402.15746  [pdf, other

    cs.CV cs.AI cs.MM

    Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT

    Authors: Sixiao Zheng, **gyang Huo, Yu Wang, Yanwei Fu

    Abstract: With the rise of short video platforms represented by TikTok, the trend of users expressing their creativity through photos and videos has increased dramatically. However, ordinary users lack the professional skills to produce high-quality videos using professional creation software. To meet the demand for intelligent and user-friendly video creation tools, we propose the Dynamic Visual Compositio… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: Project Page: https://sixiaozheng.github.io/IntelligentDirector/