
Showing 1–50 of 235 results for author: Gao, R

Searching in archive cs.

  1. arXiv:2406.16083  [pdf, other]

    eess.IV cs.CV

    Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

    Authors: Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

    Abstract: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 17 pages, 7 figures

  2. arXiv:2406.15758  [pdf, other]

    cs.LG cs.DC

    EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

    Authors: Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin

    Abstract: Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and ef…

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.08204  [pdf, other]

    cs.CV

    Diffusion-Promoted HDR Video Reconstruction

    Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

    Abstract: High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Arxiv Preprint

  4. arXiv:2406.07532  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Hearing Anything Anywhere

    Authors: Mason Wang, Ryosuke Sawata, Samuel Clarke, Ruohan Gao, Shangzhe Wu, Jiajun Wu

    Abstract: Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024. The first two authors contributed equally. Project page: https://masonlwang.com/hearinganythinganywhere/

    ACM Class: I.2.10; I.4.8

  5. arXiv:2406.07393  [pdf, other]

    cs.CL

    Limited Out-of-Context Knowledge Reasoning in Large Language Models

    Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  7. arXiv:2406.01853  [pdf, other]

    cs.LG cs.AI cs.MA

    Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy

    Authors: Riqiang Gao, Florin C. Ghesu, Simon Arberet, Shahab Basiri, Esa Kuusela, Martin Kraus, Dorin Comaniciu, Ali Kamen

    Abstract: In contemporary radiotherapy planning (RTP), a key module leaf sequencing is predominantly addressed by optimization-based approaches. In this paper, we propose a novel deep reinforcement learning (DRL) model termed as Reinforced Leaf Sequencer (RLS) in a multi-agent framework for leaf sequencing. The RLS model offers improvements to time-consuming iterative optimization steps via large-scale trai…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  8. arXiv:2405.16865  [pdf, other]

    q-bio.NC cs.LG stat.ML

    An Investigation of Conformal Isometry Hypothesis for Grid Cells

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.19192

  9. arXiv:2405.16852  [pdf, other]

    cs.LG cs.AI stat.ML

    EM Distillation for One-step Diffusion Models

    Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

    Abstract: While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Disti…

    Submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  11. arXiv:2405.14475  [pdf, other]

    cs.CV cs.AI

    MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

    Authors: Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu

    Abstract: While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including B…

    Submitted 23 May, 2024; originally announced May 2024.

  12. arXiv:2405.13206  [pdf, other]

    cs.CV

    Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

    Authors: Rong Gao, Xin Liu, Bohao Xing, Zitong Yu, Bjorn W. Schuller, Heikki Kälviäinen

    Abstract: In this work, we focus on a special group of human body language -- the micro-gesture (MG), which differs from the range of ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings. This characteristic introduces two novel challenges regarding micro-gestures that are worth rethin…

    Submitted 21 May, 2024; originally announced May 2024.

  13. arXiv:2405.13045  [pdf, other]

    cs.HC cs.AI

    CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

    Authors: Chin-Yi Cheng, Ruiqi Gao, Forrest Huang, Yang Li

    Abstract: Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limit their adoption in practice. Firstly, the limited expressiveness of individual condition types used in previous works restricts designers' ability to convey complex design i…

    Submitted 18 May, 2024; originally announced May 2024.

  14. arXiv:2405.10314  [pdf, other]

    cs.CV

    CAT3D: Create Anything in 3D with Multi-View Diffusion Models

    Authors: Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole

    Abstract: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent nov…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://cat3d.github.io

  15. arXiv:2405.03950  [pdf, other]

    cs.LG cs.AI

    Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships

    Authors: Qi Zou, Na Yu, Daoliang Zhang, Wei Zhang, Rui Gao

    Abstract: Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a p…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 16 pages, 6 figures, 9 tables

  16. arXiv:2404.10763  [pdf, other]

    cs.AI cs.CL cs.CV

    LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

    Authors: Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

    Abstract: Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With…

    Submitted 16 April, 2024; originally announced April 2024.

  17. arXiv:2404.10595  [pdf, other]

    cs.CV

    Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

    Authors: Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia

    Abstract: Large Vision-Language Models (LVLMs) have received widespread attention in advancing the interpretable self-driving. Existing evaluations of LVLMs primarily focus on the multi-faceted capabilities in natural circumstances, lacking automated and quantifiable assessment for self-driving, let alone the severe road corner cases. In this paper, we propose CODA-LM, the very first benchmark for the autom…

    Submitted 26 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://coda-dataset.github.io/coda-lm/

  18. arXiv:2404.01296  [pdf, other]

    cs.CV

    MagicMirror: Fast and High-Quality Avatar Generation with a Constrained Search Space

    Authors: Armand Comas-Massagué, Di Qiu, Menglei Chai, Marcel Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark Matthews, Paulo Gotardo, Octavia Camps, Sergio Orts-Escolano, Thabo Beeler

    Abstract: We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to…

    Submitted 1 April, 2024; originally announced April 2024.

  19. arXiv:2403.19221  [pdf, other]

    cs.CV cs.AI

    Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

    Authors: Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

    Abstract: Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of a single auxiliary modality, which is impractical given the diversity and unpredictable nature of real-world scenarios. To this end, we propose a Miss…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/lancopku/MR-VPC

  20. Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

    Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens

    Abstract: Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we…

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.16848  [pdf, other]

    cs.CV

    Multiple Object Tracking as ID Prediction

    Authors: Ruopeng Gao, Yijun Zhang, Limin Wang

    Abstract: In Multiple Object Tracking (MOT), tracking-by-detection methods have stood the test for a long time, which split the process into two parts according to the definition: object detection and association. They leverage robust single-frame detectors and treat object association as a post-processing step through hand-crafted heuristic algorithms and surrogate tasks. However, the nature of heuristic t…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 71.4 HOTA on DanceTrack (with CrowdHuman), 67.5/70.0 HOTA on DanceTrack built upon Deformable DETR and DAB-Deformable DETR respectively (without additional data). The code repository will be created within several days

  22. arXiv:2403.14822  [pdf, other]

    stat.ML cs.LG math.OC

    Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets

    Authors: Jie Wang, Rui Gao, Yao Xie

    Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-c…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 26 pages, 2 figures

  23. arXiv:2403.13304  [pdf, other]

    cs.CV

    DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

    Authors: Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang

    Abstract: Current perceptive models heavily depend on resource-intensive datasets, prompting the need for innovative solutions. Leveraging recent advances in diffusion models, synthetic data, by constructing image inputs from various annotations, proves beneficial for downstream tasks. While prior methods have separately addressed generative and perceptive models, DetDiffusion, for the first time, harmonize…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  24. arXiv:2403.12636  [pdf, other]

    cs.LG stat.ML

    A Practical Guide to Statistical Distances for Evaluating Generative Models in Science

    Authors: Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K Lappalainen, Jakob H Macke, Guy Moss, Matthijs Pals, Felix Pei, Rachel Rapp, A Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, Julius Vetter

    Abstract: Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the samples these models generate? This work aims to provide an accessible entry point to understanding popular notions of statistical distances, requiring only foundati…

    Submitted 19 March, 2024; originally announced March 2024.

  25. Holistic HMI Design for Automated Vehicles: Bridging In-Vehicle and External Communication

    Authors: Haoyu Dong, Tram Thi Minh Tran, Pavlo Bazilinskyy, Marius Hoggenmüller, Debargha Dey, Silvia Cazacu, Mervyn Franssen, Ruolin Gao

    Abstract: As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic approach to HMI designs, which promotes the integration of both in-vehicle user and external road user perspectives. This approach aims to create a u…

    Submitted 17 March, 2024; originally announced March 2024.

  26. arXiv:2403.09363  [pdf, other]

    cs.CV

    Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

    Authors: Fan Wan, Xingyu Miao, Haoran Duan, **g**g Deng, Rui Gao, Yang Long

    Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both m…

    Submitted 14 March, 2024; originally announced March 2024.

  27. arXiv:2403.02867  [pdf, other]

    cs.SI cs.LG

    Scalable Continuous-time Diffusion Framework for Network Inference and Influence Estimation

    Authors: Keke Huang, Ruize Gao, Bogdan Cautis, Xiaokui Xiao

    Abstract: The study of continuous-time information diffusion has been an important area of research for many applications in recent years. When only the diffusion traces (cascades) are accessible, cascade-based network inference and influence estimation are two essential problems to explore. Alas, existing methods exhibit limited capability to infer and process networks with more than a few thousand nodes,…

    Submitted 20 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  28. arXiv:2403.01446  [pdf, other]

    cs.CV

    GuardT2I: Defending Text-to-Image Models from Adversarial Prompts

    Authors: Yijun Yang, Ruiyuan Gao, Xiao Yang, Jianyuan Zhong, Qiang Xu

    Abstract: Recent advancements in Text-to-Image (T2I) models have raised significant safety concerns about their potential misuse for generating inappropriate or Not-Safe-For-Work (NSFW) contents, despite existing countermeasures such as NSFW classifiers or model fine-tuning for inappropriate concept removal. Addressing this challenge, our study unveils GuardT2I, a novel moderation framework that adopts a ge…

    Submitted 3 March, 2024; originally announced March 2024.

  29. arXiv:2402.17718  [pdf]

    cs.LG eess.SP

    Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

    Authors: Vispi Karkaria, Anthony Goeckner, Ru**g Zha, Jie Chen, Jian**g Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

    Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

  30. arXiv:2402.07808  [pdf, other]

    cs.LG

    Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation

    Authors: Julius Vetter, Guy Moss, Cornelius Schröder, Richard Gao, Jakob H. Macke

    Abstract: Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations - an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid so…

    Submitted 15 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  31. arXiv:2402.06841  [pdf]

    eess.IV cs.CV

    Point cloud-based registration and image fusion between cardiac SPECT MPI and CTA

    Authors: Shaojie Tang, Penpen Miao, Xingyu Gao, Yu Zhong, Dantong Zhu, Haixing Wen, Zhihui Xu, Qiuyue Wei, Hong** Yao, Xin Huang, Rui Gao, Chen Zhao, Weihua Zhou

    Abstract: A method was proposed for the point cloud-based registration and image fusion between cardiac single photon emission computed tomography (SPECT) myocardial perfusion images (MPI) and cardiac computed tomography angiograms (CTA). Firstly, the left ventricle (LV) epicardial regions (LVERs) in SPECT and CTA images were segmented by using different U-Net neural networks trained to generate the point c…

    Submitted 9 February, 2024; originally announced February 2024.

  32. arXiv:2402.00575  [pdf, other]

    cs.CV

    Diffusion-based Light Field Synthesis

    Authors: Ruisheng Gao, Yutong Liu, Zeyu Xiao, Zhiwei Xiong

    Abstract: Light fields (LFs), conducive to comprehensive scene radiance recorded across angular dimensions, find wide applications in 3D reconstruction, virtual reality, and computational photography. However, the LF acquisition is inevitably time-consuming and resource-intensive due to the mainstream acquisition strategy involving manual capture or laborious software synthesis. Given such a challenge, we int…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 11 pages, 9 figures

  33. arXiv:2401.01217  [pdf, other]

    cs.DC

    KCES: A Workflow Containerization Scheduling Scheme Under Cloud-Edge Collaboration Framework

    Authors: Chenggang Shan, Runze Gao, Qinghua Han, Zhen Yang, **hui Zhang, Yuanqing Xia

    Abstract: As more IoT applications gradually move towards the cloud-edge collaborative mode, the containerized scheduling of workflows extends from the cloud to the edge. However, given the high delay of the communication network, loose coupling of structure, and resource heterogeneity between cloud and edge, workflow containerization scheduling in the cloud-edge scenarios faces the difficulty of resource c…

    Submitted 2 January, 2024; originally announced January 2024.

  34. arXiv:2312.12870  [pdf, other]

    cs.CV

    The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

    Authors: Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

    Abstract: In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking th…

    Submitted 3 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  35. arXiv:2312.02981  [pdf, other]

    cs.CV

    ReconFusion: 3D Reconstruction with Diffusion Priors

    Authors: Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Holynski

    Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for nove…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://reconfusion.github.io/

  36. arXiv:2312.00820  [pdf, other]

    cs.LG cs.AI

    Non-Cross Diffusion for Semantic Consistency

    Authors: Ziyang Zheng, Ruiyuan Gao, Qiang Xu

    Abstract: In diffusion models, deviations from a straight generative flow are a common issue, resulting in semantic inconsistencies and suboptimal generations. To address this challenge, we introduce `Non-Cross Diffusion', an innovative approach in generative modeling for learning ordinary differential equation (ODE) models. Our methodology strategically incorporates an ascending dimension of input to effec…

    Submitted 30 November, 2023; originally announced December 2023.

  37. arXiv:2312.00651  [pdf, other]

    cs.CV cs.AI

    TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

    Authors: Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-Yan Yeung, Huchuan Lu, Xu Jia

    Abstract: Despite remarkable achievements in video synthesis, achieving granular control over complex dynamics, such as nuanced movement among multiple interacting objects, still presents a significant hurdle for dynamic world modeling, compounded by the necessity to manage appearance and disappearance, drastic scale changes, and ensure consistency for instances across frames. These challenges hinder the de…

    Submitted 20 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  38. arXiv:2311.17516  [pdf, other]

    cs.CR cs.CV

    MMA-Diffusion: MultiModal Attack on Diffusion Models

    Authors: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu

    Abstract: In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effecti…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024. Our codes and benchmarks are available at https://github.com/cure-lab/MMA-Diffusion

  39. arXiv:2311.17404  [pdf, other]

    cs.CV cs.AI cs.CL

    VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

    Authors: Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou

    Abstract: The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of static visual shortcuts. To remedy this issue, we present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStandin…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 23 pages, 6 figures, 18 tables, data is available at https://github.com/lscpku/VITATECS

  40. arXiv:2311.07955  [pdf, other]

    cs.CV cs.AI

    Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons

    Authors: Chenjie Zhao, Ryan Wen Liu, **gxiang Qu, Ruobin Gao

    Abstract: With the advancement of maritime unmanned aerial vehicles (UAVs) and deep learning technologies, the application of UAV-based object detection has become increasingly significant in the fields of maritime industry and ocean engineering. Endowed with intelligent sensing capabilities, the maritime UAVs enable effective and efficient maritime surveillance. To further promote the development of mariti…

    Submitted 14 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 32 pages, 18 figures

  41. arXiv:2311.04533  [pdf, ps, other]

    cs.DS

    Improved Approximations for Ultrametric Violation Distance

    Authors: Moses Charikar, Ruiquan Gao

    Abstract: We study the Ultrametric Violation Distance problem introduced by Cohen-Addad, Fan, Lee, and Mesmay [FOCS, 2022]. Given pairwise distances $x\in \mathbb{R}_{>0}^{\binom{[n]}{2}}$ as input, the goal is to modify the minimum number of distances so as to make it a valid ultrametric. In other words, this is the problem of fitting an ultrametric to given data, where the quality of the fit is measured b…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: SODA 2024

  42. arXiv:2311.03517  [pdf, other]

    cs.SD cs.CV eess.AS

    SoundCam: A Dataset for Finding Humans Using Room Acoustics

    Authors: Mason Wang, Samuel Clarke, Jui-Hsien Wang, Ruohan Gao, Jiajun Wu

    Abstract: A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions. A room's acoustic properties can be characterized by its impulse response (RIR) between a source and listener location, or roughly inferred from recordings of natural signals present in the room. Variations in the positions of objects in a room can effect measurable changes…

    Submitted 15 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: In NeurIPS 2023 Datasets and Benchmarks Track. Project page: https://masonlwang.com/soundcam/. Wang and Clarke contributed equally to this work

  43. arXiv:2311.01813  [pdf, other]

    cs.CV

    FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation

    Authors: Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Chen, Xu Sun, Lu Hou

    Abstract: Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly shown by the qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two critical problems. Firstly, existing studies lack fine-grained evaluation of T2V models on different categories of text prompts. Although some benchmar…

    Submitted 26 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  44. arXiv:2311.01708  [pdf, ps, other]

    cs.LG math.NA

    Physics-Informed Generator-Encoder Adversarial Networks with Latent Space Matching for Stochastic Differential Equations

    Authors: Ruisong Gao, Min Yang, ** Zhang

    Abstract: We propose a new class of physics-informed neural networks, called Physics-Informed Generator-Encoder Adversarial Networks, to effectively address the challenges posed by forward, inverse, and mixed problems in stochastic differential equations. In these scenarios, while the governing equations are known, the available data consist of only a limited set of snapshots for system parameters. Our mode…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 24 pages

  45. arXiv:2311.01454  [pdf, other]

    cs.RO cs.AI

    NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

    Authors: Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, ** Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu

    Abstract: We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate their intended objects of interest and actions to the robots using electroencephalography (EEG). Our novel system demonstrates success in an exp…

    Submitted 2 November, 2023; originally announced November 2023.

  46. arXiv:2310.19192  [pdf, other]

    q-bio.NC cs.LG stat.ML

    Emergence of Grid-like Representations by Training Recurrent Networks with Conformal Normalization

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: Grid cells in the entorhinal cortex of mammalian brains exhibit striking hexagon grid firing patterns in their response maps as the animal (e.g., a rat) navigates in a 2D open environment. In this paper, we study the emergence of the hexagon grid patterns of grid cells based on a general recurrent neural network (RNN) model that captures the navigation process. The responses of grid cells collecti…

    Submitted 19 February, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  47. arXiv:2310.12415  [pdf, other]

    cs.SE

    SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum

    Authors: Yi Song, Xihao Zhang, Xiaoyuan Xie, Songqiang Chen, Quanming Liu, Ruizhi Gao

    Abstract: Failure indexing is a longstanding crux in software testing and debugging, the goal of which is to automatically divide failures (e.g., failed test cases) into distinct groups according to the culprit root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. This community has long been plagued by two challenges: 1) The effectiveness of division is s…

    Submitted 2 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

  48. arXiv:2310.12026  [pdf, other]

    stat.ML cs.LG stat.AP

    Nonparametric Discrete Choice Experiments with Machine Learning Guided Adaptive Design

    Authors: Mingzhang Yin, Ruijiang Gao, Weiran Lin, Steven M. Shugan

    Abstract: Designing products to meet consumers' preferences is essential for a business's success. We propose the Gradient-based Survey (GBS), a discrete choice experiment for multiattribute product design. The experiment elicits consumer preferences through a sequence of paired comparisons for partial profiles. GBS adaptively constructs paired comparison questions based on the respondents' previous choices…

    Submitted 18 October, 2023; originally announced October 2023.

  49. arXiv:2310.11163  [pdf, other]

    cs.CL

    IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems

    Authors: Xu Huang, Zhirui Zhang, Ruize Gao, Yichao Du, Lemao Liu, Gou** Huang, Shuming Shi, Jiajun Chen, Shujian Huang

    Abstract: We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform that enables researchers to quickly build IMT systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting, in which human intervention…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP2023

  50. arXiv:2310.08824  [pdf, other]

    cs.HC stat.ML

    Confounding-Robust Policy Improvement with Human-AI Teams

    Authors: Ruijiang Gao, Mingzhang Yin

    Abstract: Human-AI collaboration has the potential to transform various domains by leveraging the complementary strengths of human experts and Artificial Intelligence (AI) systems. However, unobserved confounding can undermine the effectiveness of this collaboration, leading to biased and unreliable outcomes. In this paper, we propose a novel solution to address unobserved confounding in human-AI collaborat…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 24 pages