Showing 1–50 of 180 results for author: Cai, S

Searching in archive cs.
  1. arXiv:2407.00114  [pdf, other]

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  2. arXiv:2406.19154  [pdf]

    cs.LG physics.ao-ph

    Advancing operational PM2.5 forecasting with dual deep neural networks (D-DNet)

    Authors: Shengjuan Cai, Fangxin Fang, Vincent-Henri Peuch, Mihai Alexe, Ionel Michael Navon, Yanghua Wang

    Abstract: PM2.5 forecasting is crucial for public health, air quality management, and policy development. Traditional physics-based models are computationally demanding and slow to adapt to real-time conditions. Deep learning models show potential in efficiency but still suffer from accuracy loss over time due to error accumulation. To address these challenges, we propose a dual deep neural network (D-DNet)…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.15782  [pdf, other]

    cs.SC cs.LO

    A Local Search Algorithm for MaxSMT(LIA)

    Authors: Xiang He, Bohan Li, Mengyu Zhao, Shaowei Cai

    Abstract: MaxSAT modulo theories (MaxSMT) is an important generalization of Satisfiability modulo theories (SMT) with various applications. In this paper, we focus on MaxSMT with the background theory of Linear Integer Arithmetic, denoted as MaxSMT(LIA). We design the first local search algorithm for MaxSMT(LIA) called PairLS, based on the following novel ideas. A novel operator called pairwise operator is…

    Submitted 22 June, 2024; originally announced June 2024.

  4. arXiv:2406.11503  [pdf, other]

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.08152  [pdf, other]

    cs.CV

    CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

    Authors: Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

    Abstract: The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two fram…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  6. arXiv:2406.04523  [pdf, other]

    cs.CL cs.LG

    Proofread: Fixes All Errors with One Tap

    Authors: Renjie Liu, Yanxiang Zhang, Yun Zhu, Haicheng Sun, Yuanbo Zhang, Michael Xuelin Huang, Shanqing Cai, Lei Meng, Shumin Zhai

    Abstract: The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to mode…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 2 tables

  7. arXiv:2405.17414  [pdf, other]

    cs.CV cs.GR

    Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

    Authors: Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

    Abstract: Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene…

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.09883  [pdf, other]

    cs.CV

    RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

    Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

    Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within…

    Submitted 19 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Technical report. 32 pages, 21 figures, 13 tables. https://github.com/xiaosu-zhu/RoScenes

  9. arXiv:2405.09111  [pdf, other]

    cs.RO cs.AI

    CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving

    Authors: Dechen Gao, Shuangyu Cai, Hanchu Zhou, Hang Wang, Iman Soltani, Junshan Zhang

    Abstract: To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform fo…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Dechen Gao, Shuangyu Cai, Hanchu Zhou, Hang Wang contributed equally

  10. arXiv:2405.03946  [pdf]

    cs.SI

    Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

    Authors: Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

    Abstract: Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activi…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, 1 Table

  11. arXiv:2405.03924  [pdf, other]

    cs.DB cs.AI cs.LG

    NeurDB: An AI-powered Autonomous Data System

    Authors: Beng Chin Ooi, Shaofeng Cai, Gang Chen, Kian Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

    Abstract: In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, sel…

    Submitted 6 May, 2024; originally announced May 2024.

  12. arXiv:2405.00568  [pdf, other]

    cs.DB cs.AI

    Powering In-Database Dynamic Model Slicing for Structured Data Analytics

    Authors: Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Beng Chin Ooi, Jian Pei, Yuncheng Wu

    Abstract: Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural networks (DNN) training and inference on these respective subdatasets in a separate machine learning…

    Submitted 1 May, 2024; originally announced May 2024.

  13. arXiv:2405.00482  [pdf, other]

    cs.CR cs.LG

    PackVFL: Efficient HE Packing for Vertical Federated Learning

    Authors: Liu Yang, Shuowei Cai, Di Chai, Junxue Zhang, Han Tian, Yilun Jin, Kun Guo, Kai Chen, Qiang Yang

    Abstract: As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this core, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate the existing HE-based VFL algorithms. PackVFL packs multiple cleartex…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages excluding references

  14. arXiv:2404.16387  [pdf, other]

    cs.LO

    Revisiting Restarts of CDCL: Should the Search Information be Preserved?

    Authors: Xindi Zhang, Zhihan Chen, Shaowei Cai

    Abstract: SAT solvers are indispensable in formal verification for hardware and software with many important applications. CDCL is the most widely used framework for modern SAT solvers, and restart is an essential technique of CDCL. When restarting, CDCL solvers cancel the current variable assignment while maintaining the branching order, variable phases, and learnt clauses. This type of restart is referred…

    Submitted 27 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  15. arXiv:2404.09654  [pdf, other]

    cs.CV cs.MM

    Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Junran Wu

    Abstract: Large vision-language models (LVLMs) are markedly proficient in deriving visual representations guided by natural language. Recent explorations have utilized LVLMs to tackle zero-shot visual anomaly detection (VAD) challenges by pairing images with textual descriptions indicative of normal and abnormal conditions, referred to as anomaly prompts. However, existing approaches depend on static anomal…

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.08412  [pdf, other]

    physics.flu-dyn cs.AI

    PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction

    Authors: Siming Shan, Pengkai Wang, Song Chen, Jiaxu Liu, Chao Xu, Shengze Cai

    Abstract: The use of machine learning in fluid dynamics is becoming more common to expedite the computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In ad…

    Submitted 9 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 22 pages

  17. arXiv:2403.19501  [pdf, other]

    cs.CV

    RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method

    Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang

    Abstract: Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most of the HPE datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR2024, Project website: http://www.lidarhumanmotion.net/reli11d/

  18. arXiv:2403.14346  [pdf, other]

    cs.CV

    Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images

    Authors: Yujian Liu, Ruoxuan Wu, Xinjie Shen, Zihuang Lu, Lingyu Liang, Haiyu Zhou, Shipu Xu, Shaoai Cai, Shidang Xu

    Abstract: In the realm of digital pathology, multi-magnification Multiple Instance Learning (multi-mag MIL) has proven effective in leveraging the hierarchical structure of Whole Slide Images (WSIs) to reduce information loss and redundant data. However, current methods fall short in bridging the domain gap between pretrained models and medical imaging, and often fail to account for spatial relationships ac…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 14 pages, 7 figures

  19. arXiv:2403.14135  [pdf, other]

    eess.IV cs.CV

    Powerful Lossy Compression for Noisy Images

    Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

    Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recentl…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  20. arXiv:2403.10318  [pdf, other]

    cs.LG

    Anytime Neural Architecture Search on Tabular Data

    Authors: Naili Xing, Shaofeng Cai, Zhaojing Luo, Beng Chin Ooi, Jian Pei

    Abstract: The increasing demand for tabular data analysis calls for transitioning from manual architecture design to Neural Architecture Search (NAS). This transition demands an efficient and responsive anytime NAS approach that is capable of returning current optimal architectures within any given time budget while progressively enhancing architecture quality with increased budget allocation. However, the…

    Submitted 6 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  21. arXiv:2403.06568  [pdf, other]

    cs.AI

    Better Understandings and Configurations in MaxSAT Local Search Solvers via Anytime Performance Analysis

    Authors: Furong Ye, Chuan Luo, Shaowei Cai

    Abstract: Though numerous solvers have been proposed for the MaxSAT problem, and the benchmark environment such as MaxSAT Evaluations provides a platform for the comparison of the state-of-the-art solvers, existing assessments were usually evaluated based on the quality, e.g., fitness, of the best-found solutions obtained within a given running time budget. However, concerning solely the final obtained solu…

    Submitted 11 March, 2024; originally announced March 2024.

  22. arXiv:2403.05182  [pdf, other]

    cs.HC cs.GR

    ViboPneumo: A Vibratory-Pneumatic Finger-Worn Haptic Device for Altering Perceived Texture Roughness in Mixed Reality

    Authors: Shaoyu Cai, Zhenlin Chen, Haichen Gao, Ya Huang, Qi Zhang, Xinge Yu, Kening Zhu

    Abstract: Extensive research has been done in haptic feedback for texture simulation in virtual reality (VR). However, it is challenging to modify the perceived tactile texture of existing physical objects which usually serve as anchors for virtual objects in mixed reality (MR). In this paper, we present ViboPneumo, a finger-worn haptic device that uses vibratory-pneumatic feedback to modulate (i.e., increa…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures

  23. arXiv:2403.01414  [pdf, other]

    cs.CV

    Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

    Authors: Yujie Lu, Long Wan, Nayu Ding, Yulong Wang, Shuhan Shen, Shen Cai, Lin Gao

    Abstract: Neural implicit representation of geometric shapes has witnessed considerable advancements in recent years. However, common distance field based implicit representations, specifically signed distance field (SDF) for watertight shapes or unsigned distance field (UDF) for arbitrary shapes, routinely suffer from degradation of reconstruction accuracy when converting to explicit surface points and mes…

    Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR 2024

  24. arXiv:2402.18008  [pdf, other]

    cs.CV

    Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations

    Authors: Shen Cai, Zhanhao Wu, Lingxi Guo, Jiachun Wang, Siyu Zhang, Junchi Yan, Shuhan Shen

    Abstract: In this paper, we present two fast and interpretable decomposition methods for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and Affine-Core-Affine (ACA) transformations respectively. Under the minimal 4-point configuration, the first and the last similarity transformations in SKS are computed by two anchor points on target and source planes, respectively. Then, the other two…

    Submitted 27 February, 2024; originally announced February 2024.

  25. arXiv:2402.10705  [pdf, other]

    cs.AI

    AutoSAT: Automatically Optimize SAT Solvers via Large Language Models

    Authors: Yiwen Sun, Xianyin Zhang, Shiyu Huang, Shaowei Cai, BingZhen Zhang, Ke Wei

    Abstract: Heuristics are crucial in SAT solvers, but no heuristic rules are suitable for all SAT problems. Therefore, it is helpful to refine specific heuristics for specific problems. In this context, we present AutoSAT, a novel framework for automatically optimizing heuristics in SAT solvers. AutoSAT is based on Large Language Models (LLMs) which is able to autonomously generate codes, conduct evaluation,…

    Submitted 31 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  26. arXiv:2402.06158  [pdf, other]

    cs.DS cs.AI cs.IR

    Assortment Planning with Sponsored Products

    Authors: Shaojie Tang, Shuzhang Cai, Jing Yuan, Kai Han

    Abstract: In the rapidly evolving landscape of retail, assortment planning plays a crucial role in determining the success of a business. With the rise of sponsored products and their increasing prominence in online marketplaces, retailers face new challenges in effectively managing their product assortment in the presence of sponsored products. Remarkably, previous research in assortment planning largely o…

    Submitted 8 February, 2024; originally announced February 2024.

  27. arXiv:2401.11740  [pdf, other]

    cs.CV cs.LG

    Multi-level Cross-modal Alignment for Image Clustering

    Authors: Liping Qiu, Qin Zhang, Xiaojun Chen, Shaotian Cai

    Abstract: Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pre-training model could produce poor-quality pseudo-labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignmen…

    Submitted 22 January, 2024; originally announced January 2024.

  28. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

    Authors: Susan Lin, Jeremy Warner, J. D. Zamfirescu-Pereira, Matthew G. Lee, Sauhard Jain, Michael Xuelin Huang, Piyawat Lertvittayakumjorn, Shanqing Cai, Shumin Zhai, Björn Hartmann, Can Liu

    Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates key…

    Submitted 7 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: To appear at ACM CHI 2024

  29. METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

    Abstract: Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams.…

    Submitted 28 December, 2023; originally announced December 2023.

  30. arXiv:2312.14574  [pdf, other]

    cs.CV cs.LG

    MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning

    Authors: Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, Xiaoxiao Li

    Abstract: Prompt learning has demonstrated impressive efficacy in the fine-tuning of multimodal large models to a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods for the diagnosis of neurological disorder still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are rele…

    Submitted 27 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  31. arXiv:2312.14327  [pdf, other]

    cs.CL

    Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion

    Authors: Katrin Tomanek, Shanqing Cai, Subhashini Venugopalan

    Abstract: Abbreviation expansion is a strategy used to speed up communication by limiting the amount of typing and using a language model to suggest expansions. Here we look at personalizing a Large Language Model's (LLM) suggestions based on prior conversations to enhance the relevance of predictions, particularly when the user data is small (~1000 samples). Specifically, we compare fine-tuning, prompt-tun…

    Submitted 21 December, 2023; originally announced December 2023.

  32. arXiv:2312.13530  [pdf, other]

    cs.CR cs.AI cs.LG

    HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion

    Authors: Yu-Zheng Lin, Muntasir Mamun, Muhtasim Alam Chowdhury, Shuyu Cai, Mingyu Zhu, Banafsheh Saber Latibari, Kevin Immanuel Gubbi, Najmeh Nazari Bavarsad, Arjun Caputo, Avesta Sasan, Houman Homayoun, Setareh Rafatirad, Pratik Satam, Soheil Salehi

    Abstract: The escalating complexity of modern computing frameworks has resulted in a surge in the cybersecurity vulnerabilities reported to the National Vulnerability Database (NVD) by practitioners. Despite the fact that the stature of NVD is one of the most significant databases for the latest insights into vulnerabilities, extracting meaningful trends from such a large amount of unstructured data is stil…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 22 pages, 10 pages appendix, 10 figures, Submitted to ACM TODAES

  33. arXiv:2312.01532  [pdf, other]

    cs.HC cs.CL

    Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

    Authors: Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan, Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M. Gleason, Philip Q. Nelson, Michael P. Brenner

    Abstract: Finding ways to accelerate text input for individuals with profound motor impairments has been a long-standing area of research. Closing the speed gap for augmentative and alternative communication (AAC) devices such as eye-tracking keyboards is important for improving the quality of life for such individuals. Recent advances in neural networks of natural language pose new opportunities for re-thi…

    Submitted 3 December, 2023; originally announced December 2023.

  34. arXiv:2312.01409  [pdf, other]

    cs.CV cs.AI cs.GR

    Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

    Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

    Abstract: Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hinderin…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Project page: https://primecai.github.io/generative_rendering/

  35. arXiv:2311.14249  [pdf, other]

    cs.SC

    Efficient Local Search for Nonlinear Real Arithmetic

    Authors: Zhonghan Wang, Bohua Zhan, Bohan Li, Shaowei Cai

    Abstract: Local search has recently been applied to SMT problems over various arithmetic theories. Among these, nonlinear real arithmetic poses special challenges due to its uncountable solution space and potential need to solve higher-degree polynomials. As a consequence, existing work on local search only considered fragments of the theory. In this work, we analyze the difficulties and propose ways to add…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Full version of VMCAI'2024 publication

  36. arXiv:2311.05997  [pdf, other]

    cs.AI

    JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

    Authors: Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

    Abstract: Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks could potentially be infinite and lack the capability to progressively enhance task completion as game time progr…

    Submitted 30 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: update project page

  37. arXiv:2310.08240  [pdf, ps, other]

    cs.CL

    Who Said That? Benchmarking Social Media AI Detection

    Authors: Wanyun Cui, Linqiu Zhang, Qianle Wang, Shuyang Cai

    Abstract: AI-generated text has proliferated across various online platforms, offering both transformative prospects and posing significant risks related to misinformation and manipulation. Addressing these challenges, this paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms. It incorporates real AI-ge…

    Submitted 12 October, 2023; originally announced October 2023.

  38. arXiv:2310.08235  [pdf, other]

    cs.AI cs.LG

    GROOT: Learning to Follow Instructions by Watching Gameplay Videos

    Authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

    Abstract: We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while…

    Submitted 28 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  39. arXiv:2310.00029   

    cs.AI cs.GT cs.LG cs.RO

    Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation

    Authors: Zhen Liu, Hang Gao, Hao Ma, Shuo Cai, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong

    Abstract: Autonomous vehicle (AV) evaluation has been the subject of increased interest in recent years both in industry and in academia. This paper focuses on the development of a novel framework for generating adversarial driving behavior of background vehicle interfering against the AV to expose effective and rational risky events. Specifically, the adversarial behavior is learned by a reinforcement lear…

    Submitted 14 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: We find there is expression error in III.A. A correction edition will be offered

  40. arXiv:2309.15940  [pdf, other]

    cs.RO cs.CV

    Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

    Abstract: We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL2023

  41. arXiv:2308.14774  [pdf, other]

    eess.AS cs.SD eess.SP q-bio.QM

    EEG-Derived Voice Signature for Attended Speaker Detection

    Authors: Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

    Abstract: Objective: Conventional EEG-based auditory attention detection (AAD) is achieved by comparing the time-varying speech stimuli and the elicited EEG signals. However, in order to obtain reliable correlation values, these methods necessitate a long decision window, resulting in a long detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of t…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 8 pages, 2 figures

  42. arXiv:2308.00929  [pdf, other]

    cs.CV

    Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification

    Authors: Suncheng Xiang, Qingzhong Chen, Shilun Cai, Chengfeng Zhou, Crystal Cai, Sijia Du, Zhengjie Zhang, Yunshi Zhong, Dahong Qian

    Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset usually produce unsatisfactory ret…

    Submitted 28 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  43. arXiv:2307.08576  [pdf]

    q-bio.NC cs.LG

    A Study on the Performance of Generative Pre-trained Transformer (GPT) in Simulating Depressed Individuals on the Standardized Depressive Symptom Scale

    Authors: Si** Cai, Nanfeng Zhang, Jiaying Zhu, Yanjie Liu, Yong** Zhou

    Abstract: Background: Depression is a common mental disorder with societal and economic burden. Current diagnosis relies on self-reports and assessment scales, which have reliability issues. Objective approaches are needed for diagnosing depression. Objective: Evaluate the potential of GPT technology in diagnosing depression. Assess its ability to simulate individuals with depression and investigate the inf…

    Submitted 17 July, 2023; originally announced July 2023.

  44. arXiv:2307.02599  [pdf, other]

    cs.CL cs.AI

    Evade ChatGPT Detectors via A Single Space

    Authors: Shuyang Cai, Wanyun Cui

    Abstract: ChatGPT brings revolutionary social value but also raises concerns about the misuse of AI-generated text. Consequently, an important question is how to detect whether texts are generated by ChatGPT or by humans. Existing detectors are built upon the assumption that there are distributional gaps between human-generated and AI-generated text. These gaps are typically identified using statistical info…

    Submitted 13 October, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

  45. arXiv:2305.15159  [pdf, other]

    cs.IR

    Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases

    Authors: Zheng Hu, Shi-Min Cai, Jun Wang, Tao Zhou

    Abstract: The existing collaborative recommendation models that use multi-modal information emphasize the representation of users' preferences but easily ignore the representation of users' dislikes. Nevertheless, modelling users' dislikes facilitates comprehensively characterizing user profiles. Thus, the representation of users' dislikes should be integrated into the user modelling when we construct a col…

    Submitted 24 May, 2023; originally announced May 2023.

  46. arXiv:2305.15145  [pdf, other]

    cs.IR

    Bert4XMR: Cross-Market Recommendation with Bidirectional Encoder Representations from Transformer

    Authors: Zheng Hu, Satoshi Nakagawa, Shi-Min Cai, Fuji Ren

    Abstract: Real-world multinational e-commerce companies, such as Amazon and eBay, serve in multiple countries and regions. Some markets are data-scarce, while others are data-rich. In recent years, cross-market recommendation (XMR) has been proposed to bolster data-scarce markets by leveraging auxiliary information from data-rich markets. Previous XMR algorithms have employed techniques such as sharing bott…

    Submitted 26 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  47. arXiv:2305.15030  [pdf, other]

    cs.CV eess.IV

    Make Lossy Compression Meaningful for Low-Light Images

    Authors: Shilv Cai, Liqun Chen, Sheng Zhong, Luxin Yan, Jiahuan Zhou, Xu Zou

    Abstract: Low-light images frequently occur due to unavoidable environmental influences or technical limitations, such as insufficient lighting or limited exposure time. To achieve better visibility for visual perception, low-light image enhancement is usually adopted. Besides, lossy image compression is vital for meeting the requirements of storage and transmission in computer vision applications. To touch…

    Submitted 24 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by AAAI 2024

    ACM Class: I.4.2; I.4.3

  48. arXiv:2305.00188  [pdf, other]

    math.OC cs.AI

    New Characterizations and Efficient Local Search for General Integer Linear Programming

    Authors: Peng Lin, Shaowei Cai, Mengchuan Zou, **kun Lin

    Abstract: Integer linear programming (ILP) models a wide range of practical combinatorial optimization problems and significantly impacts industry and management sectors. This work proposes new characterizations of ILP with the concept of boundary solutions. Motivated by the new characterizations, we develop a new local search algorithm Local-ILP, which is efficient for solving general ILP validated on a la…

    Submitted 1 March, 2024; v1 submitted 29 April, 2023; originally announced May 2023.

    MSC Class: 90C10 (Primary); 90C06 (Secondary)

    ACM Class: I.2.8; G.2.0

  49. arXiv:2304.08051  [pdf, other]

    math.OC cs.LG cs.MA

    Accelerated Distributed Aggregative Optimization

    Authors: Jiaxu Liu, Song Chen, Shengze Cai, Chao Xu

    Abstract: In this paper, we investigate a distributed aggregative optimization problem in a network, where each agent has its own local cost function which depends not only on the local state variable but also on an aggregated function of state variables from all agents. To accelerate the optimization process, we combine heavy ball and Nesterov's accelerated methods with distributed aggregative gradient tra…

    Submitted 17 April, 2023; originally announced April 2023.

  50. arXiv:2303.15671  [pdf, other]

    cs.CV

    Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval

    Authors: Qingzhong Chen, Shilun Cai, Crystal Cai, Zefang Yu, Dahong Qian, Suncheng Xiang

    Abstract: Colonoscopic video retrieval, which is a critical part of polyp treatment, has great clinical significance for the prevention and treatment of colorectal cancer. However, retrieval models trained on action recognition datasets usually produce unsatisfactory retrieval results on colonoscopic datasets due to the large domain gap between them. To seek a solution to this problem, we construct a large-…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by ICME 2023