Showing 1–50 of 474 results for author: Cai, J

Searching in archive cs.
  1. arXiv:2407.02203  [pdf, other]

    cs.CL cs.AI

    Automatic Adaptation Rule Optimization via Large Language Models

    Authors: Yusei Ishimizu, Jialong Li, Jinglue Xu, Jinyu Cai, Hitoshi Iba, Kenji Tei

    Abstract: Rule-based adaptation is a foundational approach to self-adaptation, characterized by its human readability and rapid response. However, building high-performance and robust adaptation rules is often a challenge because it essentially involves searching for the optimal design in a complex (variables) space. In response, this paper attempts to employ large language models (LLMs) as an optimizer to constr…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.00908  [pdf, other]

    cs.CL cs.AI

    FineSurE: Fine-grained Summarization Evaluation using LLMs

    Authors: Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour

    Abstract: Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can onl…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 (main, long)

  3. arXiv:2406.19435  [pdf, other]

    cs.CV

    A Sanity Check for AI-generated Image Detection

    Authors: Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

    Abstract: With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify th…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://shilinyan99.github.io/AIDE Code: https://github.com/shilinyan99/AIDE

  4. arXiv:2406.14927  [pdf, other]

    cs.CV cs.RO

    Gaussian-Informed Continuum for Physical Property Identification and Simulation

    Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

    Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  5. arXiv:2406.12846  [pdf, other]

    cs.CV

    DrVideo: Document Retrieval Based Long Video Understanding

    Authors: Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

    Abstract: Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: difficulty in locating key information and performing long-range reasoning. Thus, we propose DrVideo, a document-retrieval-based system designed for long…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages

  6. arXiv:2406.09680  [pdf, other]

    cs.LG cs.DC

    Heterogeneous Federated Learning with Convolutional and Spiking Neural Networks

    Authors: Yingchao Yu, Yuping Yan, Jisong Cai, Yaochu Jin

    Abstract: Federated learning (FL) has emerged as a promising paradigm for training models on decentralized data while safeguarding data privacy. Most existing FL systems, however, assume that all machine learning models are of the same type, although it becomes more likely that different edge devices adopt different types of AI models, including both conventional analogue artificial neural networks (ANNs) a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, FL@FM-IJCAI'24

  7. arXiv:2406.09041  [pdf, other]

    cs.CL cs.AI cs.LG

    ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

    Authors: Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang

    Abstract: The typical process for developing LLMs involves pre-training a general foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts. Serving these experts poses challenges, as loading all experts onto devices is impractical, and frequent switching between experts in response to user requests incurs substantial I/O costs, increasing latency and expe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Tech report

  8. arXiv:2406.05641  [pdf, other]

    cs.CV

    PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

    Authors: Shangyu Chen, Zizheng Pan, Jianfei Cai, Dinh Phung

    Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with divers…

    Submitted 9 June, 2024; originally announced June 2024.

  9. arXiv:2406.05588  [pdf, other]

    cs.CL cs.AI cs.LG

    CERET: Cost-Effective Extrinsic Refinement for Text Generation

    Authors: Jason Cai, Hang Su, Monica Sunkara, Igor Shalyminov, Saab Mansour

    Abstract: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good quality outputs in their first attempt. Apart from model fine-tuning, existing approaches to improve prediction accuracy and quality typically involve LLM self-improvement / self-reflection that incorporate feedback from models themselves. Despite their effectiveness, these methods are hindered by…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: The source code and data samples are released at https://github.com/amazon-science/CERET-LLM-refine

  10. arXiv:2406.04101  [pdf, other]

    cs.CV

    How Far Can We Compress Instant-NGP-Based NeRF?

    Authors: Yihang Chen, Qianyi Wu, Mehrtash Harandi, Jianfei Cai

    Abstract: In recent years, Neural Radiance Field (NeRF) has demonstrated remarkable capabilities in representing 3D scenes. To expedite the rendering process, learnable explicit representations have been introduced for combination with implicit NeRF representation, which however results in a large storage space requirement. In this paper, we introduce the Context-based NeRF Compression (CNC) framework, whic…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://yihangchen-ee.github.io/project_cnc/ Code: https://github.com/yihangchen-ee/cnc/. We further propose a 3DGS compression method HAC, which is based on CNC: https://yihangchen-ee.github.io/project_hac/

    Journal ref: CVPR 2024

  11. arXiv:2406.00985  [pdf, other]

    cs.CV

    MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models

    Authors: Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu

    Abstract: Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying these methods sequentially f…

    Submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2405.09463  [pdf, other]

    cs.CV

    Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis Screening

    Authors: Yan Kong, Sheng Wang, Jiangdong Cai, Zihao Zhao, Zhenrong Shen, Yonghao Li, Manman Fei, Qian Wang

    Abstract: Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positi…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: MICCAI-2024 early accept. Our code is available at https://github.com/YanKong0408/Gaze-DETR

  13. arXiv:2405.09153  [pdf, other]

    cs.CL cs.LG

    Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser

    Authors: Jon Z. Cai, Kristin Wright-Bettner, Martha Palmer, Guergana K. Savova, James H. Martin

    Abstract: This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME)…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to the 6th Clinical NLP Workshop at NAACL, 2024

  14. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  15. arXiv:2405.03806  [pdf, other]

    cs.HC

    In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

    Authors: Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

    Abstract: Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-c…

    Submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.02876  [pdf, ps, other]

    cs.NE cs.LG

    Exploring the Improvement of Evolutionary Computation via Large Language Models

    Authors: Jinyu Cai, Jinglue Xu, Jialong Li, Takuto Ymauchi, Hitoshi Iba, Kenji Tei

    Abstract: Evolutionary computation (EC), as a powerful optimization algorithm, has been applied across various domains. However, as the complexity of problems increases, the limitations of EC have become more apparent. The advent of large language models (LLMs) has not only transformed natural language processing but also extended their capabilities to diverse fields. By harnessing LLMs' vast knowledge and…

    Submitted 23 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: accepted by GECCO 2024

  17. arXiv:2405.02858  [pdf, ps, other]

    cs.SI cs.CL

    Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation

    Authors: Jinyu Cai, Jialong Li, Mingyue Zhang, Munan Li, Chen-Shu Wang, Kenji Tei

    Abstract: Social media platforms such as Twitter, Reddit, and Sina Weibo play a crucial role in global communication but often encounter strict regulations in geopolitically sensitive regions. This situation has prompted users to ingeniously modify their way of communicating, frequently resorting to coded language in these regulated social media environments. This shift in communication is not merely a stra…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE WCCI 2024

  18. arXiv:2405.01047  [pdf, ps, other]

    math.OC cs.GT

    Optimal Pricing for Linear-Quadratic Games with Nonlinear Interaction Between Agents

    Authors: Jiamin Cai, Chenyue Zhang, Hoi-To Wai

    Abstract: This paper studies a class of network games with linear-quadratic payoffs and externalities exerted through a strictly concave interaction function. This class of games is motivated by the diminishing marginal effects of peer influence. We analyze the optimal pricing strategy for this class of network games. First, we prove the existence of a unique Nash Equilibrium (NE). Second, we study the opt…

    Submitted 3 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures, accepted by IEEE Control Systems Letters

  19. arXiv:2404.18033  [pdf, other]

    cs.CV

    Exposing Text-Image Inconsistency Using Diffusion Models

    Authors: Mingzhen Huang, Shan Jia, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

    Abstract: In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more…

    Submitted 27 April, 2024; originally announced April 2024.

  20. arXiv:2404.14305  [pdf, other]

    cs.HC

    "I Upload...All Types of Different Things to Say, the World of Blindness Is More Than What They Think It Is": A Study of Blind TikTokers' Identity Work from a Flourishing Perspective

    Authors: Yao Lyu, Jie Cai, Bryan Dosono, Davis Yadav, John M. Carroll

    Abstract: Identity work in Human-Computer Interaction (HCI) has focused on marginalized groups to explore designs that support their assets (what they have). However, little has been explored on the identity work of people with disabilities, specifically visual impairments. In this study, we interviewed 45 BlindTokers (blind users on TikTok) from various backgrounds to understand their identit…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: ACM CSCW

  21. arXiv:2404.12759  [pdf, other]

    cs.LG

    decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

    Authors: Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu

    Abstract: Quantization has emerged in recent years as one of the most promising compression technologies for deploying efficient large models in various real-time applications. Considering that the storage and IO of weights take up the vast majority of the overhead inside a large model, weight-only quantization can lead to large gains. However, existing quantization schemes suffer from significant accuracy degr…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: quantization for deep models

  22. arXiv:2404.09000  [pdf, other]

    eess.IV cs.CV cs.LG

    MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images

    Authors: Yingjie Xi, Boyuan Cheng, Jingyao Cai, Jian Jun Zhang, Xiaosong Yang

    Abstract: Human whole-body X-rays could offer a valuable reference for various applications, including medical diagnostics, digital animation modeling, and ergonomic design. The traditional method of obtaining X-ray information requires the use of CT (Computed Tomography) scan machines, which emit potentially harmful radiation. Thus it faces a significant limitation for realistic applications because it…

    Submitted 13 April, 2024; originally announced April 2024.

  23. arXiv:2404.07949  [pdf, other]

    cs.CV

    Taming Stable Diffusion for Text to 360° Panorama Image Generation

    Authors: Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

    Abstract: Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to gen…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project Page: https://chengzhag.github.io/publication/panfusion Code: https://github.com/chengzhag/PanFusion

  24. "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

    Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

    Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective…

    Submitted 10 April, 2024; originally announced April 2024.

    Journal ref: "We Need Structured Output": Towards User-centered Constraints on LLM Output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11-16, 2024, Honolulu, HI, USA

  25. arXiv:2404.06395  [pdf, other]

    cs.CL cs.LG

    MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

    Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce…

    Submitted 3 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: revise according to peer review

  26. arXiv:2404.05016  [pdf, other]

    cs.CV

    Hyperbolic Learning with Synthetic Captions for Open-World Detection

    Authors: Fanjie Kong, Yanbei Chen, Jiarui Cai, Davide Modolo

    Abstract: Open-world detection poses significant challenges, as it requires the detection of any object using either object class labels or free-form texts. Existing related works often use large-scale manually annotated caption datasets for training, which are extremely expensive to collect. Instead, we propose to transfer knowledge from vision-language models (VLMs) to enrich the open-vocabulary description…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  27. arXiv:2404.04629  [pdf, other]

    cs.CV

    DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

    Authors: Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

    Abstract: Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce DifFUSER, a novel approach that leverages diffusion models for multi-modal fusion in 3D object detection and BEV map segmentation. Benefiting from the i…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 23 pages

  28. arXiv:2404.01686  [pdf, other]

    cs.CV

    JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

    Authors: Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

    Abstract: Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, w…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  29. arXiv:2404.01078  [pdf, other]

    cs.LG

    Energy-based Model for Accurate Shapley Value Estimation in Interpretable Deep Learning Predictive Modeling

    Authors: Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo

    Abstract: As a favorable tool for explainable artificial intelligence (XAI), Shapley value has been widely used to interpret deep learning based predictive models. However, accurate and efficient estimation of Shapley value is difficult since the computation load grows exponentially with the increase of input features. Most existing accelerated estimation methods have to compromise on estimation accuracy wi…

    Submitted 5 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  30. arXiv:2404.00269  [pdf, other]

    cs.CV

    IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

    Authors: Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han

    Abstract: Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes im…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  31. arXiv:2403.19902  [pdf, other]

    cs.CV

    Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification

    Authors: Jianfeng Cai, Yue Ma, Zhixi Feng, Shuyuan Yang

    Abstract: Polarimetric synthetic aperture radar (PolSAR) image interpretation is widely used in various fields. Recently, deep learning has made significant progress in PolSAR image classification. Supervised learning (SL) requires a large amount of high-quality labeled PolSAR data to achieve better performance; however, manually labeled data is insufficient. This causes the SL to fall into overfitting…

    Submitted 3 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  32. arXiv:2403.19213  [pdf, other]

    cs.CV

    Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting

    Authors: Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, Jingyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu

    Abstract: Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, simply learning matting representation from synthetic and lack-of-real-world-diversity matting data, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes su…

    Submitted 28 March, 2024; originally announced March 2024.

  33. arXiv:2403.15407  [pdf, other]

    cs.CL cs.AI

    X-AMR Annotation Tool

    Authors: Shafiuddin Rehan Ahmed, Jon Z. Cai, Martha Palmer, James H. Martin

    Abstract: This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting a…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: EACL 2024 System Demonstration

  34. arXiv:2403.14627  [pdf, other]

    cs.CV

    MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

    Authors: Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

    Abstract: We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian prim…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://donydchen.github.io/mvsplat Code: https://github.com/donydchen/mvsplat

  35. arXiv:2403.14530  [pdf, other]

    cs.CV

    HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

    Authors: Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. T…

    Submitted 2 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Project Page: https://yihangchen-ee.github.io/project_hac/ Code: https://github.com/YihangChen-ee/HAC

  36. arXiv:2403.13417  [pdf, other]

    cs.CV

    Diversified and Personalized Multi-rater Medical Image Segmentation

    Authors: Yicheng Wu, Xiangde Luo, Zhe Xu, Xiaoqing Guo, Lie Ju, Zongyuan Ge, Wenjun Liao, Jianfei Cai

    Abstract: Annotation ambiguity due to inherent data uncertainties such as blurred boundaries in medical scans and different observer expertise and preferences has become a major obstacle for training deep-learning based medical image segmentation models. To address it, the common practice is to gather multiple annotations from different experts, leading to the setting of multi-rater medical image segmentati…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  37. arXiv:2403.12396  [pdf, other]

    cs.CV cs.RO

    OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

    Authors: Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

    Abstract: This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for…

    Submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2403.11544  [pdf, ps, other]

    cs.LG

    RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model

    Authors: Junyi Fan, Yuxuan Han, Jialin Zeng, Jian-Feng Cai, Yang Wang, Yang Xiang, Jiheng Zhang

    Abstract: Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value for each agent. However, existing sample complexity bounds under such a framework have a suboptimal…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  39. arXiv:2403.10191  [pdf, other]

    cs.CV

    Generative Region-Language Pretraining for Open-Ended Object Detection

    Authors: Chuang Lin, Yi Jiang, Lizhen Qu, Zehuan Yuan, Jianfei Cai

    Abstract: In recent research, significant attention has been devoted to the open-vocabulary object detection task, aiming to generalize beyond the limited number of classes labeled during training and detect objects described by arbitrary category names at inference. Compared with conventional object detection, open vocabulary object detection largely extends the object detection categories. However, it rel…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  40. arXiv:2403.07942  [pdf, other]

    cs.CR cs.CV

    Attacking Transformers with Feature Diversity Adversarial Perturbation

    Authors: Chenxing Gao, Hang Zhou, Junqing Yu, YuTeng Ye, Jiale Cai, Junle Wang, Wei Yang

    Abstract: Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box…

    Submitted 9 March, 2024; originally announced March 2024.

  41. A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation

    Authors: Yao Lyu, He Zhang, Shuo Niu, Jie Cai

    Abstract: Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being appli…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted at CHI LBW 2024

  42. Content Moderation Justice and Fairness on Social Media: Comparisons Across Different Contexts and Platforms

    Authors: Jie Cai, Aashka Patel, Azadeh Naderi, Donghee Yvette Wohn

    Abstract: Social media users may perceive moderation decisions by the platform differently, which can lead to frustration and dropout. This study investigates users' perceived justice and fairness of online moderation decisions when they are exposed to various illegal versus legal scenarios, retributive versus restorative moderation strategies, and user-moderated versus commercially moderated platforms. We…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by CHI LBW 2024

  43. arXiv:2403.04073  [pdf, other]

    cs.CL cs.AI

    Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection

    Authors: Jianfeng He, Hang Su, Jason Cai, Igor Shalyminov, Hwanjun Song, Saab Mansour

    Abstract: Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. While addressing label noise, previous works on semi-supervised learning primarily focus on natural language understanding tasks, assuming each sample has a unique label. However, these methods are not directly applicable to…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 10 figures

  44. arXiv:2402.18936  [pdf, ps, other]

    cs.NI eess.SP

    Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling

    Authors: Jialiuyuan Li, Jiayuan Chen, Changyan Yi, Tong Zhang, Kun Zhu, Jun Cai

    Abstract: In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically clust…

    Submitted 29 February, 2024; originally announced February 2024.

  45. arXiv:2402.18927  [pdf, other]

    cs.CV cs.MM cs.NI

    Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

    Authors: Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai

    Abstract: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interesting module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred…

    Submitted 29 February, 2024; originally announced February 2024.

  46. arXiv:2402.14707  [pdf, other]

    cs.CV

    Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening

    Authors: Zhenrong Shen, Manman Fei, Xin Wang, Jiangdong Cai, Sheng Wang, Lichi Zhang, Qian Wang

    Abstract: Automatic thin-prep cytologic test (TCT) screening can assist pathologists in finding cervical abnormality towards accurate and efficient cervical cancer diagnosis. Current automatic TCT screening systems mostly involve abnormal cervical cell detection, which generally requires large-scale and diverse training data with high-quality annotations to achieve promising performance. Pathological image…

    Submitted 25 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  47. arXiv:2402.14167  [pdf, other]

    cs.CV cs.LG

    T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

    Authors: Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

    Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching T-Stitch, a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling tra…

    Submitted 21 February, 2024; originally announced February 2024.

  48. arXiv:2402.13868  [pdf, other]

    cs.DS cs.DM math.CO

    A Uniformly Random Solution to Algorithmic Redistricting

    Authors: Jin-Yi Cai, Jacob Kruse, Kenneth Mayer, Daniel P. Szabo

    Abstract: The process of drawing electoral district boundaries is known as political redistricting. Within this context, gerrymandering is the practice of drawing these boundaries such that they unfairly favor a particular political party, often leading to unequal representation and skewed electoral outcomes. One of the few ways to detect gerrymandering is by algorithmically sampling redistricting plans. Pr…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 17 Pages

  49. arXiv:2402.12761  [pdf, other]

    cs.LG cs.CR

    FGAD: Self-boosted Knowledge Distillation for An Effective Federated Graph Anomaly Detection Framework

    Authors: Jinyu Cai, Yunhe Zhang, Zhoumin Lu, Wenzhong Guo, See-kiong Ng

    Abstract: Graph anomaly detection (GAD) aims to identify anomalous graphs that significantly deviate from other ones, which has raised growing attention due to the broad existence and complexity of graph-structured data in many real-world scenarios. However, existing GAD methods usually execute with centralized training, which may lead to privacy leakage risk in some sensitive cases, thereby impeding collab…

    Submitted 20 February, 2024; originally announced February 2024.

  50. arXiv:2402.11500  [pdf, other]

    cs.GT cs.NI

    A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs

    Authors: Haipeng Zhou, Ruoyang Chen, Changyan Yi, Juan Li, Jun Cai

    Abstract: In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) has been investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE WCNC 2024