Showing 1–50 of 302 results for author: Cao, L

Searching in archive cs.
  1. arXiv:2406.19394  [pdf, other]

    cs.CV

    HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

    Authors: Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

    Abstract: Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional sup…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19247  [pdf, other]

    cs.CV

    Local Manifold Learning for No-Reference Image Quality Assessment

    Authors: Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

    Abstract: Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often negl…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.17755  [pdf, other]

    cs.CL

    Accelerating Clinical Evidence Synthesis with Large Language Models

    Authors: Zifeng Wang, Lang Cao, Benjamin Danek, Yichi Zhang, Qiao Jin, Zhiyong Lu, Jimeng Sun

    Abstract: Automatic medical discovery by AI is a dream of many. One step toward that goal is to create an AI model to understand clinical studies and synthesize clinical evidence from the literature. Clinical evidence synthesis currently relies on systematic reviews of clinical trials and retrospective analyses from medical literature. However, the rapid expansion of publications presents challenges in effi…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17601  [pdf, other]

    cs.CV

    Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

    Authors: Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

    Abstract: Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We in…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/imlixinyang/director3d

  5. arXiv:2406.17413  [pdf, other]

    cs.CV

    Depth-Guided Semi-Supervised Instance Segmentation

    Authors: Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framewo…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 4 tables

  6. arXiv:2406.14424  [pdf, other]

    cs.DC cs.LG

    CascadeServe: Unlocking Model Cascades for Inference Serving

    Authors: Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden

    Abstract: Machine learning (ML) models are increasingly deployed to production, calling for efficient inference serving systems. Efficient inference serving is complicated by two challenges: (i) ML models incur high computational costs, and (ii) the request arrival rates of practical applications have frequent, high, and sudden variations which make it hard to correctly provision hardware. Model cascades ar…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 17 pages, 13 figures

  7. arXiv:2406.13372  [pdf, other]

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-conne…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  8. arXiv:2406.12178  [pdf, other]

    cs.CV

    FCA-RAC: First Cycle Annotated Repetitive Action Counting

    Authors: Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

    Abstract: Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains 4 parts: 1) a labeling technique…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.08311  [pdf, other]

    cs.LG cs.AI

    Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework

    Authors: Ruibo Tu, Zineb Senane, Lele Cao, Cheng Zhang, Hedvig Kjellström, Gustav Eje Henter

    Abstract: Tabular synthesis models remain ineffective at capturing complex dependencies, and the quality of synthetic data is still insufficient for comprehensive downstream tasks, such as prediction under distribution shifts, automated decision-making, and cross-table understanding. A major challenge is the lack of prior knowledge about underlying structures and high-order relationships in tabular data. We…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.05913  [pdf, other]

    cs.NI eess.SP

    Revisiting Multi-User Downlink in IEEE 802.11ax: A Designers Guide to MU-MIMO

    Authors: Liu Cao, Lyutianyang Zhang, Sumit Roy, Sian Jin

    Abstract: Downlink (DL) Multi-User (MU) Multiple Input Multiple Output (MU-MIMO) is a key technology that allows multiple concurrent data transmissions from an Access Point (AP) to a selected sub-set of clients for higher network efficiency in IEEE 802.11ax. However, DL MU-MIMO feature is typically turned off as the default setting in AP vendors' products, that is, turning on the DL MU-MIMO may not help inc…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 7 pages, 6 figures, magazine paper

  11. arXiv:2406.00085  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Augmentation-based Unsupervised Cross-Domain Functional MRI Adaptation for Major Depressive Disorder Identification

    Authors: Yunling Ma, Chaojun Zhang, Xiaochuan Wang, Qianqian Wang, Liang Cao, Limei Zhang, Mingxia Liu

    Abstract: Major depressive disorder (MDD) is a common mental disorder that typically affects a person's mood, cognition, behavior, and physical health. Resting-state functional magnetic resonance imaging (rs-fMRI) data are widely used for computer-aided diagnosis of MDD. While multi-site fMRI data can provide more data for training reliable diagnostic models, significant cross-site data heterogeneity would…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  12. arXiv:2405.18810  [pdf, other]

    cs.CV cs.AI

    UniPTS: A Unified Framework for Proficient Post-Training Sparsity

    Authors: Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  13. arXiv:2405.18706  [pdf, other]

    cs.CV

    FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

    Authors: You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji

    Abstract: The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, SAM faces sta…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  14. arXiv:2405.17596  [pdf, other]

    cs.CV

    GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

    Authors: Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

    Abstract: 3D open-vocabulary scene understanding, crucial for advancing augmented reality and robotic applications, involves interpreting and locating specific regions within a 3D space as directed by natural language instructions. To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussia…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project page is available at https://goi-hyperplane.github.io/

  15. arXiv:2405.16412  [pdf, other]

    cs.CL cs.LG

    KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

    Authors: Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han

    Abstract: Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-gu…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  16. arXiv:2405.15775  [pdf, other]

    cs.RO

    AI Robots and Humanoid AI: Review, Perspectives and Directions

    Authors: Longbing Cao

    Abstract: In the approximately century-long journey of robotics, humanoid robots made their debut around six decades ago. The rapid advancements in generative AI, large language models (LLMs), and large multimodal models (LMMs) have reignited interest in humanoids, steering them towards real-time, interactive, and multimodal designs and applications. This resurgence unveils boundless opportunities for AI ro…

    Submitted 19 March, 2024; originally announced May 2024.

    Comments: 37 pages, 5 figures, 1 table

  17. arXiv:2405.15268  [pdf, other]

    cs.LG

    ParamReL: Learning Parameter Space Representation via Progressively Encoding Bayesian Flow Networks

    Authors: Zhangkai Wu, Xuhui Fan, Jin Li, Zhilin Zhao, Hui Chen, Longbing Cao

    Abstract: The recently proposed Bayesian Flow Networks (BFNs) show great potential in modeling parameter spaces, offering a unified strategy for handling continuous, discretized, and discrete data. However, BFNs cannot learn high-level semantic representation from the parameter space since common encoders, which encode data into one static representation, cannot capture semantic changes in parameters. Thi…

    Submitted 5 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.14696  [pdf, other]

    cs.CL cs.AI cs.DB

    A Declarative System for Optimizing AI Workloads

    Authors: Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano

    Abstract: A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract facts from company documents, data from scientific papers, or metrics from image and video corpora. Today's models can accomplish these tasks with high accuracy…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 29 pages, 9 figures

    ACM Class: H.2.3; I.2.5

  19. arXiv:2405.10890  [pdf, other]

    astro-ph.IM astro-ph.GA cs.AI

    A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model

    Authors: Mingxiang Fu, Yu Song, Jiameng Lv, Liang Cao, Peng Jia, Nan Li, Xiangru Li, Jifeng Liu, A-Li Luo, Bo Qiu, Shiyin Shen, Liangping Tu, Lili Wang, Shoulin Wei, Haifeng Yang, Zhenping Yi, Zhiqiang Zou

    Abstract: The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicate workloads too. He…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 26 pages, 10 figures, to be published on Chinese Physics C

  20. arXiv:2405.09874  [pdf, other]

    cs.CV

    Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

    Authors: Xinyang Li, Zhangyu Lai, Linning Xu, Jianfei Guo, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

    Abstract: We present Dual3D, a novel text-to-3D generation framework that generates high-quality 3D assets from texts in only 1 minute. The key component is a dual-mode multi-view latent diffusion model. Given the noisy multi-view latents, the 2D mode can efficiently denoise them with a single latent denoising network, while the 3D mode can generate a tri-plane neural surface for consistent rendering-based…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dual3d.github.io

  21. arXiv:2405.08745  [pdf, other]

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ…

    Submitted 14 May, 2024; originally announced May 2024.

  22. Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask

    Authors: Zineb Senane, Lele Cao, Valentin Leonhard Buchner, Yusuke Tashiro, Lei You, Pawel Herman, Mats Nordahl, Ruibo Tu, Vilhelm von Ehrenheim

    Abstract: Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based method…

    Submitted 17 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Published as a full paper by KDD 2024 Research Track (12 pages as main paper and 11 pages as appendix). Source code available at https://github.com/llcresearch/TSDE

    ACM Class: G.3; I.6.5; I.2.4

  23. arXiv:2405.00227  [pdf, other]

    cs.NI

    Optimized Non-Primary Channel Access Design in IEEE 802.11bn

    Authors: Dongyu Wei, Liu Cao, Lyutianyang Zhang, Xiangyu Gao, Hao Yin

    Abstract: The IEEE 802.11 standards, culminating in IEEE 802.11be (Wi-Fi 7), have significantly expanded bandwidth capacities from 20 MHz to 320 MHz, marking a crucial evolution in wireless access technology. Despite these advancements, the full potential of these capacities remains largely untapped due to inefficiencies in channel management, in particular, the underutilization of secondary (non-primary) c…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 6 pages, 5 figures

  24. arXiv:2404.16033  [pdf, other]

    cs.CV cs.CL

    Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

    Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

    Abstract: With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problem is usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The project page is available at https://ggg0919.github.io/cantor/

  25. arXiv:2404.15657  [pdf, other]

    cs.LG cs.AI

    FedSI: Federated Subnetwork Inference for Efficient Uncertainty Quantification

    Authors: Hui Chen, Hengyu Liu, Zhangkai Wu, Xuhui Fan, Longbing Cao

    Abstract: While deep neural networks (DNNs) based personalized federated learning (PFL) is demanding for addressing data heterogeneity and shows promising performance, existing methods for federated learning (FL) suffer from efficient systematic uncertainty quantification. The Bayesian DNNs-based PFL is usually questioned of either over-simplified model structures or high computational and memory costs. In…

    Submitted 24 April, 2024; originally announced April 2024.

  26. arXiv:2404.15141  [pdf, other]

    cs.CV cs.AI

    CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

    Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

    Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapol…

    Submitted 23 April, 2024; originally announced April 2024.

  27. arXiv:2404.14949  [pdf, other]

    cs.CV

    Multi-Modal Prompt Learning on Blind Image Quality Assessment

    Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

    Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semant…

    Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  28. arXiv:2404.13921  [pdf, other]

    cs.CV

    NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation

    Authors: Chi Huang, Xinyang Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

    Abstract: As a preliminary work, NeRF-Det unifies the tasks of novel view synthesis and 3D perception, demonstrating that perceptual tasks can benefit from novel view synthesis methods like NeRF, significantly improving the performance of indoor multi-view 3D object detection. Using the geometry MLP of NeRF to direct the attention of detection head to crucial parts and incorporating self-supervised loss fro…

    Submitted 22 April, 2024; originally announced April 2024.

  29. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  30. arXiv:2404.11093  [pdf, other]

    quant-ph cs.LG

    Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems

    Authors: Long Cao, Liwei Ge, Daochi Zhang, Xiang Li, Yao Wang, Rui-Xue Xu, YiJing Yan, Xiao Zheng

    Abstract: Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs. We present an artificial intelligence strategy to overcome this obstacle by integrating the neural quantum states approach into the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). Our approach utilizes…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  31. arXiv:2404.10599  [pdf, other]

    cs.NE

    Towards free-response paradigm: a theory on decision-making in spiking neural networks

    Authors: Zhichao Zhu, Yang Qi, Wenlian Lu, Zhigang Wang, Lu Cao, Jianfeng Feng

    Abstract: The energy-efficient and brain-like information processing abilities of Spiking Neural Networks (SNNs) have attracted considerable attention, establishing them as a crucial element of brain-inspired computing. One prevalent challenge encountered by SNNs is the trade-off between inference speed and accuracy, which requires sufficient time to achieve the desired level of performance. Drawing inspira…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 27 pages, 6 figures, 3 tables

  32. arXiv:2403.18471  [pdf, other]

    cs.CV

    DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis

    Authors: Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji

    Abstract: The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks. Existing face forgery datasets have limitations in generating high-quality facial images and addressing the challenges posed by evolving generative techniques. To combat this, we present DiffusionFace, the first diffusion-based face forgery…

    Submitted 27 March, 2024; originally announced March 2024.

  33. arXiv:2403.18393  [pdf, other]

    cs.LG

    Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering

    Authors: Long Shi, Lei Cao, Yunshan Ye, Yu Zhao, Badong Chen

    Abstract: In the context of multi-view clustering, graph learning is recognized as a crucial technique, which generally involves constructing an adaptive neighbor graph based on probabilistic neighbors, and then learning a consensus graph for clustering. However, they are confronted with two limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive n…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  34. Inter- and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Yang Song, Jiangbin Zheng, Leilei Cao, Leyi Wei, Ran Su

    Abstract: Acquiring pixel-level annotations is often limited in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertaint…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Expert Systems with Applications, 2024, 238: 122093

  35. arXiv:2403.12362  [pdf, other]

    cs.CV cs.LG

    DMAD: Dual Memory Bank for Real-World Anomaly Detection

    Authors: Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji

    Abstract: Training a unified model is considered to be more suitable for practical industrial anomaly detection scenarios due to its generalization ability and storage efficiency. However, this multi-class setting, which exclusively uses normal data, overlooks the few but important accessible annotated anomalies in the real world. To address the challenge of real-world anomaly detection, we propose a new fr…

    Submitted 18 March, 2024; originally announced March 2024.

  36. arXiv:2403.11300  [pdf, other]

    cs.NI

    Non-Primary Channel Access in IEEE 802.11 UHR: Comprehensive Analysis and Evaluation

    Authors: Dongyu Wei, Liu Cao, Lyutianyang Zhang, Xiangyu Gao, Hao Yin

    Abstract: The evolution of the IEEE 802.11 standards marks a significant throughput advancement in wireless access technologies, progressively increasing bandwidth capacities from 20 MHz in the IEEE 802.11a to up to 320 MHz in the latest IEEE 802.11be (Wi-Fi 7). However, the increased bandwidth capacities may not be well exploited due to inefficient bandwidth utilization on multiple channels. This issue typ…

    Submitted 12 May, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 6 pages, 7 figures

  37. arXiv:2403.08220  [pdf, other]

    math.NA cs.LG stat.CO stat.ML

    Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems

    Authors: Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas

    Abstract: We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (P…

    Submitted 20 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Updated manuscript: changed title, changed format, typo correction, and minor terminology changes

  38. arXiv:2403.05832  [pdf, other]

    cs.CE

    Research progress on intelligent optimization techniques for energy-efficient design of ship hull forms

    Authors: Shuwei Zhu, Siying Lv, Kaifeng Chen, Wei Fang, Leilei Cao

    Abstract: The design optimization of ship hull form based on hydrodynamics theory and simulation-based design (SBD) technologies generally considers ship performance and energy efficiency performance as the design objective, which plays an important role in smart design and manufacturing of green ship. An optimal design of sustainable energy system requires multidisciplinary tools to build ships with the le…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 30 pages, 8 figures

    MSC Class: 41C99 ACM Class: J.6; I.2.8

  39. arXiv:2403.00953  [pdf]

    cs.CL cs.AI

    AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models

    Authors: Lang Cao, Jimeng Sun, Adam Cross

    Abstract: Objectives: Our objective is to create an end-to-end system called AutoRD, which automates extracting information from clinical text about rare diseases. We have conducted various tests to evaluate the performance of AutoRD and highlighted its strengths and limitations in this paper. Materials and Methods: Our system, AutoRD, is a software pipeline involving data preprocessing, entity extraction…

    Submitted 1 March, 2024; originally announced March 2024.

  40. arXiv:2402.17525  [pdf, other]

    cs.CV

    Diffusion Model-Based Image Editing: A Survey

    Authors: Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao

    Abstract: Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provid…

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  41. arXiv:2402.14973  [pdf, other]

    cs.CL cs.AI cs.LG

    Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations

    Authors: Lele Cao, Valentin Buchner, Zineb Senane, Fangkai Yang

    Abstract: Multimodal Large Language Models (MLLMs) are commonly evaluated using costly annotated multimodal benchmarks. However, these benchmarks often struggle to keep pace with the rapidly advancing requirements of MLLM evaluation. We propose GenCeption, a novel and annotation-free MLLM evaluation framework that merely requires unimodal data to assess inter-modality semantic coherence and inversely reflec…

    Submitted 9 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by the 4th Workshop on TrustNLP (Trustworthy Natural Language Processing) @ NAACL2024. Source code: https://github.com/llcresearch/GenCeption. Leaderboard: https://huggingface.co/spaces/valbuc/GenCeption

    ACM Class: I.7; I.4

  42. arXiv:2402.10064  [pdf]

    cs.DC cs.LG

    Navigating the Maize: Cyclic and conditional computational graphs for molecular simulation

    Authors: Thomas Löhr, Michael Dodds, Lili Cao, Mikhail Kabeshov, Michele Assante, Jon-Paul Janet, Marco Klähn, Ola Engkvist

    Abstract: Many computational chemistry and molecular simulation workflows can be expressed as graphs. This abstraction is useful to modularize and potentially reuse existing components, as well as provide parallelization and ease reproducibility. Existing tools represent the computation as a directed acyclic graph (DAG), thus allowing efficient execution by parallelization of concurrent branches. These syst…

    Submitted 22 January, 2024; originally announced February 2024.

  43. Graph-Skeleton: ~1% Nodes are Sufficient to Represent Billion-Scale Graph

    Authors: Linfeng Cao, Haoran Deng, Yang Yang, Chun** Wang, Lei Chen

    Abstract: Due to the ubiquity of graph data on the web, web graph mining has become a hot research spot. Nonetheless, the prevalence of large-scale web graphs in real applications poses significant challenges to storage, computational capacity and graph model design. Despite numerous studies to enhance the scalability of graph models, a noticeable gap remains between academic research and practical web grap…

    Submitted 6 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 21 pages, 11 figures, In Proceedings of the ACM Web Conference 2024 (WWW'24)

  44. arXiv:2402.08241  [pdf, other]

    cs.IR

    Causal Learning for Trustworthy Recommender Systems: A Survey

    Authors: ** Li, Shou** Wang, Qi Zhang, Longbing Cao, Fang Chen, Xiuzhen Zhang, Dietmar Jannach, Charu C. Aggarwal

    Abstract: Recommender Systems (RS) have significantly advanced online content discovery and personalized decision-making. However, emerging vulnerabilities in RS have catalyzed a paradigm shift towards Trustworthy RS (TRS). Despite numerous progress on TRS, most of them focus on data correlations while overlooking the fundamental causal nature in recommendation. This drawback hinders TRS from identifying th…

    Submitted 13 February, 2024; originally announced February 2024.

  45. arXiv:2402.05948  [pdf, other]

    cs.LG cs.CL

    DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks

    Authors: Jianing He, Qi Zhang, Wei** Ding, Duoqian Miao, Jun Zhao, Liang Hu, Longbing Cao

    Abstract: Early exiting has demonstrated its effectiveness in accelerating the inference of pre-trained language models like BERT by dynamically adjusting the number of layers executed. However, most existing early exiting methods only consider local information from an individual test sample to determine their exiting indicators, failing to leverage the global information offered by sample population. This…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 16 pages

  46. arXiv:2402.02051  [pdf, other]

    cs.LG

    Nonlinear subspace clustering by functional link neural networks

    Authors: Long Shi, Lei Cao, Zhongpu Chen, Badong Chen, Yu Zhao

    Abstract: Nonlinear subspace clustering based on a feed-forward neural network has been demonstrated to provide better clustering accuracy than some advanced subspace clustering algorithms. While this approach demonstrates impressive outcomes, it involves a balance between effectiveness and computational cost. In this study, we employ a functional link neural network to transform data samples into a nonline…

    Submitted 3 February, 2024; originally announced February 2024.

  47. arXiv:2401.15770  [pdf, other]

    cs.CL

    PILOT: Legal Case Outcome Prediction with Case Law

    Authors: Lang Cao, Zifeng Wang, Cao Xiao, Jimeng Sun

    Abstract: Machine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil law cases rather than case law systems. We identified two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary…

    Submitted 12 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  48. arXiv:2401.13112  [pdf, other]

    cs.AI stat.ML

    Distributional Counterfactual Explanation With Optimal Transport

    Authors: Lei You, Lele Cao, Mattias Nilsson, Bo Zhao, Lei Lei

    Abstract: Counterfactual explanations (CE) are the de facto method of providing insight and interpretability in black-box decision-making models by identifying alternative input instances that lead to different outcomes. This paper extends the concept of CE to a distributional context, broadening the scope from individual data points to entire input and output distributions, named distributional counterfact…

    Submitted 25 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  49. arXiv:2401.03341  [pdf, other]

    cs.LG stat.ML

    Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection

    Authors: Zhangkai Wu, Longbing Cao, Qi Zhang, Junxian Zhou, Hui Chen

    Abstract: Due to their unsupervised training and uncertainty estimation, deep Variational Autoencoders (VAEs) have become powerful tools for reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based TSAD methods, either statistical or deep, tune meta-priors to estimate the likelihood probability for effectively capturing spatiotemporal dependencies in the data. However, these methods con…

    Submitted 6 January, 2024; originally announced January 2024.

  50. arXiv:2401.02705  [pdf, other]

    cs.AI

    XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

    Authors: Zhitao Wang, Wei Wang, Zirao Li, Long Wang, Can Yi, Xinjie Xu, Luyang Cao, Han**g Su, Shouzhi Chen, Jun Zhou

    Abstract: In past years, we have been dedicated to automating the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e., test script generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting…

    Submitted 10 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.