
Showing 1–50 of 1,163 results for author: Cao, Y

Searching in archive cs.
  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00497  [pdf, other]

    cs.CL

    LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

    Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan

    Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.20006  [pdf, other]

    cs.LG

    On the Trade-off between Flatness and Optimization in Distributed Learning

    Authors: Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

    Abstract: This paper proposes a theoretical framework to evaluate and compare the performance of gradient-descent algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work discovers two interesting results. F…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2406.19317  [pdf, other]

    cs.LG cs.AI cs.CL

    Jump Starting Bandits with LLM-Generated Prior Knowledge

    Authors: Parand A. Alamdari, Yanshuai Cao, Kevin H. Wilson

    Abstract: We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a Contextual Multi-Armed Bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.16671  [pdf, other]

    cs.RO

    STAR: Swarm Technology for Aerial Robotics Research

    Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

    Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges,…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2406.14912  [pdf, other]

    cs.CV

    FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing

    Authors: Zhibo Du, Long Peng, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered challenge of efficiency for demoiréing methods. To balance the network speed and quality of results, we propose a Fully Connected en…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP2024

  8. arXiv:2406.14841  [pdf, other]

    cs.CR cs.DB cs.LG

    TabularMark: Watermarking Tabular Datasets for Machine Learning

    Authors: Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, Li Xiong

    Abstract: Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.13870  [pdf, other]

    cs.CV

    Splatter a Video: Video Gaussian Representation for Versatile Processing

    Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a no…

    Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.13167  [pdf, other]

    cs.CL

    QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

    Authors: Bo Wang, Heyan Huang, Yixin Cao, Jiahao Ying, Wei Tang, Chong Feng

    Abstract: While large language models (LLMs) have made notable advancements in natural language processing, they continue to struggle with processing extensive text. Memory mechanisms offer a flexible solution for managing long contexts, utilizing techniques such as compression, summarization, and structuring to facilitate nuanced and efficient handling of large volumes of text. However, existing techniques…

    Submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.13093  [pdf, other]

    cs.CV cs.AI cs.HC

    RITA: A Real-time Interactive Talking Avatars Framework

    Authors: Wuxinlin Cheng, Cheng Wan, Yupeng Cao, Sihan Chen

    Abstract: RITA presents a high-quality real-time interactive framework built upon generative models, designed with practical applications in mind. Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions. By leveraging the latest advancements in generative modeling, we have developed a versatile platform that not only enhances t…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.11507  [pdf, other]

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  14. arXiv:2406.10650  [pdf, other]

    stat.ML cs.LG

    The Implicit Bias of Adam on Separable Data

    Authors: Chenyang Zhang, Difan Zou, Yuan Cao

    Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maxim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  15. arXiv:2406.09447  [pdf, ps, other]

    cs.IT eess.SP

    Self-Sustainable Active Reconfigurable Intelligent Surfaces for Anti-Jamming in Wireless Communications

    Authors: Yang Cao, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Wireless devices can be easily attacked by jammers during transmission, which is a potential security threat for wireless communications. Active reconfigurable intelligent surface (RIS) attracts considerable attention and is expected to be employed in anti-jamming systems for secure transmission to significantly enhance the anti-jamming performance. However, active RIS introduces external power lo…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Systems Journal

  16. arXiv:2406.07807  [pdf, ps, other]

    cs.IT eess.SP

    Dynamic Energy-Saving Design for Double-Faced Active RIS Assisted Communications with Perfect/Imperfect CSI

    Authors: Yang Cao, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Although the emerging reconfigurable intelligent surface (RIS) paves a new way for next-generation wireless communications, it suffers from inherent flaws, i.e., double-fading attenuation effects and half-space coverage limitations. The state-of-the-art double-face active (DFA)-RIS architecture is proposed for significantly amplifying and transmitting incident signals in full-space. Despite the ef…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE TWC

  17. arXiv:2406.07333  [pdf, other]

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection finds widespread applications in industrial settings. However, existing methods often necessitate gathering numerous samples for model training. Moreover, they predominantly operate within a closed-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Submission to IEEE Transactions on Systems, Man, and Cybernetics: Systems

  18. arXiv:2406.07255  [pdf, other]

    cs.CV eess.IV

    Towards Realistic Data Generation for Real-World Super-Resolution

    Authors: Long Peng, Wenbo Li, Renjing Pei, Jingjing Ren, Xueyang Fu, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producin…

    Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  19. arXiv:2406.07176  [pdf, other]

    cs.CV

    RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection

    Authors: Yuqi Cheng, Yunkang Cao, Rui Chen, Weiming Shen

    Abstract: Robustness against noisy imaging is crucial for practical image anomaly detection systems. This study introduces a Robust Anomaly Detection (RAD) dataset with free views, uneven illuminations, and blurry collections to systematically evaluate the robustness of current anomaly detection methods. Specifically, RAD aims to identify foreign objects on working platforms as anomalies. The collection pro…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  20. arXiv:2406.05980  [pdf, other]

    cs.CV

    Causality-inspired Latent Feature Augmentation for Single Domain Generalization

    Authors: Jian Xu, Chaojie Ji, Yankai Cao, Ye Li, Ruxin Wang

    Abstract: Single domain generalization (Single-DG) intends to develop a generalizable model with only a single training domain to perform well on other unknown target domains. Under the domain-hungry configuration, how to expand the coverage of source domain and find intrinsic causal features across different distributions is the key to enhancing the models' generalization ability. Existing methods mainly…

    Submitted 9 June, 2024; originally announced June 2024.

  21. arXiv:2406.05436  [pdf, other]

    cs.NE

    Introducing Competitive Mechanism to Differential Evolution for Numerical Optimization

    Authors: Rui Zhong, Yang Cao, Enzhi Zhang, Masaharu Munetomo

    Abstract: This paper introduces a novel competitive mechanism into differential evolution (DE), presenting an effective DE variant named competitive DE (CDE). CDE features a simple yet efficient mutation strategy: DE/winner-to-best/1. Essentially, the proposed DE/winner-to-best/1 strategy can be recognized as an intelligent integration of the existing mutation strategies of DE/rand-to-best/1 and DE/cur-to-b…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by The 30th Int'l Conf on Parallel and Distributed Processing Techniques and Applications (PDPTA'24)

  22. arXiv:2406.05433  [pdf, other]

    cs.NE

    Large Language Model Assisted Adversarial Robustness Neural Architecture Search

    Authors: Rui Zhong, Yang Cao, Jun Yu, Masaharu Munetomo

    Abstract: Motivated by the potential of large language models (LLMs) as optimizers for solving combinatorial optimization problems, this paper proposes a novel LLM-assisted optimizer (LLMO) to address adversarial robustness neural architecture search (ARNAS), a specific application of combinatorial optimization. We design the prompt using the standard CRISPE framework (i.e., Capacity and Role, Insight, Stat…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by The 6th International Conference on Data-driven Optimization of Complex Systems (DOCS)

  23. arXiv:2406.05374  [pdf, other]

    cs.CL

    Planning Like Human: A Dual-process Framework for Dialogue Planning

    Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin

    Abstract: In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or d…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures, ACL 2024 main conference

  24. arXiv:2406.05033  [pdf, other]

    cs.LG math.OC

    Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

    Authors: Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa

    Abstract: We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical…

    Submitted 7 June, 2024; originally announced June 2024.

  25. arXiv:2406.04758  [pdf, other]

    cs.CL

    Think out Loud: Emotion Deducing Explanation in Dialogues

    Authors: Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Weiping Wang, Jie Zhou

    Abstract: Humans convey emotions through daily dialogues, making emotion understanding a crucial step of affective intelligence. To understand emotions in dialogues, machines are asked to recognize the emotion for an utterance (Emotion Recognition in Dialogues, ERD); based on the emotion, then find causal utterances for the emotion (Emotion Cause Extraction in Dialogues, ECED). The setting of the two tasks…

    Submitted 7 June, 2024; originally announced June 2024.

  26. arXiv:2406.04687  [pdf, other]

    cs.LG cs.CV

    LogiCode: an LLM-Driven Framework for Logical Anomaly Detection

    Authors: Yiheng Zhang, Yunkang Cao, Xiaohao Xu, Weiming Shen

    Abstract: This paper presents LogiCode, a novel framework that leverages Large Language Models (LLMs) for identifying logical anomalies in industrial settings, moving beyond traditional focus on structural inconsistencies. By harnessing LLMs for logical reasoning, LogiCode autonomously generates Python code to pinpoint anomalies such as incorrect component quantities or missing elements, marking a signific…

    Submitted 7 June, 2024; originally announced June 2024.

  27. arXiv:2406.04573  [pdf]

    cs.CV

    Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection

    Authors: Yiheng Zhang, Yunkang Cao, Tianhang Zhang, Weiming Shen

    Abstract: This study targets Multi-Lighting Image Anomaly Detection (MLIAD), where multiple lighting conditions are utilized to enhance imaging quality and anomaly detection performance. While numerous image anomaly detection methods have been proposed, they lack the capacity to handle multiple inputs for a single sample, like multi-lighting images in MLIAD. Hence, this study proposes Attention Fusion Rever…

    Submitted 6 June, 2024; originally announced June 2024.

  28. arXiv:2406.04253  [pdf, other]

    cs.CV

    A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

    Authors: Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: 3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to the breakthroughs in neural representations and generative models, we have witnessed rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures

  29. arXiv:2406.04208  [pdf, other]

    cs.LG cs.AI

    Aligning Agents like Large Language Models

    Authors: Adam Jelley, Yuhan Cao, Dave Bignell, Sam Devlin, Tabish Rashid

    Abstract: Training agents to behave as desired in complex 3D environments from high-dimensional sensory information is challenging. Imitation learning from diverse human behavior provides a scalable approach for training an agent with a sensible behavioral prior, but such an agent may not perform the specific behaviors of interest when deployed. To address this issue, we draw an analogy between the undesira…

    Submitted 6 June, 2024; originally announced June 2024.

  30. arXiv:2406.04113  [pdf, other]

    cs.CL

    Uncovering Limitations of Large Language Models in Information Seeking from Tables

    Authors: Chaoxu Pang, Yixuan Cao, Chunhao Yang, Ping Luo

    Abstract: Tables are recognized for their high information density and widespread usage, serving as essential sources of information. Seeking information from tables (TIS) is a crucial capability for Large Language Models (LLMs), serving as the foundation of knowledge-based Q&A systems. However, this field presently suffers from an absence of thorough and reliable evaluation. This paper introduces a more re…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  31. arXiv:2406.03963  [pdf, other]

    cs.CL

    A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

    Authors: Wei Tang, Yixin Cao, Jiahao Ying, Bo Wang, Yuyue Zhao, Yong Liao, Pengyuan Zhou

    Abstract: Retrieval-Augmented Generation (RAG) is an effective solution to supplement necessary knowledge to large language models (LLMs). Targeting its bottleneck of retriever performance, a "generate-then-read" pipeline has been proposed to replace the retrieval stage with generation from the LLM itself. Although promising, this research direction is underexplored and still cannot work in the scenario when source…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL'24 (Findings)

  32. arXiv:2406.03701  [pdf, other]

    cs.MM

    Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction

    Authors: Meishan Zhang, Hao Fei, Bin Wang, Shengqiong Wu, Yixin Cao, Fei Li, Min Zhang

    Abstract: In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this work for the first time introduces the concept of grounded Multimodal Universal Information Extraction (MUIE), providing a unified task framework to…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  33. arXiv:2406.03519  [pdf, other]

    cs.LG cs.CR cs.DC

    Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

    Authors: Saber Malekmohammadi, Yaoliang Yu, Yang Cao

    Abstract: High utility and rigorous data privacy are among the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. The latter has been pursued by using differential privacy in FL (DPFL). There is often heterogeneity in clients' privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not ap…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  34. arXiv:2406.02472  [pdf, other]

    cs.CL

    Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

    Authors: Zhihan Zhang, Yixin Cao, Chenchen Ye, Yunshan Ma, Lizi Liao, Tat-Seng Chua

    Abstract: The digital landscape is rapidly evolving with an ever-increasing volume of online news, emphasizing the need for swift and precise analysis of complex events. We refer to the complex events composed of many news articles over an extended period as Temporal Complex Event (TCE). This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event c…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  35. arXiv:2406.02309  [pdf, other]

    cs.LG

    Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing

    Authors: Youwei Shu, Xi Xiao, Derui Wang, Yuxin Cao, Siji Chen, Jason Xue, Linyi Li, Bo Li

    Abstract: Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of tw…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024 Poster

  36. arXiv:2406.01078  [pdf, other]

    cs.CV

    CUT: A Controllable, Universal, and Training-Free Visual Anomaly Generation Framework

    Authors: Han Sun, Yunkang Cao, Olga Fink

    Abstract: Visual anomaly detection (AD) inherently faces significant challenges due to the scarcity of anomalous data. Although numerous works have been proposed to synthesize anomalous samples, the generated samples often lack authenticity or can only reflect the distribution of the available training data samples. In this work, we propose CUT: a Controllable, Universal and Training-free visual anomaly gen…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages excluding appendix

  37. arXiv:2406.00830  [pdf, other]

    cs.CV

    Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

    Authors: Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

    Abstract: Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the pr…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for possible publication. Code page: https://github.com/yangcaoai/CoDA_NeurIPS2023

  38. arXiv:2406.00045  [pdf, other]

    cs.CL cs.LG

    Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

    Authors: Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

    Abstract: Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracti…

    Submitted 28 May, 2024; originally announced June 2024.

  39. arXiv:2405.20576  [pdf, other]

    cs.CR

    Federated Graph Analytics with Differential Privacy

    Authors: Shang Liu, Yang Cao, Takao Murakami, Weiran Liu, Seng Pei Liew, Tsubasa Takahashi, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Collaborative graph analysis across multiple institutions is becoming increasingly popular. Realistic examples include social network analysis across various social platforms, financial transaction analysis across multiple banks, and analyzing the transmission of infectious diseases across multiple hospitals. We define federated graph analytics, a new problem for collaborative graph analytics…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 13 pages

  40. arXiv:2405.20174  [pdf, other]

    cs.LG math.AG

    Tropical Expressivity of Neural Networks

    Authors: Shiv Bhatia, Yueqi Cao, Paul Lezeau, Anthea Monod

    Abstract: We propose an algebraic geometric framework to study the expressivity of linear activation neural networks. A particular quantity that has been actively studied in the field of deep learning is the number of linear regions, which gives an estimate of the information capacity of the architecture. To study and evaluate information capacity and expressivity, we work in the setting of tropical geometr…

    Submitted 30 May, 2024; originally announced May 2024.

  41. arXiv:2405.19256  [pdf, other]

    cs.LG math.NA

    Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

    Authors: Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou

    Abstract: Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based methods solve the stationary Fokker–Planck equation to determine the…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 24 pages, 10 figures

  42. arXiv:2405.19100  [pdf, other]

    cs.CV

    Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

    Authors: Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras

    Abstract: Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on unseen images in inference. Vision-language-based zero-shot models demonstrate a promising potential for addres…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The code and pre-trained models are available at https://github.com/zengqunzhao/Exp-CLIP

  43. arXiv:2405.18156  [pdf, other]

    cs.CV

    VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

    Authors: Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

    Abstract: Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion…

    Submitted 28 May, 2024; originally announced May 2024.

  44. arXiv:2405.17769  [pdf, other]

    cs.RO cs.CV

    Microsaccade-inspired Event Camera for Robotics

    Authors: Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, Cornelia Fermuller

    Abstract: Neuromorphic vision sensors or event cameras have made the visual perception of extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. These event cameras' output is dependent on both motion and texture. However, the event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore c…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published in the Science Robotics June 2024 issue

  45. arXiv:2405.17529  [pdf, other]

    cs.LG cs.CR

    Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails

    Authors: Haichao Sha, Yang Cao, Yong Liu, Yuncheng Wu, Ruixuan Liu, Hong Chen

    Abstract: Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performa…

    Submitted 27 May, 2024; originally announced May 2024.
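    The two steps this abstract attributes to standard DPSGD (clip gradients to a predefined norm, then inject calibrated noise) can be sketched as a single update. This is the conventional single-threshold Gaussian-mechanism update in the style of Abadi et al. (2016), not the body/tail clipping the paper proposes. The function names, the plain-list vectors, and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions, and no privacy accounting is performed.

```python
import math
import random
from typing import List


def clip(grad: List[float], max_norm: float) -> List[float]:
    """Rescale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(
    params: List[float],
    per_example_grads: List[List[float]],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DPSGD update: clip each example's gradient, sum, add Gaussian noise, average, step."""
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    # Clipping bounds each example's contribution to the sum by max_norm,
    # so the Gaussian noise standard deviation is calibrated to that bound.
    sigma = noise_multiplier * max_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

    With `max_norm=1.0`, a per-example gradient [3, 4] (norm 5) is scaled to roughly [0.6, 0.8], while [0.3, 0.4] (norm 0.5) passes through unchanged. The single threshold is the design choice the title suggests the paper revisits for heavy-tailed gradients by clipping the body and tail separately; this sketch does not attempt that.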

  46. arXiv:2405.17234  [pdf, other]

    cs.AI cs.LG

    Benchmarking General-Purpose In-Context Learning

    Authors: Fan Wang, Chuan Lin, Yang Cao, Yu Kang

    Abstract: In-context learning (ICL) empowers generative models to address new tasks effectively and efficiently on the fly, without relying on any artificially crafted optimization techniques. In this paper, we study extending ICL to address a broader range of tasks with an extended learning horizon and higher improvement potential, namely General-Purpose In-Context Learning (GPICL). To this end, we introdu…

    Submitted 26 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  47. arXiv:2405.16754  [pdf, other]

    cs.RO

    Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

    Authors: Youqi Pan, Wugen Zhou, Yingdian Cao, Hongbin Zha

    Abstract: Visual-inertial odometry (VIO) has demonstrated remarkable success due to its low-cost and complementary sensors. However, existing VIO methods lack the generalization ability to adjust to different environments and sensor attributes. In this paper, we propose Adaptive VIO, a new monocular visual-inertial odometry that combines online continual learning with traditional nonlinear optimization. Ada…

    Submitted 26 May, 2024; originally announced May 2024.

  48. arXiv:2405.16542  [pdf, other]

    cs.AI cs.CY

    Mamba4KT: An Efficient and Effective Mamba-based Knowledge Tracing Model

    Authors: Yang Cao, Wei Zhang

    Abstract: Knowledge tracing (KT) enhances student learning by leveraging past performance to predict future performance. Current research utilizes models based on attention mechanisms and recurrent neural network structures to capture long-term dependencies and correlations between exercises, aiming to improve model accuracy. Due to the growing amount of data in smart education scenarios, this poses a chall…

    Submitted 26 May, 2024; originally announced May 2024.

  49. arXiv:2405.16150  [pdf, other]

    cs.CL

    5W1H Extraction With Large Language Models

    Authors: Yang Cao, Yangsong Lan, Feiyan Zhai, Piji Li

    Abstract: The extraction of essential news elements through the 5W1H framework (\textit{What}, \textit{When}, \textit{Where}, \textit{Why}, \textit{Who}, and \textit{How}) is critical for event extraction and text summarization. The advent of Large language models (LLMs) such as ChatGPT presents an opportunity to address language-related tasks through simple prompts without fine-tuning models with much time…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: IJCNN 2024

  50. arXiv:2405.15426  [pdf, other]

    cs.CR

    AuthNet: Neural Network with Integrated Authentication Logic

    Authors: Yuling Cai, Fan Xiang, Guozhu Meng, Yinzhi Cao, Kai Chen

    Abstract: Model stealing, i.e., unauthorized access and exfiltration of deep learning models, has become one of the major threats. Proprietary models may be protected by access controls and encryption. However, in reality, these measures can be compromised due to system breaches, query-based model extraction or a disgruntled insider. Security hardening of neural networks is also suffering from limits, for e…

    Submitted 24 May, 2024; originally announced May 2024.