Showing 1–50 of 118 results for author: Cao, B

Searching in archive cs.
  1. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  2. arXiv:2406.16377  [pdf, other]

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.13939  [pdf, other]

    cs.CV

    2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

    Authors: Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu

    Abstract: Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task focuses more on the motion in video content for language-guided video object segmentation, requiring an enhanced ability to model longer temporal, moti…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.10248  [pdf, other]

    cs.CL cs.AI

    On the Worst Prompt Performance of Large Language Models

    Authors: Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam

    Abstract: The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fail…

    Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  5. arXiv:2406.09669  [pdf, other]

    cs.CR

    Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

    Authors: Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Fenglong Ma, Shouling Ji, Ting Wang

    Abstract: Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concerning. To bridge this gap, this work investigates t…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.04802  [pdf, other]

    cs.CV cs.LG

    Predictive Dynamic Fusion

    Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu

    Abstract: Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability.…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 21 pages, 7 figures

  7. arXiv:2406.02378  [pdf, other]

    cs.CL

    On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

    Authors: Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Kristen Johnson, Jiliang Tang, Rongrong Wang

    Abstract: Large Language Models (LLMs) can improve their responses when instructed to do so, a capability known as self-correction. When these instructions lack specific details about the issues in the response, this is referred to as leveraging the intrinsic self-correction capability. The empirical success of self-correction can be found in various applications, e.g., text detoxification and social bias m…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  8. arXiv:2406.02291  [pdf, other]

    cs.NI eess.SP

    A deep-learning-based MAC for integrating channel access, rate adaptation and channel switch

    Authors: Jiantao Xin, Wei Xu, Bin Cao, Taotao Wang, Shengli Zhang

    Abstract: With increasing density and heterogeneity in unlicensed wireless networks, traditional MAC protocols, such as carrier-sense multiple access with collision avoidance (CSMA/CA) in Wi-Fi networks, are experiencing performance degradation. This is manifested in increased collisions and extended backoff times, leading to diminished spectrum efficiency and protocol coordination. Addressing these issues,…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.02239  [pdf, other]

    cs.NI

    Decentralized Physical Infrastructure Network (DePIN): Challenges and Opportunities

    Authors: Zhibin Lin, Taotao Wang, Long Shi, Shengli Zhang, Bin Cao

    Abstract: The widespread use of the Internet has posed challenges to existing centralized physical infrastructure networks. Issues such as data privacy risks, service disruptions, and substantial expansion costs have emerged. To address these challenges, an innovative network architecture called Decentralized Physical Infrastructure Network (DePIN) has emerged. DePIN leverages blockchain technology to decen…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2406.01252  [pdf, other]

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2406.00045  [pdf, other]

    cs.CL cs.LG

    Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

    Authors: Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

    Abstract: Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracti…

    Submitted 28 May, 2024; originally announced June 2024.

  12. arXiv:2405.20404  [pdf, other]

    cs.CL cs.LG

    XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

    Authors: Yurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin

    Abstract: Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of elucidating and explaining the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to b…

    Submitted 30 May, 2024; originally announced May 2024.

  13. arXiv:2405.14023  [pdf, other]

    cs.LG

    WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

    Authors: Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen

    Abstract: The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace. Alongside this progress also comes mounting concerns about LLMs' susceptibility to jailbreaking attacks, which leads to the generation of harmful or unsafe content. While safety alignment measures have been implemented in LLMs to mitigate existing jailbreak atte…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2405.12979  [pdf, other]

    cs.CV

    OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

    Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

    Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue,…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  15. arXiv:2405.11276  [pdf, other]

    cs.CV

    Visible and Clear: Finding Tiny Objects in Difference Map

    Authors: Bing Cao, Haiyu Yao, Pengfei Zhu, Qinghua Hu

    Abstract: Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors dramatically decreases in tiny object detection tasks. The main challenge lies in extracting effective features of tiny objects. Existing methods usually perform generation-based feature enhancement, which is seriously affected by spurious textures and artifacts, making it…

    Submitted 18 May, 2024; originally announced May 2024.

  16. arXiv:2404.16248  [pdf, other]

    cs.CL cs.AI

    URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression

    Authors: Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han

    Abstract: Linking a claim to grounded references is a critical ability to fulfill human demands for authentic and reliable information. Current studies are limited to specific tasks like information retrieval or semantic matching, where the claim-reference relationships are unique and fixed, while the referential knowledge linking (RKL) in real-world can be much more diverse and complex. In this paper, we p…

    Submitted 24 April, 2024; originally announced April 2024.

  17. arXiv:2404.15677  [pdf, other]

    cs.CV

    CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

    Authors: Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia

    Abstract: Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consi…

    Submitted 27 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Code will be released very soon: https://github.com/qinghew/CharacterFactory

  18. arXiv:2404.14831  [pdf, other]

    cs.DB cs.CL cs.IR

    Towards Universal Dense Blocking for Entity Resolution

    Authors: Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun

    Abstract: Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking. However, previous advanced self-supervised dense blocking approaches require domain-specific training on the target domain, which limits the benefits and rapid adaptation of t…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Code and data are available at https://github.com/tshu-w/Uniblocker

  19. arXiv:2404.10496  [pdf, other]

    cs.IR

    Spiral of Silence: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering

    Authors: Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun

    Abstract: The practice of Retrieval-Augmented Generation (RAG), which integrates Large Language Models (LLMs) with retrieval systems, has become increasingly prevalent. However, the repercussions of LLM-derived content infiltrating the web and influencing the retrieval-generation feedback loop are largely uncharted territories. In this study, we construct and iteratively run a simulation pipeline to deeply…

    Submitted 23 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted to ACL2024

  20. arXiv:2404.06809  [pdf, other]

    cs.CL

    Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

    Authors: Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun

    Abstract: The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and corre…

    Submitted 8 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Our code, benchmark, and models are available at https://github.com/panruotong/CAG

  21. arXiv:2404.05981  [pdf, other]

    cs.LG cs.CV

    A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics

    Authors: Bryan Bo Cao, Abhinav Sharma, Lawrence O'Gorman, Michael Coss, Shubham Jain

    Abstract: Despite accuracy and computation benchmarks being widely available to help choose among neural network models, these are usually trained on datasets with many classes, and do not give a precise idea of performance for applications of few (< 10) classes. The conventional procedure to predict performance is to train and test repeatedly on the different models and dataset variations of interest. Howe…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 13 pages, 3 figures

    MSC Class: 65D19

  22. arXiv:2403.14401  [pdf, other]

    cs.CV

    Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination

    Authors: Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang

    Abstract: Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across various vision-language tasks. However, they suffer from visual hallucination, where the generated responses diverge from the provided image. Are MLLMs completely oblivious to accurate visual cues when they hallucinate? Our investigation reveals that the visual branch may simultaneously advocate both accurate and non-e…

    Submitted 21 March, 2024; originally announced March 2024.

  23. arXiv:2403.12494  [pdf, other]

    cs.CV

    Task-Customized Mixture of Adapters for General Image Fusion

    Authors: Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu

    Abstract: General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusio…

    Submitted 23 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  24. arXiv:2402.18243  [pdf, other]

    cs.CL

    Learning or Self-aligning? Rethinking Instruction Fine-tuning

    Authors: Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun

    Abstract: Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs). Previous works mainly focus on the IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potentia…

    Submitted 2 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  25. arXiv:2402.18068  [pdf, other]

    cs.CV

    SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model

    Authors: Bin Cao, Jianhao Yuan, Yexin Liu, Jian Li, Shuyang Sun, Jing Liu, Bo Zhao

    Abstract: In the rapidly evolving area of image synthesis, a serious challenge is the presence of complex artifacts that compromise perceptual realism of synthetic images. To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizin…

    Submitted 4 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  26. arXiv:2402.17532  [pdf, other]

    cs.CL

    Retrieval is Accurate Generation

    Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

    Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retr…

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  27. arXiv:2402.14281  [pdf, other]

    cs.CV

    A Landmark-Aware Visual Navigation Dataset

    Authors: Faith Johnson, Bryan Bo Cao, Kristin Dana, Shubham Jain, Ashwin Ashok

    Abstract: Map representation learned by expert demonstrations has shown promising research value. However, recent advancements in the visual navigation field face challenges due to the lack of human datasets in the real world for efficient supervised representation learning of the environments. We present a Landmark-Aware Visual Navigation (LAVN) dataset to allow for supervised learning of human-centric exp…

    Submitted 21 February, 2024; originally announced February 2024.

  28. arXiv:2402.12498  [pdf, other]

    cs.CV cs.LG cs.RO

    Feudal Networks for Visual Navigation

    Authors: Faith Johnson, Bryan Bo Cao, Kristin Dana, Shubham Jain, Ashwin Ashok

    Abstract: Visual navigation follows the intuition that humans can navigate without detailed maps. A common approach is interactive exploration while building a topological graph with images at nodes that can be used for planning. Recent variations learn from passive videos and can navigate using complex social and semantic cues. However, a significant number of training videos are needed, large graphs are u…

    Submitted 19 February, 2024; originally announced February 2024.

  29. arXiv:2312.10611  [pdf, other]

    cs.CV cs.AI

    Bi-directional Adapter for Multi-modal Tracking

    Authors: Bing Cao, Junliang Guo, Pengfei Zhu, Qinghua Hu

    Abstract: Due to the rapid development of computer vision, single-modal (RGB) object tracking has made significant progress in recent years. Considering the limitation of single imaging sensor, multi-modal images (RGB, Infrared, etc.) are introduced to compensate for this deficiency for all-weather object tracking in complex environments. However, as acquiring sufficient multi-modal tracking data is hard wh…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024. Code is available at https://github.com/SparkTempest/BAT

  30. arXiv:2312.09057  [pdf, other]

    cs.CR cs.AI cs.CV

    On the Difficulty of Defending Contrastive Learning against Backdoor Attacks

    Authors: Changjiang Li, Ren Pang, Bochuan Cao, Zhaohan Xi, Jinghui Chen, Shouling Ji, Ting Wang

    Abstract: Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers. However, thus far it remains under-explored how contrastive backdoor attacks fundamentally differ from their supervised counterparts, which impedes the development of effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: USENIX Security 24

  31. arXiv:2312.05603  [pdf, other]

    cs.CL cs.AI

    Sim-GPT: Text Similarity via GPT Annotated Data

    Authors: Shuhe Wang, Beiming Cao, Shengyu Zhang, Xiaoya Li, Jiwei Li, Fei Wu, Guoyin Wang, Eduard Hovy

    Abstract: Due to the lack of a large collection of high-quality labeled sentence pairs with textual similarity scores, existing approaches for Semantic Textual Similarity (STS) mostly rely on unsupervised techniques or training signals that are only partially correlated with textual similarity, e.g., NLI-based datasets. To tackle this issue, in this paper, we propose the strategy of measuring text similarit…

    Submitted 12 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  32. arXiv:2312.00027  [pdf, other]

    cs.CR cs.AI cs.CL

    Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

    Authors: Yuanpu Cao, Bochuan Cao, Jinghui Chen

    Abstract: Recent developments in Large Language Models (LLMs) have manifested significant advancements. To facilitate safeguards against malicious exploitation, a body of research has concentrated on aligning LLMs with human preferences and inhibiting their generation of inappropriate content. Unfortunately, such alignments are often vulnerable: fine-tuning with a minimal amount of harmful data can easily u…

    Submitted 8 June, 2024; v1 submitted 15 November, 2023; originally announced December 2023.

  33. arXiv:2311.15243  [pdf, other]

    cs.CV cs.AI cs.LG

    ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

    Authors: Yichen Bai, Zongbo Han, Changqing Zhang, Bing Cao, Xiaoheng Jiang, Qinghua Hu

    Abstract: Out-of-distribution (OOD) detection methods often exploit auxiliary outliers to train model identifying OOD samples, especially discovering challenging outliers from auxiliary outliers dataset to improve OOD detection. However, they may still face limitations in effectively distinguishing between the most challenging OOD samples that are much like in-distribution (ID) data, i.e., ID-like samples.…

    Submitted 22 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Journal ref: CVPR 2024

  34. arXiv:2311.11990  [pdf]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Machine-Learned Atomic Cluster Expansion Potentials for Fast and Quantum-Accurate Thermal Simulations of Wurtzite AlN

    Authors: Guang Yang, Yuan-Bin Liu, Lei Yang, Bing-Yang Cao

    Abstract: Using the atomic cluster expansion (ACE) framework, we develop a machine learning interatomic potential for fast and accurate modelling of the phonon transport properties of wurtzite aluminum nitride. The predictive power of the ACE potential against density functional theory (DFT) is demonstrated across a broad range of properties of w-AlN, including ground-state lattice parameters, specific heat…

    Submitted 21 January, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  35. arXiv:2311.11375  [pdf, other]

    cs.CL

    ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

    Authors: Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

    Abstract: Spoken language understanding (SLU) is a fundamental task in the task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair the understanding performance and lead to error propagation. Although there are some attempts to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally w…

    Submitted 19 November, 2023; originally announced November 2023.

  36. arXiv:2310.19248  [pdf, other]

    cs.CV cs.AI cs.CR

    IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

    Authors: Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen

    Abstract: Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 21 pages, 11 figures, 9 tables. Accepted by NeurIPS 2023

  37. ViFiT: Reconstructing Vision Trajectories from IMU and Wi-Fi Fine Time Measurements

    Authors: Bryan Bo Cao, Abrar Alali, Hansi Liu, Nicholas Meegan, Marco Gruteser, Kristin Dana, Ashwin Ashok, Shubham Jain

    Abstract: Tracking subjects in videos is one of the most widely used functions in camera-based IoT applications such as security surveillance, smart city traffic safety enhancement, vehicle to pedestrian communication and so on. In the computer vision domain, tracking is usually achieved by first detecting subjects with bounding boxes, then associating detected bounding boxes across video frames. For many I…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 22 pages, 12 figures, 9 tables. MobiCom 2023 ISACom

    ACM Class: I.4.9; C.2.m

  38. arXiv:2310.01581  [pdf, other]

    cs.LG cs.AI cs.CR

    On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

    Authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu

    Abstract: Large Language Models (LLMs) have achieved unprecedented performance in Natural Language Generation (NLG) tasks. However, many existing studies have shown that they could be misused to generate undesired content. In response, before releasing LLMs for public access, model developers usually align those language models through Supervised Fine-Tuning (SFT) or Reinforcement Learning with Human Feedba…

    Submitted 2 October, 2023; originally announced October 2023.

  39. arXiv:2310.00057  [pdf, other]

    cs.CE

    A multi-fidelity deep operator network (DeepONet) for fusing simulation and monitoring data: Application to real-time settlement prediction during tunnel construction

    Authors: Chen Xu, Ba Trung Cao, Yong Yuan, Günther Meschke

    Abstract: Ground settlement prediction during the process of mechanized tunneling is of paramount importance and remains a challenging research topic. Typically, two paradigms are existing: a physics-driven approach utilizing process-oriented computational simulation models for the tunnel-soil interaction and the settlement prediction, and a data-driven approach employing machine learning techniques to esta…

    Submitted 12 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

  40. arXiv:2309.14348  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM

    Authors: Bochuan Cao, Yuanpu Cao, Lu Lin, Jinghui Chen

    Abstract: Recently, Large Language Models (LLMs) have made significant advancements and are now widely used across various domains. Unfortunately, there has been a rising concern that LLMs can be misused to generate harmful or malicious content. Though a line of research has focused on aligning LLMs with human values and preventing them from producing inappropriate content, such alignments are usually vulne…

    Submitted 11 June, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: 19 Pages, 5 Figures, 8 Tables. Accepted by ACL 2024

  41. arXiv:2309.01858  [pdf, other]

    cs.CV

    Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

    Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Bingyi Cao, Mário Lipovský, Pelin Dogan-Schönberger, Grzegorz Makosa, Boris Bluntschli, Mojtaba Seyedhosseini, Ondřej Chum, André Araujo

    Abstract: Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific d…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: ICCV 2023 Accepted

  42. arXiv:2309.01212  [pdf, other]

    cs.SD eess.AS

    NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

    Authors: Wen Wang, Dongchao Yang, Qichen Ye, Bowen Cao, Yuexian Zou

    Abstract: The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen noisy scenes. Technical routes for the DM-based SE methods can be summarized into three types: task-adapted diffusion process formulation, generator-plus-condition…

    Submitted 3 September, 2023; originally announced September 2023.

  43. arXiv:2308.15703  [pdf, other]

    cs.IR cs.LG

    Fragment and Integrate Network (FIN): A Novel Spatial-Temporal Modeling Based on Long Sequential Behavior for Online Food Ordering Click-Through Rate Prediction

    Authors: Jun Li, Jingjian Wang, Hongwei Wang, Xing Deng, Jielong Chen, Bing Cao, Zekun Wang, Guanjie Xu, Ge Zhang, Feng Shi, Hualei Liu

    Abstract: Spatial-temporal information has been proven to be of great significance for click-through rate prediction tasks in online Location-Based Services (LBS), especially in mainstream food ordering platforms such as DoorDash, Uber Eats, Meituan, and Ele.me. Modeling user spatial-temporal preferences with sequential behavior data has become a hot topic in recommendation systems and online advertising. H…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by CIKM 2023 Applied Research Paper

  44. arXiv:2308.13057  [pdf, other]

    cs.CV

    Data-Side Efficiencies for Lightweight Convolutional Neural Networks

    Authors: Bryan Bo Cao, Lawrence O'Gorman, Michael Coss, Shubham Jain

    Abstract: We examine how the choice of data-side attributes for two important visual tasks of image classification and object detection can aid in the choice or design of lightweight convolutional neural networks. We show by experimentation how four data attributes (number of classes, object color, image resolution, and object scale) affect neural network model size and efficiency. Intra- and inter-class si…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures, 6 tables

  45. arXiv:2308.06954  [pdf, other]

    cs.CV

    Global Features are All You Need for Image Retrieval and Reranking

    Authors: Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, Andre Araujo, Bingyi Cao

    Abstract: Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs glo…

    Submitted 19 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: ICCV23 camera-ready + appendix

  46. arXiv:2308.04792  [pdf, ps, other]

    cs.RO cs.AI

    NNPP: A Learning-Based Heuristic Model for Accelerating Optimal Path Planning on Uneven Terrain

    Authors: Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie, Baoshi Cao

    Abstract: Intelligent autonomous path planning is essential for enhancing the exploration efficiency of mobile robots operating in uneven terrains like planetary surfaces and off-road environments. In this paper, we propose the NNPP model for computing the heuristic region, enabling foundation algorithms like Astar to find the optimal path solely within this reduced search space, effectively decreasing the s…

    Submitted 20 June, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

  47. arXiv:2306.10265  [pdf, other]

    cs.CV cs.RO

    NBMOD: Find It and Grasp It in Noisy Background

    Authors: Boyuan Cao, Xinyu Zhou, Congmin Guo, Baohua Zhang, Yuchen Liu, Qianqiu Tan

    Abstract: Grasping objects is a fundamental yet important capability of robots, and many tasks such as sorting and picking rely on this skill. The prerequisite for stable grasping is the ability to correctly identify suitable grasping positions. However, finding appropriate grasping points is challenging due to the diverse shapes, varying density distributions, and significant differences between the baryce…

    Submitted 17 June, 2023; originally announced June 2023.

  48. arXiv:2306.05301  [pdf, other]

    cs.CL

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

    Authors: Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, Le Sun

    Abstract: Enabling large language models to utilize real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning have either primarily relied on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or utilized supervised learning to train limited scopes of tools on compact models. However, it rema…

    Submitted 7 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

  49. arXiv:2305.11038  [pdf, other]

    cs.CL

    Learning In-context Learning for Named Entity Recognition

    Authors: Jiawei Chen, Yaojie Lu, Hongyu Lin, Jie Lou, Wei Jia, Dai Dai, Hua Wu, Boxi Cao, Xianpei Han, Le Sun

    Abstract: Named entity recognition in real-world applications suffers from the diversity of entity types, the emergence of new entity types, and the lack of high-quality annotations. To address the above problems, this paper proposes an in-context learning-based NER approach, which can effectively inject in-context NER ability into PLMs and recognize entities of novel types on-the-fly using only a few demon…

    Submitted 26 May, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Main Conference

  50. arXiv:2305.09144  [pdf, other]

    cs.CL cs.AI

    Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

    Authors: Boxi Cao, Qiaoyu Tang, Hongyu Lin, Shanshan Jiang, Bin Dong, Xianpei Han, Jiawei Chen, Tianshu Wang, Le Sun

    Abstract: Memory is one of the most essential cognitive functions serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. On the contrary, vanilla neural networks without pre-training have been long observed suffering from the catastrophic forgetting problem. To investigate such a retentive-forg…

    Submitted 13 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted by LREC-COLING 2024