
Showing 1–50 of 756 results for author: Wenbo

  1. arXiv:2407.02158  [pdf, other]

    cs.CV

    UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

    Authors: Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu

    Abstract: Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (e.g., 1K to 6K) within a single model, while maintaining comp…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02118  [pdf, other]

    cs.CL

    Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

    Authors: Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

    Abstract: In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages

  3. arXiv:2407.01183  [pdf, other]

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to the poor performance of existing methods. To solve this problem, we pr…

    Submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2406.17877  [pdf, other]

    eess.SY

    Equity-aware Load Shedding Optimization

    Authors: Xin Fang, Wenbo Wang, Fei Ding

    Abstract: Load shedding is usually the last resort to balance generation and demand to maintain stable operation of the electric grid after major disturbances. Current load-shedding optimization practices focus mainly on the physical optimality of the network power flow. This might lead to an uneven allocation of load curtailment, disadvantaging some loads more than others. Addressing this oversight, this p…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Contact email for corresponding and first author: [email protected]

  5. arXiv:2406.17505  [pdf, ps, other]

    math.CO math-ph math.SP

    Chebyshev Moment Method for Regular Graphs II: Discrete Trace Formula

    Authors: Yulin Gong, Wenbo Li, Shiping Liu

    Abstract: We establish discrete trace formulas on a regular graph to relate its spectrum and non-backtracking walks. Our approach is based on the Chebyshev-type polynomials and we refer to this treatment as Chebyshev moment method. A key fact is that Chebyshev-type polynomials form a complete orthogonal basis with respect to the Kesten-McKay distribution. Based on this method, we further apply Cauchy's inte…

    Submitted 25 June, 2024; originally announced June 2024.

    MSC Class: 05C30; 05C31; 05C50; 05C62

  6. arXiv:2406.17309  [pdf, other]

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

  7. arXiv:2406.16966  [pdf, other]

    cs.CV cs.LG

    Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels

    Authors: Yangdi Lu, Wenbo He

    Abstract: Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training, resulting in poor generalization performance. During an early learning phase, deep neural networks have been observed to…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Noisy labels, Machine learning, Similarity Search

  8. arXiv:2406.16477  [pdf, other]

    cs.CV cs.CL

    DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

    Authors: Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang

    Abstract: Image super-resolution pursues reconstructing a high-fidelity high-resolution counterpart for a low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their capabilities with rich prior knowledge. The success of diffusion models based on general text prompts has validated the effectiveness of textual control in the field of text2image. However, given…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.16476  [pdf, other]

    cs.CV

    ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

    Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng

    Abstract: Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To this end, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. Specifically, ResMast…

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16438  [pdf, other]

    astro-ph.EP astro-ph.IM

    Constrained velocity-free control of spacecraft attitude via explicit reference governor

    Authors: Qingqing Dang, Wenbo Libo, Haichao Gui

    Abstract: This paper introduces an explicit reference governor-based control scheme tailored for addressing the velocity-free spacecraft attitude maneuver problem. This problem is subject to specific constraints, namely the pointing constraint, angular velocity constraint, and input saturation. The proposed control scheme operates in two layers, ensuring the asymptotic stability of the spacecraft's attitude…

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.15982  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction

    Authors: Yangdi Lu, Wenbo He

    Abstract: Deep neural networks have been highly successful in data-intense computer vision applications, while such success relies heavily on massive and clean data. In real-world scenarios, clean data is sometimes difficult to obtain. For example, in image classification and segmentation tasks, precise annotations of millions of samples are generally very expensive and time-consuming. In 3D static scene re…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Computer vision, Noisy Labels, 3D reconstruction, 3D Gaussian Splats, (Work still in progress)

  12. arXiv:2406.14550  [pdf, other]

    cs.CL cs.AI

    GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

    Authors: Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng

    Abstract: Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first four authors contributed equally, 27 pages

  13. arXiv:2406.13323  [pdf, other]

    physics.optics

    An alkali-referenced vector spectrum analyzer for visible-light integrated photonics

    Authors: Baoqi Shi, Ming-Yang Zheng, Yunkai Zhao, Yi-Han Luo, Jinbao Long, Wei Sun, Wenbo Ma, Xiu-Ping Xie, Lan Gao, Chen Shen, Anting Wang, Wei Liang, Qiang Zhang, Junqiu Liu

    Abstract: Integrated photonics has reformed our information society by offering on-chip optical signal synthesis, processing and detection with reduced size, weight and power consumption. As such, it has been successfully established in the near-infrared (NIR) telecommunication bands. With the soaring demand in miniaturized systems for biosensing, quantum information and transportable atomic clocks, extensi…

    Submitted 19 June, 2024; originally announced June 2024.

  14. When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective

    Authors: Shoujie Li, Zihan Wang, Changsheng Wu, Xiang Li, Shan Luo, Bin Fang, Fuchun Sun, Xiao-Ping Zhang, Wenbo Ding

    Abstract: Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. The visuotactile sensing technology with the merits of high resolution and low cost has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing

  15. arXiv:2406.11469  [pdf, other]

    eess.IV

    RMFA-Net: A Neural ISP for Real RAW to RGB Image Reconstruction

    Authors: Fei Li, Wenbo Hou, Peng Jia

    Abstract: Deep learning-based ISP algorithms have demonstrated significant potential in raw2rgb reconstruction. However, existing networks have not fully considered the specific characteristics of raw data, such as black level and CFA, which can negatively impact texture and color if mishandled. Moreover, uneven exposure in raw data is also not considered carefully, leading to adverse effects on contrast an…

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.08826  [pdf, other]

    cond-mat.mes-hall

    Topological Corner States in Bilayer and Trilayer Systems with Vertically Stacked Topologically Distinct Layers

    Authors: Natsuko Ishida, Motohiko Ezawa, Guangtai Lu, Wenbo Lin, Yasutomo Ota, Yasuhiko Arakawa, Satoshi Iwamoto

    Abstract: We investigate bilayer and trilayer systems composed of topologically distinct, vertically stacked layers based on the Benalcazar-Bernevig-Hughes model. We have identified a topological phase transition that significantly alters the number of the topological corner states in these systems. Additionally, we find that traditional nested Wilson loop analysis inaccurately classifies certain phases, le…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  17. arXiv:2406.08725  [pdf, other]

    cs.CR

    RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs

    Authors: Xuan Chen, Yuzhou Nie, Lu Yan, Yunshu Mao, Wenbo Guo, Xiangyu Zhang

    Abstract: Modern large language model (LLM) developers typically conduct a safety alignment to prevent an LLM from generating unethical or harmful content. Recent studies have discovered that the safety alignment of LLMs can be bypassed by jailbreaking prompts. These prompts are designed to create specific conversation scenarios with a harmful question embedded. Querying an LLM with such prompts can mislead…

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.08705  [pdf, other]

    cs.CR

    When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

    Authors: Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang

    Abstract: Recent studies developed jailbreaking attacks, which construct jailbreaking prompts to "fool" LLMs into responding to harmful questions. Early-stage jailbreaking attacks require access to model internals or significant human efforts. More advanced attacks utilize genetic algorithms for automatic and black-box attacks. However, the random nature of genetic algorithms significantly limits the effe…

    Submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.08160  [pdf, other]

    cs.RO

    Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments

    Authors: Shoujie Li, Yan Huang, Changqing Guo, Tong Wu, Jiawei Zhang, Linrui Zhang, Wenbo Ding

    Abstract: The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates exte…

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.07255  [pdf, other]

    cs.CV eess.IV

    Towards Realistic Data Generation for Real-World Super-Resolution

    Authors: Long Peng, Wenbo Li, Renjing Pei, Jingjing Ren, Xueyang Fu, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producin…

    Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.06580  [pdf, other]

    cs.CL cs.AI

    Break the Chain: Large Language Models Can be Shortcut Reasoners

    Authors: Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integratio…

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.06096  [pdf, other]

    physics.optics

    Space-Time Hopfion Crystals

    Authors: Wenbo Lin, Nilo Mata-Cervera, Yasutomo Ota, Yijie Shen, Satoshi Iwamoto

    Abstract: Hopfions, higher-dimensional topological quasiparticles with sophisticated 3D knotted spin textures discovered in condensed matter and photonic systems, show promise in high-density data storage and transfer. Here we present crystalline structures of hopfions lying in space-time constructed by spatiotemporally structured light. A practical methodology using bichromatic structured light beams or di…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures

  23. arXiv:2406.05874  [pdf, other]

    cs.CR

    Stealthy Targeted Backdoor Attacks against Image Captioning

    Authors: Wenshu Fan, Hongwei Li, Wenbo Jiang, Meng Hao, Shui Yu, Xiao Zhang

    Abstract: In recent years, there has been an explosive growth in multimodal learning. Image captioning, a classical multimodal task, has demonstrated promising applications and attracted extensive research attention. However, recent studies have shown that image caption models are vulnerable to some security threats such as backdoor attacks. Existing backdoor attacks against image captioning typically pair…

    Submitted 9 June, 2024; originally announced June 2024.

  24. arXiv:2406.05759  [pdf, ps, other]

    math.CO math.PR math.SP

    Chebyshev Moment Method for Regular Graphs I: Kesten-McKay and Semicircle distributions

    Authors: Yulin Gong, Wenbo Li, Shiping Liu

    Abstract: We develop the Chebyshev moment method to study the spectrum of regular graphs, motivated by the work of Serré. By this method, we give an elementary proof of the weak convergence to the Kesten-McKay distribution for the normalized spectral measures of random $N$-lifts in probability as $N$ tends to infinity. For a sequence of random $(q_n+1)$-regular graphs $G_n$ with $n$ vertices, we show that i…

    Submitted 9 June, 2024; originally announced June 2024.

    MSC Class: 05C31; 05C50; 05C80; 60B20

  25. arXiv:2406.05491  [pdf, other]

    cs.CV cs.CR

    One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

    Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

    Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus o…

    Submitted 8 June, 2024; originally announced June 2024.

  26. arXiv:2406.03879  [pdf, other]

    cs.LG cs.CV

    Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure

    Authors: Minghao Yang, Linlin Gao, Pengyuan Li, Wenbo Li, Yihong Dong, Zhiying Cui

    Abstract: Current structured pruning methods often result in considerable accuracy drops due to abrupt network changes and loss of information from pruned structures. To address these issues, we introduce the Decay Pruning Method (DPM), a novel smooth pruning approach with a self-rectifying mechanism. DPM consists of two key components: (i) Smooth Pruning: It converts conventional single-step pruning into m…

    Submitted 6 June, 2024; originally announced June 2024.

  27. arXiv:2406.03873  [pdf, other]

    cs.LG cs.AI cs.CV

    Quantum Implicit Neural Representations

    Authors: Jiaming Zhao, Wenbo Qiao, Peng Zhang, Hui Gao

    Abstract: Implicit neural representations have emerged as a powerful paradigm to represent signals such as images and sounds. This approach aims to utilize neural networks to parameterize the implicit function of the signal. However, when representing implicit functions, traditional neural networks such as ReLU-based multilayer perceptrons face challenges in accurately modeling high-frequency components of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: This paper was accepted by ICML 2024

  28. arXiv:2406.02166  [pdf, other]

    cs.SD cs.CL eess.AS

    Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

    Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou

    Abstract: There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR) - supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. Th…

    Submitted 4 June, 2024; originally announced June 2024.

  29. arXiv:2406.02147  [pdf, other]

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…

    Submitted 4 June, 2024; originally announced June 2024.

  30. arXiv:2406.01884  [pdf, other]

    cs.CV

    Rank-based No-reference Quality Assessment for Face Swapping

    Authors: Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

    Abstract: Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image, or the target image, i.e., there are suitable known reference face images. Therefore, there is still a gap in accurately…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  31. arXiv:2406.01375  [pdf, other]

    cs.CL

    D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    Authors: Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

    Abstract: Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general-corpus (e.g., Dolma, Slim-pajama) and the downstream domain-corpus. Existing methods usually…

    Submitted 3 June, 2024; originally announced June 2024.

  32. arXiv:2406.01359  [pdf, other]

    cs.CL cs.SE

    R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    Authors: Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng

    Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of…

    Submitted 3 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  33. arXiv:2406.00449  [pdf, other]

    eess.IV cs.CV

    Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging

    Authors: Jiahua Dong, Hui Yin, Hongliu Li, Wenbo Li, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolution neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffe…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  34. arXiv:2405.21023  [pdf, other]

    math.OC cs.AI

    Compact Optimality Verification for Optimization Proxies

    Authors: Wenbo Chen, Haoruo Zhao, Mathieu Tanneau, Pascal Van Hentenryck

    Abstract: Recent years have witnessed increasing interest in optimization proxies, i.e., machine learning models that approximate the input-output mapping of parametric optimization problems and return near-optimal feasible solutions. Following recent work by (Nellikkath & Chatzivasileiadis, 2021), this paper reconsiders the optimality verification problem for optimization proxies, i.e., the determination o…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: International Conference on Machine Learning 2024

  35. arXiv:2405.20725  [pdf, other]

    cs.AI cs.CV

    GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

    Authors: Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

    Abstract: Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, rese…

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.20653  [pdf, other]

    cs.AI

    Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens

    Authors: Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing

    Abstract: Along with the remarkable successes of large language models, recent research also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond to the harmful question. Existing jailbreaking attacks require either human experts or leveraging complicated algorithms to craft jailbreaking prompts…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  37. arXiv:2405.20279  [pdf, other]

    cs.CV cs.AI eess.IV

    CV-VAE: A Compatible Video VAE for Latent Generative Video Models

    Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

    Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latent ex…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

  38. arXiv:2405.19315  [pdf, other]

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  39. arXiv:2405.18679  [pdf, other]

    cs.CV

    Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

    Authors: Juntao Zhang, Kun Bian, Peng Cheng, Wenbo An, Jianning Liu, Jun Zhou

    Abstract: In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building efficient and general-purpose visual backbones based on SSMs is a promising direction. Compared to traditional convolutional neural networks (CNNs) and Vision Transfor…

    Submitted 28 May, 2024; originally announced May 2024.

  40. arXiv:2405.17811  [pdf, other]

    cs.GR cs.CV

    Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

    Authors: Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan

    Abstract: Neural 3D representations such as Neural Radiance Fields (NeRF), excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not hi…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page here: https://gaoxiangjun.github.io/mani_gs/

  41. arXiv:2405.17102  [pdf, other]

    cs.CV cs.RO

    DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge

    Authors: Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Chunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu

    Abstract: Surround-view depth estimation is a crucial task that aims to acquire the depth maps of the surrounding views. It has many applications in real world scenarios such as autonomous driving, AR/VR and 3D reconstruction, etc. However, given that most of the data in the autonomous driving dataset is collected in daytime scenarios, this leads to poor depth model performance in the face of out-of-distribution…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Outstanding Champion in the RoboDepth Challenge (ICRA24) https://robodrive-24.github.io/

  42. arXiv:2405.16783  [pdf, other]

    cs.CR cs.AI cs.LG

    TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models

    Authors: Yuzhou Nie, Yanting Wang, Jinyuan Jia, Michael J. De Lucia, Nathaniel D. Bastian, Wenbo Guo, Dawn Song

    Abstract: One key challenge in backdoor attacks against large foundation models is the resource limits. Backdoor attacks usually require retraining the target model, which is impractical for very large foundation models. Existing backdoor attacks are mainly designed for supervised classifiers or small foundation models (e.g., BERT). None of these attacks has successfully compromised a very large foundation…

    Submitted 26 May, 2024; originally announced May 2024.

  43. arXiv:2405.15304  [pdf, other]

    cs.LG cs.CV

    Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

    Authors: Yongliang Wu, Shiji Zhou, Mingzhuo Yang, Lianzhe Wang, Wenbo Zhu, Heng Chang, Xiao Zhou, Xu Yang

    Abstract: Current text-to-image diffusion models have achieved groundbreaking results in image generation tasks. However, the unavoidable inclusion of sensitive information during pre-training introduces significant risks such as copyright infringement and privacy violations in the generated images. Machine Unlearning (MU) provides an effective way to the sensitive concepts captured by the model, has been sh…

    Submitted 24 May, 2024; originally announced May 2024.

  44. arXiv:2405.15210  [pdf]

    cond-mat.str-el

    Spin chirality engineering induced giant topological Hall effect in a kagome magnet

    Authors: Wei Xia, Shihao Zhang, Jian Yuan, Yurui Wei, Haonan Wang, Hong Du, Xiangqi Liu, Jiangteng Guo, Zicheng Tao, Ke Qu, Xia Wang, Xuerong Liu, Wenbo Wang, Jinguang Cheng, Yulin Chen, Jianpeng Liu, Ruidan Zhong, Xuewen Fu, Zhenzhong Yang, Yanfeng Guo

    Abstract: The ferrimagnet TbMn6Sn6 has attracted vast attention, because its pristine Mn kagome lattice with strong spin-orbit coupling and out-of-plane Tb-Mn exchange supports quantum-limit Chern topological magnetism which can be described by the simple spinless Haldane model. We unveil herein that engineering the pristine kagome lattice through partial replacement of Mn by nonmagnetic Cr which tends to c…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 33 pages, 4 main figures and 16 SI figures

  45. arXiv:2405.13571  [pdf, other]

    cs.CV

    Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD

    Authors: Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau

    Abstract: Recent studies of multi-modal Industrial Anomaly Detection (IAD) based on point clouds and RGB images indicated the importance of exploiting redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multi-modal IAD in practical production lines remains a work in progress that requires consideration of the trade-offs between costs and benefits…

    Submitted 22 May, 2024; originally announced May 2024.

  46. arXiv:2405.13446  [pdf, ps, other]

    math.AG

    Effective gonality theorem on weight-one syzygies of algebraic curves

    Authors: Wenbo Niu, Jinhyung Park

    Abstract: In 1986, Green-Lazarsfeld raised the gonality conjecture asserting that the gonality $\operatorname{gon}(C)$ of a smooth projective curve $C$ of genus $g\geq 2$ can be read off from weight-one syzygies of a sufficiently positive line bundle $L$ on $C$, and also proposed possible least degree of such a line bundle. In 2015, Ein-Lazarsfeld proved the conjecture when $\operatorname{deg} L$ is suffici…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 21 pages, comments are welcome

  47. arXiv:2405.13124  [pdf, other]

    astro-ph.GA astro-ph.SR

    The Pristine survey -- XXVI. The very metal-poor Galaxy: Chemodynamics through the follow-up of the Pristine-Gaia synthetic catalogue

    Authors: Akshara Viswanathan, Zhen Yuan, Anke Ardern-Arentsen, Else Starkenburg, Nicolas F. Martin, Kris Youakim, Rodrigo A. Ibata, Federico Sestito, Tadafumi Matsuno, Carlos Allende Prieto, Freya Barwell, Manuel Bayer, Amandine Doliva-Dolinsky, Emma Fernandez-Alvar, Pablo M. Galan-de Anta, Kiran Jhass, Nicolas Longeard, Jose Maria Arroyo-Polonio, Pol Massana, Martin Montelius, Samuel Rusterucci, Judith Santos, Guillaume F. Thomas, Sara Vitali, Wenbo Wu, et al. (5 additional authors not shown)

    Abstract: The Pristine-\textit{Gaia} synthetic catalogue provides reliable photometric metallicities for $\sim$30 million FGK stars using the Pristine survey model and Gaia XP spectra. We perform the first low-to-medium-resolution spectroscopic follow-up of bright (G<15) and distant (up to 35 kpc) very and extremely metal-poor (V/EMP, [Fe/H]<-2.5) red giant branch stars from this. We use Isaac Newton Telesc…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Submitted to A&A. 17 pages (9 figures) + 3 pages (3 figures) in Appendix. Comments are very welcome! The catalogue and 1D spectra will be made publicly available after acceptance, and before then upon reasonable request to the first author

  48. arXiv:2405.12669  [pdf, other]

    cs.CL

    A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

    Authors: Huangjun Shen, Liangying Shao, Wenbo Li, Zhibin Lan, Zhanyu Liu, Jinsong Su

    Abstract: In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual modalities as inputs, leveraging visual context to tackle the ambiguities in source texts. In this paper, we begin by offering an exhaustive overview of 99 prior works, comprehensively summarizing representative studies…

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  49. arXiv:2405.11895  [pdf, other]

    cs.LG eess.SY

    Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins

    Authors: Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang

    Abstract: In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment…

    Submitted 20 May, 2024; originally announced May 2024.

  50. arXiv:2405.11681  [pdf, other]

    stat.ME math.ST

    Distributed Tensor Principal Component Analysis

    Authors: Elynn Chen, Xi Chen, Wenbo Jing, Yichen Zhang

    Abstract: As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario where large-scale tensors are distributed across diverse geographic locations, this paper investigates tensor PCA within a distributed framework where direct data pool…

    Submitted 19 May, 2024; originally announced May 2024.