
Showing 1–50 of 162 results for author: Du, C

Searching in archive cs.
  1. arXiv:2407.01552 [pdf]

    cs.NI physics.optics

    High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

    Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

    Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM…

    Submitted 29 April, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures

  2. arXiv:2407.01067 [pdf, other]

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.00362 [pdf, other]

    cs.CV cs.AI

    JSCDS: A Core Data Selection Method with Jason-Shannon Divergence for Caries RGB Images-Efficient Learning

    Authors: Peiliang Zhang, Yujia Tong, Chenghu Du, Chao Che, Yongjun Zhu

    Abstract: Deep learning-based RGB caries detection improves the efficiency of caries identification and is crucial for preventing oral diseases. The performance of deep learning models depends on high-quality data and requires substantial training resources, making efficient deployment challenging. Core data selection, by eliminating low-quality and confusing data, aims to enhance training efficiency withou…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted in KDD 2024 Workshop AIDSH
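
    The Jensen–Shannon divergence named in this title (written "Jason-Shannon" there) is, in its standard form, the average Kullback–Leibler divergence of two distributions to their midpoint. A minimal Python sketch of that textbook definition for discrete distributions follows; it only illustrates the quantity, is not the JSCDS selection procedure, and the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to the midpoint m = (p + q) / 2.

    Unlike KL it is symmetric and bounded above by ln 2 (in nats)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so neither KL term divides by zero.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # ln 2 (about 0.693) for disjoint supports
    print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0 for identical distributions
```

    The midpoint mixture is what keeps the divergence finite even when one distribution assigns zero probability where the other does not.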

  4. arXiv:2406.18844 [pdf, other]

    cs.CV

    Revisiting Backdoor Attacks against Large Vision-Language Models

    Authors: Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Ee-Chien Chang, Xiaochun Cao

    Abstract: Instruction tuning enhances large vision-language models (LVLMs) but raises security risks through potential backdoor attacks due to their openness. Previous backdoor studies focus on enclosed scenarios with consistent training and testing instructions, neglecting the practical domain gaps that could affect attack effectiveness. This paper empirically examines the generalizability of backdoor atta…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 24 pages, 8 figures

  5. arXiv:2406.16903 [pdf]

    cs.HC cs.AI cs.CL cs.LG

    Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction

    Authors: Changyu Du, Stavros Nousias, André Borrmann

    Abstract: Facing increasingly complex BIM authoring software and the accompanying expensive learning costs, designers often seek to interact with the software in a more intelligent and lightweight manner. They aim to automate modeling workflows, avoiding obstacles and difficulties caused by software usage, thereby focusing on the design process itself. To address this issue, we proposed an LLM-based autonom…

    Submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2406.10237 [pdf]

    cs.IR cs.CE cs.CL cs.HC cs.LG

    Towards commands recommender system in BIM authoring tool using transformers

    Authors: Changyu Du, Zihan Deng, Stavros Nousias, André Borrmann

    Abstract: The complexity of BIM software presents significant barriers to the widespread adoption of BIM and model-based design within the Architecture, Engineering, and Construction (AEC) sector. End-users frequently express concerns regarding the additional effort required to create a sufficiently detailed BIM model when compared with conventional 2D drafting. This study explores the potential of sequenti…

    Submitted 2 June, 2024; originally announced June 2024.

  7. arXiv:2406.09760 [pdf, other]

    cs.CL cs.LG

    Bootstrapping Language Models with DPO Implicit Rewards

    Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

    Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that…

    Submitted 14 June, 2024; originally announced June 2024.
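
    For reference, the DPO implicit reward mentioned in this abstract is, in the original DPO formulation, r(x, y) = β·log[π_θ(y|x) / π_ref(y|x)] up to a term that depends only on the prompt x, so it can rank responses to the same prompt. The minimal Python sketch below assumes summed sequence log-probabilities are already computed; β = 0.1 and the numbers are illustrative, and this is not the paper's bootstrapping procedure.

```python
def implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = 0.1) -> float:
    """DPO implicit reward beta * (log pi_theta(y|x) - log pi_ref(y|x)).

    The prompt-only term beta * log Z(x) is dropped, so scores are only
    comparable between responses to the same prompt."""
    return beta * (policy_logprob - ref_logprob)


def rank_by_implicit_reward(candidates, beta: float = 0.1):
    """Sort (text, policy_logprob, ref_logprob) tuples from highest to lowest reward."""
    return sorted(
        candidates,
        key=lambda c: implicit_reward(c[1], c[2], beta),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical summed log-probabilities for two responses to one prompt.
    candidates = [("response B", -10.0, -9.0), ("response A", -12.0, -15.0)]
    best, worst = rank_by_implicit_reward(candidates)
    print(best[0], worst[0])  # response A response B
```

    Response A scores higher (0.3 versus -0.1) even though its absolute log-probability is lower, because the reward measures how far the policy has moved away from the reference model.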

  8. arXiv:2406.09136 [pdf, other]

    cs.CL cs.LG

    Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

    Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT dec…

    Submitted 13 June, 2024; originally announced June 2024.

  9. arXiv:2406.04295 [pdf, other]

    cs.CV

    Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

    Authors: Jiayi Guo, Junhao Zhao, Chunjiang Ge, Chaoqun Du, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang

    Abstract: Test-time adaptation (TTA) aims to enhance the performance of source-domain pretrained models when tested on unknown shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. Recently, diffusion-driven TTA methods have demonstrated strong performance by using an unconditiona…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: GitHub: https://github.com/SHI-Labs/Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment

  10. arXiv:2406.01288 [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

    Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin

    Abstract: Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting specia…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2405.21018 [pdf, other]

    cs.LG cs.CL cs.CR

    Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

    Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin

    Abstract: Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milesto…

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.20600 [pdf, other]

    cs.AI

    Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning

    Authors: Kaicheng Fu, Changde Du, Xiaoyu Chen, Jie Peng, Huiguang He

    Abstract: Emotion decoding plays an important role in affective human-computer interaction. However, previous studies ignored the dynamic real-world scenario, where humans experience a blend of multiple emotions which are incrementally integrated into the model, leading to the multi-label class incremental learning (MLCIL) problem. Existing methods have difficulty in solving the MLCIL issue due to notorious cata…

    Submitted 30 May, 2024; originally announced May 2024.

  13. arXiv:2405.20521 [pdf, other]

    cs.CR

    SoK: Public Blockchain Sharding

    Authors: Md Mohaimin Al Barat, Shaoyu Li, Changlai Du, Y. Thomas Hou, Wenjing Lou

    Abstract: Blockchain's decentralization, transparency, and tamper-resistance properties have facilitated the system's use in various application fields. However, the low throughput and high confirmation latency hinder the widespread adoption of Blockchain. Many solutions have been proposed to address these issues, including first-layer solutions (or on-chain solutions) and second-layer solutions (or off-cha…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 18 pages

  14. arXiv:2405.18726 [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

    Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

    Abstract: Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utili…

    Submitted 28 May, 2024; originally announced May 2024.

  15. arXiv:2405.16552 [pdf, other]

    cs.CL cs.AI

    SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation

    Authors: Ziqin Luo, Haixia Han, Haokun Zhao, Guochao Jiang, Chengyu Du, Tingyun Li, Jiaqing Liang, Deqing Yang, Yanghua Xiao

    Abstract: Existing Large Language Models (LLMs) generate text through unidirectional autoregressive decoding methods to respond to various user queries. These methods tend to consider token selection in a simple sequential manner, making it easy to fall into suboptimal options when encountering uncertain tokens, referred to as chaotic points in our work. Many chaotic points exist in texts generated by LLMs,…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: The relevant code will be released in subsequent versions

  16. arXiv:2405.07840 [pdf, other]

    cs.HC cs.CL

    Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM

    Authors: Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He

    Abstract: Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering the semantic information from the fMRI signal. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open vocabulary continuous text decoding. In this paper, we introduce a novel m…

    Submitted 13 May, 2024; originally announced May 2024.

  17. arXiv:2405.03280 [pdf, other]

    cs.CV cs.AI

    Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

    Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

    Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of nat…

    Submitted 6 May, 2024; originally announced May 2024.

  18. arXiv:2405.03121 [pdf, other]

    cs.CV cs.AI

    AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

    Authors: Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu

    Abstract: The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 7 figures

  19. arXiv:2404.19723 [pdf, other]

    eess.AS cs.SD

    Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

    Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

    Abstract: Recent popular decoder-only text-to-speech models are known for their ability to generate natural-sounding speech. However, such models sometimes suffer from word skipping and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech…

    Submitted 30 April, 2024; originally announced April 2024.

  20. arXiv:2404.19040 [pdf, other]

    cs.CV

    GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

    Authors: Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen

    Abstract: We present GSTalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3–5 minute video as training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Ga…

    Submitted 29 April, 2024; originally announced April 2024.

  21. arXiv:2404.18206 [pdf, other]

    cs.CV

    Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation

    Authors: Cuiwei Liu, Youzhi Jiang, Chong Du, Zhaokui Li

    Abstract: Skeleton-based action recognition is vital for comprehending human-centric videos and has applications in diverse domains. One of the challenges of skeleton-based action recognition is dealing with low-quality data, such as skeletons that have missing or inaccurate joints. This paper addresses the issue of enhancing action recognition using low-quality skeletons through a general knowledge distill…

    Submitted 28 April, 2024; originally announced April 2024.

    Journal ref: published in Signal Processing 2024

  22. arXiv:2404.17890 [pdf, other]

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  23. arXiv:2404.17484 [pdf, other]

    cs.CV eess.IV

    Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

    Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

    Abstract: Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 19 pages, 5 figures

  24. arXiv:2404.16327 [pdf, other]

    cs.IT eess.SP

    Generalized Step-Chirp Sequences With Flexible Bandwidth

    Authors: Cheng Du, Yi Jiang

    Abstract: Sequences with low aperiodic autocorrelation sidelobes have been extensively researched in the literature. With sufficiently low integrated sidelobe level (ISL), their power spectra are asymptotically flat over the whole frequency domain. However, for beam sweeping in massive multi-input multi-output (MIMO) broadcast channels, the flat spectrum should be constrained in a passband with tunab…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by 2024 IEEE International Symposium on Information Theory
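
    The integrated sidelobe level (ISL) in this abstract is conventionally computed from the aperiodic autocorrelation C(k) = Σ_n a[n+k]·conj(a[n]) as the sum of |C(k)|² over all nonzero shifts. A minimal Python sketch, assuming the two-sided convention (shifts ±1 through ±(N−1)) and using the length-13 Barker sequence only as a familiar low-sidelobe test input; it does not construct the paper's step-chirp sequences.

```python
def aperiodic_autocorr(a, k):
    """C(k) = sum_n a[n + k] * conj(a[n]) for a shift 0 <= k < len(a)."""
    return sum(a[n + k] * complex(a[n]).conjugate() for n in range(len(a) - k))


def isl(a):
    """Two-sided integrated sidelobe level: sum of |C(k)|^2 over k != 0.

    |C(-k)| = |C(k)|, so the negative shifts double the one-sided sum."""
    return 2 * sum(abs(aperiodic_autocorr(a, k)) ** 2 for k in range(1, len(a)))


if __name__ == "__main__":
    barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    print(aperiodic_autocorr(barker13, 0).real)  # 13.0, the mainlobe (sequence energy)
    print(isl(barker13))  # 12.0: six shifts with |C(k)| = 1, doubled
```

    Because the functions accept complex entries, the same check applies to polyphase sequences such as chirps; some authors report the one-sided sum or normalize by the mainlobe, so conventions should be matched before comparing numbers.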

  25. arXiv:2404.10315 [pdf, other]

    cs.CL

    Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience

    Authors: Haixia Han, Tingyun Li, Shisong Chen, Jie Shi, Chengyu Du, Yanghua Xiao, Jiaqing Liang, Xin Lin

    Abstract: Large Language Models (LLMs) have exhibited remarkable performance across various downstream tasks, but they may generate inaccurate or false information with a confident tone. One of the possible solutions is to empower the LLM confidence expression capability, in which the confidence expressed can be well-aligned with the true probability of the generated answer being correct. However, leveragin…

    Submitted 16 April, 2024; originally announced April 2024.

  26. arXiv:2404.08707 [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Model Can Continue Evolving From Mistakes

    Authors: Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao

    Abstract: As world knowledge evolves and new task paradigms emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. In practical applications, LLMs often require both continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new task paradigms and acquire necessary knowledge for task-solving. However, it remains…

    Submitted 17 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  27. arXiv:2404.06079 [pdf, other]

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  28. arXiv:2403.18802 [pdf, other]

    cs.CL cs.AI cs.LG

    Long-form factuality in large language models

    Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

    Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factua…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  29. arXiv:2403.10935 [pdf, other]

    cs.CV

    Understanding Robustness of Visual State Space Models for Image Classification

    Authors: Chengbin Du, Yanxi Li, Chang Xu

    Abstract: Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. However, its robustness has not yet been thoroughly studied. In this paper, we delve into the robustness of this architecture through comprehensive investigations from multiple perspectives. Firstly, we investigate its robustness to adversarial att…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 27 pages

  30. arXiv:2403.06726 [pdf, other]

    cs.LG cs.CV

    Probabilistic Contrastive Learning for Long-Tailed Visual Recognition

    Authors: Chaoqun Du, Yulin Wang, Shiji Song, Gao Huang

    Abstract: Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. This imbalance considerably impairs the performance of standard supervised learning algorithms, which are mainly designed for balanced training sets. Recent investigations have revealed that supervised contrastive learning exhibits promising potenti…

    Submitted 14 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)

  31. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  32. arXiv:2402.16302 [pdf, other]

    cs.LG cs.AI cs.CE

    Graph Diffusion Policy Optimization

    Authors: Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Wei Chen, Min Lin

    Abstract: Recent research has made significant progress in optimizing diffusion models for specific downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these models to graph diffusion presents challenges, resulting in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to op…

    Submitted 25 February, 2024; originally announced February 2024.

  33. arXiv:2402.14845 [pdf, other]

    cs.CL cs.AI cs.LG

    Purifying Large Language Models by Ensembling a Small Language Model

    Authors: Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, Min Lin

    Abstract: The emerging success of large language models (LLMs) heavily relies on collecting abundant training data from external (untrusted) sources. Despite substantial efforts devoted to data cleaning and curation, well-constructed LLMs have been reported to suffer from copyright infringement, data poisoning, and/or privacy violations, which would impede practical deployment of LLMs. In this study, we pro…

    Submitted 19 February, 2024; originally announced February 2024.

    ACM Class: I.2

  34. arXiv:2402.14172 [pdf, other]

    cs.CY cs.SI

    Open Source Software Field Research: Spanning Social and Practice Networks for Re-Entering the Field

    Authors: Sean P. Goggins, Kevin Lumbard, Matt Germonprez, Caifan Du, Karthik Ram, James Howison

    Abstract: Sociotechnical research increasingly includes the social sub-networks that emerge from large-scale sociotechnical infrastructure, including the infrastructure for building open source software. This paper addresses these numerous sub-networks as advantageous for researchers. It provides a methodological synthesis focusing on how researchers can best span adjacent social sub-networks during engaged…

    Submitted 12 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  35. arXiv:2402.13505 [pdf, other]

    cs.LG cs.CV

    SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

    Authors: Chaoqun Du, Yizeng Han, Gao Huang

    Abstract: Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of model…

    Submitted 1 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ICML2024 camera-ready version

  36. arXiv:2402.12150 [pdf, other]

    cs.CL cs.AI

    Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One

    Authors: Tianlin Li, Xiaoyu Zhang, Chao Du, Tianyu Pang, Qian Liu, Qing Guo, Chao Shen, Yang Liu

    Abstract: The widespread adoption of large language models (LLMs) underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints using a human personality that represents th…

    Submitted 19 February, 2024; originally announced February 2024.

    ACM Class: I.2; J.4

  37. arXiv:2402.08994 [pdf, other]

    cs.CV cs.AI

    CLIP-MUSED: CLIP-Guided Multi-Subject Visual Neural Information Semantic Decoding

    Authors: Qiongyi Zhou, Changde Du, Shengpei Wang, Huiguang He

    Abstract: The study of decoding visual neural information faces challenges in generalizing single-subject decoding models to multiple subjects, due to individual differences. Moreover, the limited availability of data from a single subject has a constraining impact on model performance. Although prior multi-subject decoding methods have made significant progress, they still suffer from several limitations,…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR2024

  38. arXiv:2402.08577 [pdf, other]

    cs.CL cs.CR cs.CV cs.LG cs.MM

    Test-Time Backdoor Attacks on Multimodal Large Language Models

    Authors: Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin

    Abstract: Backdoor attacks are commonly executed by contaminating training data, such that a trigger can activate predetermined harmful effects during the test phase. In this work, we present AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs), which involves injecting the backdoor into the textual modality using adversarial test images (sharing the same universal perturbat…

    Submitted 13 February, 2024; originally announced February 2024.

  39. arXiv:2402.08567 [pdf, other]

    cs.CL cs.CR cs.CV cs.LG cs.MA

    Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

    Authors: Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin

    Abstract: A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use. Nonetheless, red-teaming efforts have revealed that adversarial images/prompts can jailbreak an MLLM and cause unaligned behaviors. In this work, we report an even more severe safety issue in multi-agent environments, referred to as infectious jail…

    Submitted 3 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  40. arXiv:2402.04829 [pdf, other]

    cs.CV cs.GR

    NeRF as a Non-Distant Environment Emitter in Physics-based Inverse Rendering

    Authors: Jingwang Ling, Ruihan Yu, Feng Xu, Chun Du, Shuang Zhao

    Abstract: Physics-based inverse rendering enables joint optimization of shape, material, and lighting based on captured 2D images. To ensure accurate reconstruction, using a light model that closely resembles the captured environment is essential. Although the widely adopted distant environmental lighting model is adequate in many cases, we demonstrate that its inability to capture spatially varying illumin…

    Submitted 1 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: SIGGRAPH 2024. Project page and video: https://nerfemitterpbir.github.io/

  41. arXiv:2402.02084 [pdf, other]

    cs.CL

    Revisiting the Markov Property for Machine Translation

    Authors: Cunxiao Du, Hao Zhou, Zhaopeng Tu, Jing Jiang

    Abstract: In this paper, we re-examine the Markov property in the context of neural machine translation. We design a Markov Autoregressive Transformer (MAT) and undertake a comprehensive assessment of its performance across four WMT benchmarks. Our findings indicate that MAT with an order larger than 4 can generate translations with quality on par with that of conventional autoregressive transformers. In ad…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: EACL (Findings)

  42. arXiv:2402.02082 [pdf, other]

    cs.CL

    GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

    Authors: Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You

    Abstract: Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs. In this study, we introduce GliDe and CaPE, two low-hassle modifications to vanilla speculative decoding to further improve the decoding speed of a frozen LLM. Specifically, GliDe is a modified draft model architecture that reuses the cached keys and values fro…

    Submitted 3 February, 2024; originally announced February 2024.
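For context, the vanilla speculative decoding loop that GliDe and CaPE modify can be sketched as follows. This is a minimal greedy-decoding version under stated assumptions: the draft and target "models" are hypothetical next-token functions, the target checks draft tokens one position at a time rather than in a single batched forward pass, and the sampling-based acceptance rule used with stochastic decoding is omitted. It implements neither GliDe nor CaPE.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))
    # 2. The target model verifies each proposed token in order.
    #    (A real system scores all k positions in one forward pass.)
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First disagreement: keep the target's own token and stop.
            return accepted + [expected]
        accepted.append(tok)
    # 3. All draft tokens matched, so the target's next prediction is a bonus token.
    return accepted + [target(prefix + accepted)]


def speculative_decode(prompt: List[int], draft: NextToken, target: NextToken,
                       k: int, max_new_tokens: int) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out += speculative_step(out, draft, target, k)  # always adds >= 1 token
    return out[: len(prompt) + max_new_tokens]


def greedy_decode(prompt: List[int], target: NextToken, max_new_tokens: int) -> List[int]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


if __name__ == "__main__":
    def target(seq: List[int]) -> int:  # toy "large" model
        return (seq[-1] + 1) % 10

    def draft(seq: List[int]) -> int:  # toy draft model, wrong after token 4
        return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10

    print(speculative_decode([0], draft, target, k=3, max_new_tokens=8))
    # prints [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Because every emitted token is either a draft token the target agreed with or the target's own choice, the output matches plain greedy decoding with the target model; any speed-up comes from verifying several draft tokens per target call.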

  43. arXiv:2401.17256  [pdf, other]

    cs.CL

    Weak-to-Strong Jailbreaking on Large Language Models

    Authors: Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang

    Abstract: Large language models (LLMs) are vulnerable to jailbreak attacks, resulting in harmful, unethical, or biased text generation. However, existing jailbreaking methods are computationally costly. In this paper, we propose the weak-to-strong jailbreaking attack, an efficient method for attacking aligned LLMs into producing harmful text. Our key intuition is based on the observation that jailbroken and align…

    Submitted 5 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  44. arXiv:2401.15381  [pdf, ps, other]

    cs.IT

    Golay Complementary Sequences of Arbitrary Length and Asymptotic Existence of Hadamard Matrices

    Authors: Cheng Du, Yi Jiang

    Abstract: In this work, we construct a $4$-phase Golay complementary sequence (GCS) set of cardinality $2^{3+\lceil \log_2 r \rceil}$ with arbitrary sequence length $n$, where the $10^{13}$-base expansion of $n$ has $r$ nonzero digits. Specifically, the GCS octets (eight sequences) cover all lengths no greater than $10^{13}$. Besides, based on the representation theory of the signed symmetric group, we constr…

    Submitted 27 January, 2024; originally announced January 2024.
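The defining property behind such constructions can be checked directly: a set of sequences is complementary when their aperiodic autocorrelations sum to zero at every nonzero shift. The sketch below verifies that property and, for illustration only, uses the classic recursive binary construction of length $2^m$; it is not the paper's arbitrary-length $4$-phase construction, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def aperiodic_autocorrelation(seq: Sequence[complex], shift: int) -> complex:
    # C(shift) = sum_i seq[i] * conj(seq[i + shift]) over the overlapping indices.
    return sum(seq[i] * seq[i + shift].conjugate() for i in range(len(seq) - shift))


def is_complementary(seqs: List[Sequence[complex]], tol: float = 1e-9) -> bool:
    # Complementary set: the autocorrelations sum to zero at every nonzero shift.
    n = len(seqs[0])
    return all(
        abs(sum(aperiodic_autocorrelation(s, u) for s in seqs)) < tol
        for u in range(1, n)
    )


def golay_pair(m: int) -> Tuple[List[int], List[int]]:
    # Classic binary pair of length 2**m via (a, b) -> (a || b, a || -b),
    # starting from ([1], [1]).
    a, b = [1], [1]
    for _ in range(m):
        a, b = a + b, a + [-x for x in b]
    return a, b


if __name__ == "__main__":
    a, b = golay_pair(3)
    print(a, b, is_complementary([a, b]))
```

Multiplying one sequence of a pair by the unit-modulus constant $j$ yields a quadriphase pair that still passes the check, since such scaling leaves each autocorrelation unchanged.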

  45. arXiv:2401.14321  [pdf, other]

    eess.AS cs.SD

    VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

    Authors: Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt. However, such decoder-only TTS models lack monotonic alignment constraints, sometimes leading to hallucination issues such as mispronunciation, word skipping and repeating. To address this limitation,…

    Submitted 29 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  46. arXiv:2401.13034  [pdf, other]

    cs.LG cs.AI

    Locality Sensitive Sparse Encoding for Learning World Models Online

    Authors: Zichen Liu, Chao Du, Wee Sun Lee, Min Lin

    Abstract: Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-train…

    Submitted 17 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  47. arXiv:2401.11943  [pdf, other]

    cs.LG cs.CL cs.CR cs.CV cs.MM

    Benchmarking Large Multimodal Models against Common Corruptions

    Authors: Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin

    Abstract: This technical report aims to fill a gap in the assessment of large multimodal models (LMMs) by specifically examining the self-consistency of their outputs when subjected to common corruptions. We investigate the cross-modal interactions between text, image, and speech, encompassing four essential generation tasks: text-to-image, image-to-text, text-to-speech, and speech-to-text. We create…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Technical report

  48. arXiv:2401.08690  [pdf, other]

    cs.LG

    Contrastive Learning with Negative Sampling Correction

    Authors: Lu Wang, Chao Du, Pu Zhao, Chuan Luo, Zhangchi Zhu, Bo Qiao, Wei Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: As one of the most effective self-supervised representation learning methods, contrastive learning (CL) relies on multiple negative pairs to contrast against each positive pair. In the standard practice of contrastive learning, data augmentation methods are utilized to generate both positive and negative pairs. While existing work has focused on improving positive sampling, the negativ…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures
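To make the role of negative pairs concrete, the standard InfoNCE objective for a single anchor is sketched below: the positive similarity competes against all negative similarities in a softmax. This is the generic loss, not the paper's negative sampling correction (which the truncated abstract does not describe); the function name and default temperature are illustrative assumptions.

```python
import math
from typing import Sequence


def info_nce_loss(pos_sim: float, neg_sims: Sequence[float], temperature: float = 0.1) -> float:
    # Generic InfoNCE for one anchor:
    #   -log( exp(s_pos / t) / (exp(s_pos / t) + sum_j exp(s_neg_j / t)) )
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # The loss shrinks as the positive pair becomes more similar than the negatives.
    print(info_nce_loss(0.9, [0.1, 0.2, 0.3]))
    print(info_nce_loss(0.3, [0.1, 0.2, 0.3]))
```

When the positive is no more similar than any negative, the loss equals the log of the number of candidates, which is one reason the quality of the sampled negatives matters.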

  49. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  50. arXiv:2311.17541  [pdf, other]

    cs.AI

    TaskWeaver: A Code-First Agent Framework

    Authors: Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Large Language Models (LLMs) have shown impressive abilities in natural language understanding and generation, leading to their widespread use in applications such as chatbots and virtual assistants. However, existing LLM frameworks face limitations in handling domain-specific data analytics tasks with rich data structures. Moreover, they lack the flexibility to meet diverse user requirements…

    Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.