Skip to main content

Showing 1–50 of 1,606 results for author: Zhu, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00945  [pdf, other

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2407.00849  [pdf, other

    cs.LG

    Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning

    Authors: Jiajun Zhu, Siqi Miao, Rex Ying, Pan Li

    Abstract: The interpretability of machine learning models has gained increasing attention, particularly in scientific domains where high precision and accountability are crucial. This research focuses on distinguishing between two critical data patterns -- sensitive patterns (model-related) and decisive patterns (task-related) -- which are commonly used as model interpretations but often lead to confusion.… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00066  [pdf, other

    cs.DC cs.AI cs.CL cs.LG

    Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

    Authors: Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon

    Abstract: Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates. This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA. Prior works optimize the design of such systems but still require continuous loading and of… ▽ More

    Submitted 17 June, 2024; originally announced July 2024.

  4. arXiv:2406.18284  [pdf, other

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.18007  [pdf, other

    cs.MM

    Deep Mamba Multi-modal Learning

    Authors: Jian Zhu, Xin Zou, Yu Cui, Zhangmin Huang, Chenshu Hu, Bo Lyu

    Abstract: Inspired by the excellent performance of Mamba networks, we propose a novel Deep Mamba Multi-modal Learning (DMML). It can be used to achieve the fusion of multi-modal features. We apply DMML to the field of multimedia retrieval and propose an innovative Deep Mamba Multi-modal Hashing (DMMH) method. It combines the advantages of algorithm accuracy and inference speed. We validated the effectivenes… ▽ More

    Submitted 9 April, 2024; originally announced June 2024.

    Comments: Deep Mamba Multi-modal Learning; Deep Mamba Multi-modal Hashing

  6. arXiv:2406.17507  [pdf, other

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.16815  [pdf, other

    cs.CV

    ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

    Authors: Yufei Liu, Junshu Tang, Chu Zheng, Shijie Zhang, **kun Hao, Junwei Zhu, Dong** Huang

    Abstract: High-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches via Score Distillation Sampling (SDS) have enabled new possibilities but either intricately couple with human body or struggle to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text p… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project Page: https://ggxxii.github.io/clothedreamer

  8. arXiv:2406.16360  [pdf, other

    cs.CV cs.GR

    MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

    Authors: Yuxin Dai, Qi Wang, **gsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He

    Abstract: We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based… ▽ More

    Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures

  9. arXiv:2406.16321  [pdf, other

    cs.LG cs.AI

    Multimodal Graph Benchmark

    Authors: **g Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

    Abstract: Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associate with each node. To bridge such gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multi-modal graph benchmark that incorporates both textual and v… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://mm-graph-benchmark.github.io/

  10. arXiv:2406.16069  [pdf, other

    cs.CL cs.AI

    FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

    Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

    Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before in… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  11. arXiv:2406.16062  [pdf, other

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  12. arXiv:2406.15846  [pdf, other

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, **gbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  13. arXiv:2406.15735  [pdf, other

    cs.CV cs.AI

    Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model

    Authors: Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu

    Abstract: Diffusion models have obtained substantial progress in image-to-video (I2V) generation. However, such models are not fully understood. In this paper, we report a significant but previously overlooked issue in I2V diffusion models (I2V-DMs), namely, conditional image leakage. I2V-DMs tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Project page: https://cond-image-leak.github.io/

  14. arXiv:2406.15178  [pdf, other

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, **gbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  15. arXiv:2406.14964  [pdf, other

    cs.CV

    VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

    Authors: Zixuan Chen, Ruijie Su, Jiahao Zhu, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Text-to-3D generation aims to create 3D assets from text-to-image diffusion models. However, existing methods face an inherent bottleneck in generation quality because the widely-used objectives such as Score Distillation Sampling (SDS) inappropriately omit U-Net jacobians for swift generation, leading to significant bias compared to the "true" gradient obtained by full denoising sampling. This bi… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  16. arXiv:2406.14017  [pdf, other

    cs.IR

    EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

    Authors: Ye Wang, Jiahao Xun, Mingjie Hong, Jieming Zhu, Tao **, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

    Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. Source code available at https://reczoo.github.io/EAGER

  17. arXiv:2406.13977  [pdf, other

    eess.IV cs.CV

    Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

    Authors: Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

    Abstract: Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional d… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  18. arXiv:2406.13495  [pdf, other

    cs.CV

    DF40: Toward Next-Generation Deepfake Detection

    Authors: Zhiyuan Yan, Tai** Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu

    Abstract: We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass"… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.13378  [pdf, other

    cs.CV

    Any360D: Towards 360 Depth Anything with Unlabeled 360 Data and Möbius Spatial Augmentation

    Authors: Zidong Cao, ****g Zhu, Weiming Zhang, Lin Wang

    Abstract: Recently, Depth Anything Model (DAM) - a type of depth foundation model - reveals impressive zero-shot capacity for diverse perspective images. Despite its success, it remains an open question regarding DAM's performance on 360 images that enjoy a large field-of-view (180x360) but suffer from spherical distortions. To this end, we establish, to our knowledge, the first benchmark that aims to 1) ev… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  20. arXiv:2406.13114  [pdf, other

    cs.CL cs.AI

    Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

    Authors: Yuhang Zhou, **g Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

    Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. Particularly, sequence-level KD, which distills rationale-based reasoning processes instead of mer… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: preprint

  21. arXiv:2406.12017  [pdf, other

    stat.ML cs.LG stat.CO

    Sparsity-Constraint Optimization via Splicing Iteration

    Authors: Zezhi Wang, ** Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang

    Abstract: Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEratio… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  22. An ethical study of generative AI from the Actor-Network Theory perspective

    Authors: Yuying Li, **chi Zhu

    Abstract: The widespread use of Generative Artificial Intelligence in the innovation and generation of communication content is mainly due to its exceptional creative ability, operational efficiency, and compatibility with diverse industries. Nevertheless, this has also sparked ethical problems, such as unauthorized access to data, biased decision-making by algorithms, and criminal use of generated content.… ▽ More

    Submitted 9 April, 2024; originally announced June 2024.

    Comments: 12 pages and one figure

    MSC Class: 94-02 ACM Class: I.2.7; I.2.1

  23. arXiv:2406.10976  [pdf, other

    cs.LG cs.CL cs.CR

    Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

    Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuan**g Huang

    Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  24. arXiv:2406.10965  [pdf, other

    cs.CL cs.SI

    DocNet: Semantic Structure in Inductive Bias Detection Models

    Authors: Jessica Zhu, Iain Cruickshank, Michel Cukier

    Abstract: News will have biases so long as people have opinions. However, as social media becomes the primary entry point for news and partisan gaps increase, it is increasingly important for informed citizens to be able to identify bias. People will be able to take action to avoid polarizing echo chambers if they know how the news they are consuming is biased. In this paper, we explore an often overlooked… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Under submission with EMNLP 2024

  25. arXiv:2406.10289  [pdf, other

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.10059  [pdf, ps, other

    cs.RO

    Double-Anonymous Review for Robotics

    Authors: Justin K. Yim, Paul Nadan, James Zhu, Alexandra Stutt, J. Joe Payne, Catherine Pavlov, Aaron M. Johnson

    Abstract: Prior research has investigated the benefits and costs of double-anonymous review (DAR, also known as double-blind review) in comparison to single-anonymous review (SAR) and open review (OR). Several review papers have attempted to compile experimental results in peer review research both broadly and in engineering and computer science. This document summarizes prior research in peer review that m… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Originally published August 24, 2022

  27. arXiv:2406.09408  [pdf, other

    cs.CV cs.LG

    Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

    Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang

    Abstract: The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. We can define "influence" by saying that, for a given output, if a model is retrained from scratch without that output's most influential images, the model should then fail to generate that output image. Unfortunately, directly searching for these influential… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://peterwang512.github.io/AttributeByUnlearning Code: https://github.com/PeterWang512/AttributeByUnlearning

  28. arXiv:2406.09326  [pdf, other

    cs.SD cs.AI cs.CV cs.MM eess.AS

    PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

    Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

    Abstract: Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-h… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

  29. arXiv:2406.09179  [pdf, other

    cs.LG

    Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning

    Authors: Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, Masashi Sugiyama

    Abstract: The compelling goal of eradicating undesirable data behaviors, while preserving usual model functioning, underscores the significance of machine unlearning within the domain of large language models (LLMs). Recent research has begun to approach LLM unlearning via gradient ascent (GA) -- increasing the prediction risk for those training strings targeted to be unlearned, thereby erasing their parame… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.08288  [pdf, other

    cs.LG

    Decoupling the Class Label and the Target Concept in Machine Unlearning

    Authors: Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, Masashi Sugiyama

    Abstract: Machine unlearning as an emerging research topic for data regulations, aims to adjust a trained model to approximate a retrained one that excludes a portion of training data. Previous studies showed that class-wise unlearning is successful in forgetting the knowledge of a target class, through gradient ascent on the forgetting data or fine-tuning with the remaining data. However, while these metho… ▽ More

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  31. arXiv:2406.07951  [pdf, other

    cs.CV

    DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

    Authors: Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha

    Abstract: Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as smartphones and wearable devices. Despite its potential, the lack of Image signal processing (ISP) pipeline specifically designed for HybridEVS poses… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  32. arXiv:2406.07932  [pdf, other

    cs.IR

    Counteracting Duration Bias in Video Recommendation via Counterfactual Watch Time

    Authors: Haiyuan Zhao, Guohao Cai, Jieming Zhu, Zhenhua Dong, Jun Xu, Ji-Rong Wen

    Abstract: In video recommendation, an ongoing effort is to satisfy users' personalized information needs by leveraging their logged watch time. However, watch time prediction suffers from duration bias, hindering its ability to reflect users' interests accurately. Existing label-correction approaches attempt to uncover user interests through grou** and normalizing observed watch time according to video du… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  33. arXiv:2406.07543  [pdf, other

    cs.CV

    Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    Authors: Chenyu Yang, Xizhou Zhu, **guo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai

    Abstract: Recently, vision model pre-training has evolved from relying on manually annotated datasets to leveraging large-scale, web-crawled image-text data. Despite these advances, there is no pre-training method that effectively exploits the interleaved image-text data, which is very prevalent on the Internet. Inspired by the recent success of compression learning in natural language processing, we propos… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  34. arXiv:2406.07168  [pdf, other

    cs.CL

    Teaching Language Models to Self-Improve by Learning from Language Feedback

    Authors: Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, **gbo Zhu

    Abstract: Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotati… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  35. arXiv:2406.07057  [pdf, other

    cs.CL cs.AI cs.CV cs.LG

    Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

    Authors: Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu

    Abstract: Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchm… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 100 pages, 84 figures, 33 tables

  36. arXiv:2406.06969  [pdf, other

    q-bio.GN cs.DM

    Data mining method of single-cell omics data to evaluate a pure tissue environmental effect on gene expression level

    Authors: Daigo Okada, Jianshen Zhu, Kan Shota, Yuuki Nishimura, Kazuya Haraguchi

    Abstract: While single-cell RNA-seq enables the investigation of the celltype effect on the transcriptome, the pure tissue environmental effect has not been well investigated. The bias in the combination of tissue and celltype in the body made it difficult to evaluate the effect of pure tissue environment by omics data mining. It is important to prevent statistical confounding among discrete variables such… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  37. arXiv:2406.05797  [pdf, other

    q-bio.BM cs.AI cs.CE cs.CL cs.LG

    3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, **hua Zhu, Rui Yan

    Abstract: The integration of molecule and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for the comprehensive modeling of molecule and language. However, existing works exhibit notable limitations. Most existing works overlook the modeling of 3D information, which is crucial for understanding molecular structures and… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 18 pages

  38. arXiv:2406.05070  [pdf, other

    cs.DB

    Targeted Mining Precise-positioning Episode Rules

    Authors: Jian Zhu, Xiaoye Chen, Wensheng Gan, Zefeng Chen, Philip S. Yu

    Abstract: The era characterized by an exponential increase in data has led to the widespread adoption of data intelligence as a crucial task. Within the field of data mining, frequent episode mining has emerged as an effective tool for extracting valuable and essential information from event sequences. Various algorithms have been developed to discover frequent episodes and subsequently derive episode rules… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: IEEE TETCI, 14 pages

  39. arXiv:2406.04881  [pdf, other

    cs.IT eess.SP

    MIMO Capacity Analysis and Channel Estimation for Electromagnetic Information Theory

    Authors: Jieao Zhu, Vincent Y. F. Tan, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with stochastic Shannon's information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch leads to a significant difficulty in application of E… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to the IEEE TWC. In this paper, we established the discrete-continuous correspondence for electromagnetic information theory (EIT), thus enabling analytical tools in the continuous space domain to be applied to discrete space MIMO architectures. Simulation codes will be provided at http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  40. arXiv:2406.04766  [pdf, other

    cs.LG math.OC stat.ML

    Reinforcement Learning and Regret Bounds for Admission Control

    Authors: Lucas Weber, Ana Bušić, Jiamin Zhu

    Abstract: The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  41. arXiv:2406.04640  [pdf, other

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, **g Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  42. arXiv:2406.04584  [pdf, other

    cs.LG cs.AI cs.CV

    CLoG: Benchmarking Continual Learning of Image Generation Models

    Authors: Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

    Abstract: Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advo… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  43. arXiv:2406.03718  [pdf, other

    cs.CR cs.AI cs.CL

    Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning

    Authors: Xiaohu Du, Ming Wen, Jiahao Zhu, Zifan Xie, Bin Ji, Huijun Liu, Xuanhua Shi, Hai **

    Abstract: Code Pre-trained Models (CodePTMs) based vulnerability detection have achieved promising results over recent years. However, these models struggle to generalize as they typically learn superficial map** from source code to labels instead of understanding the root causes of code vulnerabilities, resulting in poor performance in real-world scenarios beyond the training instances. To tackle this ch… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  44. arXiv:2406.02425  [pdf, other

    cs.CV cs.RO

    CoNav: A Benchmark for Human-Centered Collaborative Navigation

    Authors: Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, Yanxia Liu, **hui Zhu, Chuang Gan, Mingkui Tan

    Abstract: Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, where the agent should reason human intention by observing human activities and then navigate to the human's intended destination in advance of the human. However, t… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  45. arXiv:2406.02345  [pdf, other

    cs.CV cs.AI cs.LG cs.MM

    Progressive Confident Masking Attention Network for Audio-Visual Segmentation

    Authors: Yuxuan Wang, Feng Dong, **chao Zhu

    Abstract: Audio and visual signals typically occur simultaneously, and humans possess an innate ability to correlate and synchronize information from these two modalities. Recently, a challenging problem known as Audio-Visual Segmentation (AVS) has emerged, intending to produce segmentation maps for sounding objects within a scene. However, the methods proposed so far have not sufficiently integrated audio… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures, submitted to IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

  46. arXiv:2406.01860  [pdf, other

    cs.CL

    Eliciting the Priors of Large Language Models using Iterated In-Context Learning

    Authors: Jian-Qiao Zhu, Thomas L. Griffiths

    Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, understanding the knowledge they implicitly use when making decisions is critical. One way to capture this knowledge is in the form of Bayesian prior distributions. We develop a prompt-based workflow for eliciting prior distributions from LLMs. Our approach is based on iterated learning, a Markov chain Monte Carlo me… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  47. arXiv:2406.00750  [pdf, other

    cs.CV cs.LG

    Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models

    Authors: Wenqiang Sun, Zhengyi Wang, Shuo Chen, Yikai Wang, Zilong Chen, Jun Zhu, Jun Zhang

    Abstract: Creating 3D assets from single-view images is a complex task that demands a deep understanding of the world. Recently, feed-forward 3D generative models have made significant progress by training large reconstruction models on extensive 3D datasets, with triplanes being the preferred 3D geometry representation. However, effectively utilizing the geometric priors of triplanes, while minimizing arti… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: project can be found in: https://freeplane3d.github.io/

  48. arXiv:2406.00631  [pdf, other

    cs.CV

    MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

    Authors: Jiaying Zhou, Mingzhou Jiang, Junde Wu, Jiayuan Zhu, Ziyue Wang, Yueming **

    Abstract: Medicine is inherently a multimodal discipline. Medical images can reflect the pathological changes of cancer and tumors, while the expression of specific genes can influence their morphological characteristics. However, most deep learning models employed for these medical tasks are unimodal, making predictions using either image data or genomic data exclusively. In this paper, we propose a multim… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  49. arXiv:2406.00497  [pdf, ps, other

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, **gbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  50. arXiv:2406.00210  [pdf, other

    cs.CV

    A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies

    Authors: **chao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang

    Abstract: The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. Despite various attempts at sampler optimization, model distillation, and network quantification, these approaches typically maintain the original network architecture. The extensive parameter scale and substantial computational demands have limited research into adjusti… ▽ More

    Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    Comments: 19 pages, 16 figures, submitted to IEEE Transactions on Neural Networks and Learning Systems