Showing 1–23 of 23 results for author: Pi, R

Searching in archive cs.
  1. arXiv:2406.19976  [pdf, other]

    cs.LG math.OC

    ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

    Authors: Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

    Abstract: Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particu…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.07502  [pdf, other]

    cs.CV cs.CL

    Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    Authors: Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang

    Abstract: Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another is t…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2404.09216  [pdf, other]

    cs.CV

    DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

    Authors: Lewei Yao, Renjie Pi, Jianhua Han, Xiaodan Liang, Hang Xu, Wei Zhang, Zhenguo Li, Dan Xu

    Abstract: Existing open-vocabulary object detectors typically require a predefined set of categories from users, significantly confining their application scenarios. In this paper, we introduce DetCLIPv3, a high-performing detector that excels not only at both open-vocabulary object detection, but also generating hierarchical labels for detected objects. DetCLIPv3 is characterized by three core designs: 1.…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  4. arXiv:2403.17919  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

    Authors: Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang

    Abstract: The machine learning community has witnessed impressive advancements since large language models (LLMs) first appeared. Yet, their massive memory consumption has become a significant roadblock to large-scale training. For instance, a 7B model typically requires at least 60 GB of GPU memory with full parameter training, which presents challenges for researchers without access to high-resource envir…

    Submitted 25 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  5. arXiv:2403.08730  [pdf, other]

    cs.CL cs.CV

    Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

    Authors: Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang

    Abstract: Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we pro…

    Submitted 3 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  6. arXiv:2402.13494  [pdf, other]

    cs.CL cs.CR

    GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

    Authors: Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong

    Abstract: Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of sa…

    Submitted 29 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 Main

  7. arXiv:2402.05138  [pdf, other]

    cs.AI cs.CL

    SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

    Authors: Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang

    Abstract: The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-respon…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Work in progress

  8. arXiv:2402.03757  [pdf, other]

    cs.CV cs.CL cs.LG

    The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

    Authors: Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

    Abstract: Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typi…

    Submitted 6 February, 2024; originally announced February 2024.

  9. arXiv:2401.02906  [pdf, other]

    cs.CR cs.CL cs.CV

    MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

    Authors: Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang

    Abstract: The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not cons…

    Submitted 17 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  10. arXiv:2312.11370  [pdf, other]

    cs.CL

    G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

    Authors: Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

    Abstract: Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been largely focused on text-based mathematical problems, with limited investigation in problems involving geometric information. Addressing this gap, we aim to enable…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 10 pages

  11. arXiv:2311.08364  [pdf, other]

    cs.LG cs.AI cs.DM

    Plum: Prompt Learning using Metaheuristic

    Authors: Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang

    Abstract: Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunatel…

    Submitted 30 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Published at Findings of ACL 2024

  12. arXiv:2311.06612  [pdf, other]

    cs.CV cs.CL

    PerceptionGPT: Effectively Fusing Visual Perception into LLM

    Authors: Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang

    Abstract: The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate visual perception tasks remains a challenge. In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips th…

    Submitted 11 November, 2023; originally announced November 2023.

  13. arXiv:2309.06256  [pdf, other]

    cs.LG

    Mitigating the Alignment Tax of RLHF

    Authors: Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang

    Abstract: LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite var…

    Submitted 5 February, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 28 Pages

  14. arXiv:2305.14167  [pdf, other]

    cs.CV cs.AI

    DetGPT: Detect What You Need via Reasoning

    Authors: Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

    Abstract: In recent years, the field of computer vision has seen significant advancements thanks to the development of large language models (LLMs). These models have enabled more effective and sophisticated interactions between humans and machines, paving the way for novel techniques that blur the lines between human and machine intelligence. In this paper, we introduce a new paradigm for object detection…

    Submitted 23 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  15. arXiv:2305.13153   

    cs.LG math.OC

    Effective Bilevel Optimization via Minimax Reformulation

    Authors: Xiaoyu Wang, Rui Pan, Renjie Pi, Tong Zhang

    Abstract: Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation ne…

    Submitted 19 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Typos and intended inclusion of additional experiments

  16. arXiv:2301.09880  [pdf, other]

    cs.LG

    Probabilistic Bilevel Coreset Selection

    Authors: Xiao Zhou, Renjie Pi, Weizhong Zhang, Yong Lin, Tong Zhang

    Abstract: The goal of coreset selection in supervised learning is to produce a weighted subset of data, so that training only on the subset achieves similar performance as training on the entire dataset. Existing methods achieved promising results in resource-constrained scenarios such as continual learning and streaming. However, most of the existing algorithms are limited to traditional machine learning m…

    Submitted 24 January, 2023; originally announced January 2023.

  17. arXiv:2301.09819  [pdf, other]

    cs.LG

    Model Agnostic Sample Reweighting for Out-of-Distribution Learning

    Authors: Xiao Zhou, Yong Lin, Renjie Pi, Weizhong Zhang, Renzhe Xu, Peng Cui, Tong Zhang

    Abstract: Distributionally robust optimization (DRO) and invariant risk minimization (IRM) are two popular methods proposed to improve out-of-distribution (OOD) generalization performance of machine learning models. While effective for small models, it has been observed that these methods can be vulnerable to overfitting with large overparameterized models. This work proposes a principled method, \textbf{M}…

    Submitted 24 January, 2023; originally announced January 2023.

  18. arXiv:2211.10878  [pdf, other]

    cs.LG cs.AI

    DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics

    Authors: Renjie Pi, Weizhong Zhang, Yueqi Xie, Jiahui Gao, Xiaoyu Wang, Sunghun Kim, Qifeng Chen

    Abstract: The Federated Learning (FL) paradigm is known to face challenges under heterogeneous client data. Local training on non-iid distributed data results in deflected local optimum, which causes the client models drift further away from each other and degrades the aggregated global model's performance. A natural solution is to gather all client data onto the server, such that the server has a global vi…

    Submitted 20 November, 2022; originally announced November 2022.

  19. arXiv:2211.05554  [pdf, other]

    cs.LG cs.CV cs.DC

    Robust Federated Learning against both Data Heterogeneity and Poisoning Attack via Aggregation Optimization

    Authors: Yueqi Xie, Weizhong Zhang, Renjie Pi, Fangzhao Wu, Qifeng Chen, Xing Xie, Sunghun Kim

    Abstract: Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning (FL) systems. While both of them have attracted great research interest with specific strategies developed, no known solution manages to address them in a unified framework. To universally overcome both challenges, we propose SmartFL, a generic approach that optimizes the server-…

    Submitted 20 November, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

  20. arXiv:2205.12679  [pdf, other]

    cs.CL

    Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

    Authors: Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

    Abstract: There is a rising interest in further exploring the zero-shot learning potential of large pre-trained language models (PLMs). A new paradigm called data-generation-based zero-shot learning has achieved impressive success. In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than…

    Submitted 26 February, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: ICLR 2023 camera ready with 23 pages

  21. arXiv:2108.07482  [pdf, other]

    cs.CV

    G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation

    Authors: Lewei Yao, Renjie Pi, Hang Xu, Wei Zhang, Zhenguo Li, Tong Zhang

    Abstract: In this paper, we investigate the knowledge distillation (KD) strategy for object detection and propose an effective framework applicable to both homogeneous and heterogeneous student-teacher pairs. The conventional feature imitation paradigm introduces imitation masks to focus on informative foreground areas while excluding the background noises. However, we find that those methods fail to fully…

    Submitted 12 October, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV2021

  22. arXiv:2105.12971  [pdf, other]

    cs.CV cs.AI

    Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

    Authors: Lewei Yao, Renjie Pi, Hang Xu, Wei Zhang, Zhenguo Li, Tong Zhang

    Abstract: We propose Joint-DetNAS, a unified NAS framework for object detection, which integrates 3 key components: Neural Architecture Search, pruning, and Knowledge Distillation. Instead of naively pipelining these techniques, our Joint-DetNAS optimizes them jointly. The algorithm consists of two core processes: student morphism optimizes the student's architecture and removes the redundant parameters, wh…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: Accepted by CVPR 2021

  23. arXiv:1911.09336  [pdf, other]

    cs.LG stat.ML

    Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

    Authors: Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang

    Abstract: Neural Architecture Search (NAS) has shown great potentials in finding better neural network designs. Sample-based NAS is the most reliable approach which aims at exploring the search space and evaluating the most promising architectures. However, it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS using weight-sharing. Howe…

    Submitted 24 November, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: Accepted by NeurIPS 2020