Showing 1–50 of 86 results for author: Pan, R

Searching in archive cs.
  1. arXiv:2406.19976  [pdf, other]

    cs.LG math.OC

    ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

    Authors: Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

    Abstract: Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up. Only recently, a paradigm of first-order algorithms emerged, capable of effectively addressing bilevel optimization problems. Nevertheless, the practical efficiency of this paradigm remains unverified, particu…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.15244  [pdf, other]

    cs.LG math.OC

    Large Batch Analysis for Adagrad Under Anisotropic Smoothness

    Authors: Yuxing Liu, Rui Pan, Tong Zhang

    Abstract: Adaptive gradient algorithms have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite their huge success in practice, their theoretical advantages over stochastic gradient descent (SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice. This is because the only theoretical result that can d…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.07502  [pdf, other]

    cs.CV cs.CL

    Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    Authors: Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang

    Abstract: Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another is t…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2404.17582  [pdf, other]

    cs.HC cs.LG stat.AP

    Data Quality in Crowdsourcing and Spamming Behavior Detection

    Authors: Yang Ba, Michelle V. Mancenido, Erin K. Chiou, Rong Pan

    Abstract: As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credib…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Preprint paper, under review on Behavior Research Methods. 45 pages, 10 figures

  6. arXiv:2404.12678  [pdf, other]

    cs.CV

    Exploring Interactive Semantic Alignment for Efficient HOI Detection with Vision-language Model

    Authors: Jihao Dong, Renjie Pan, Hua Yang

    Abstract: Human-Object Interaction (HOI) detection aims to localize human-object pairs and comprehend their interactions. Recently, two-stage transformer-based methods have demonstrated competitive performance. However, these methods frequently focus on object appearance features and ignore global contextual information. Besides, vision-language model CLIP which effectively aligns visual and text embeddings…

    Submitted 24 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  7. arXiv:2404.10179  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio…

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  8. arXiv:2404.06809  [pdf, other]

    cs.CL

    Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

    Authors: Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun

    Abstract: The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and corre…

    Submitted 8 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Our code, benchmark, and models are available at https://github.com/panruotong/CAG

  9. arXiv:2404.01630  [pdf, other]

    cs.NI

    SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies

    Authors: Tommaso Bonato, Abdul Kabbani, Daniele De Sensi, Rong Pan, Yanfang Le, Costin Raiciu, Mark Handley, Timo Schneider, Nils Blach, Ahmad Ghalayini, Daniel Alves, Michael Papamichael, Adrian Caulfield, Torsten Hoefler

    Abstract: With the rapid growth of machine learning (ML) workloads in datacenters, existing congestion control (CC) algorithms fail to deliver the required performance at scale. ML traffic is bursty and bulk-synchronous and thus requires quick reaction and strong fairness. We show that existing CC algorithms that use delay as a main signal react too slowly and are not always fair. We design SMaRTT, a simple…

    Submitted 27 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Fixed typo and wrong y axis of one plot

  10. arXiv:2403.17919  [pdf, other]

    cs.LG cs.AI cs.CL math.OC

    LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

    Authors: Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang

    Abstract: The machine learning community has witnessed impressive advancements since large language models (LLMs) first appeared. Yet, their massive memory consumption has become a significant roadblock to large-scale training. For instance, a 7B model typically requires at least 60 GB of GPU memory with full parameter training, which presents challenges for researchers without access to high-resource envir…

    Submitted 25 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  11. arXiv:2403.11163  [pdf, ps, other]

    stat.ME cs.LG math.ST stat.CO

    A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

    Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

    Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first clas…

    Submitted 17 March, 2024; originally announced March 2024.

  12. arXiv:2403.08730  [pdf, other]

    cs.CL cs.CV

    Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

    Authors: Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang

    Abstract: Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we pro…

    Submitted 3 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  13. arXiv:2403.00783  [pdf, other]

    cs.AI

    On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs

    Authors: Hankz Hankui Zhuo, Xin Chen, Rong Pan

    Abstract: Plan synthesis aims to generate a course of actions or policies to transit given initial states to goal states, provided domain models that could be designed by experts or learnt from training data or interactions with the world. Intrigued by the claims of emergent planning capabilities in large language models (LLMs), works have been proposed to investigate the planning effectiveness of LLMs, wit…

    Submitted 18 February, 2024; originally announced March 2024.

  14. arXiv:2402.03757  [pdf, other]

    cs.CV cs.CL cs.LG

    The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

    Authors: Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

    Abstract: Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typi…

    Submitted 6 February, 2024; originally announced February 2024.

  15. arXiv:2401.11839  [pdf, other]

    cs.CL cs.CY

    AI for social science and social science of AI: A Survey

    Authors: Ruoxi Xu, Yingfei Sun, Mengjie Ren, Shiguang Guo, Ruotong Pan, Hongyu Lin, Le Sun, Xianpei Han

    Abstract: Recent advancements in artificial intelligence, particularly with the emergence of large language models (LLMs), have sparked a rethinking of artificial general intelligence possibilities. The increasing human-like capabilities of AI are also attracting attention in social science research, leading to various studies exploring the combination of these two fields. In this survey, we systematically…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Processing and Management (IP&M)

  16. arXiv:2401.02906  [pdf, other]

    cs.CR cs.CL cs.CV

    MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

    Authors: Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang

    Abstract: The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates the novel challenge of defending MLLMs against such attacks. Compared to large language models (LLMs), MLLMs include an additional image modality. We discover that images act as a "foreign language" that is not cons…

    Submitted 17 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  17. arXiv:2401.01916  [pdf, other]

    astro-ph.IM astro-ph.CO astro-ph.GA astro-ph.SR cs.CL cs.LG

    AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets

    Authors: Ernest Perkowski, Rui Pan, Tuan Dung Nguyen, Yuan-Sen Ting, Sandor Kruk, Tong Zhang, Charlie O'Neill, Maja Jablonska, Zechang Sun, Michael J. Smith, Huiling Liu, Kevin Schawinski, Kartheik Iyer, Ioana Ciucă for UniverseTBD

    Abstract: We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like…

    Submitted 5 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: 4 pages, 1 figure, model is available at https://huggingface.co/universeTBD, published in RNAAS

  18. arXiv:2312.14567  [pdf, other]

    cs.LG math.OC

    Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise

    Authors: Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang

    Abstract: Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical property is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that heavy-ball momentum method can provide…

    Submitted 17 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Published at ICLR 2024

  19. arXiv:2312.05385  [pdf, other]

    cs.DC cs.LG

    Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving

    Authors: Yinwei Dai, Rui Pan, Anand Iyer, Kai Li, Ravi Netravali

    Abstract: Machine learning (ML) inference platforms are tasked with balancing two competing goals: ensuring high throughput given many requests, and delivering low-latency responses to support interactive applications. Unfortunately, existing platform knobs (e.g., batch sizes) fail to ease this fundamental tension, and instead only enable users to harshly trade off one property for the other. This paper exp…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: The first two authors contributed equally and are alphabetically ordered

  20. arXiv:2312.03176  [pdf, other]

    cs.LG

    Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes

    Authors: Hao Zhao, Rong Pan

    Abstract: Change-point detection (CPD) is crucial for identifying abrupt shifts in data, which influence decision-making and efficient resource allocation across various domains. To address the challenges posed by the costly and time-intensive data acquisition in CPD, we introduce the Derivative-Aware Change Detection (DACD) method. It leverages the derivative process of a Gaussian process (GP) for Active L…

    Submitted 5 December, 2023; originally announced December 2023.

  21. arXiv:2311.08364  [pdf, other]

    cs.LG cs.AI cs.DM

    Plum: Prompt Learning using Metaheuristic

    Authors: Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang

    Abstract: Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunatel…

    Submitted 30 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Published at Findings of ACL 2024

  22. arXiv:2311.00047  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

    Authors: Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai

    Abstract: Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, known as visual illusions, human's perception of reality isn't always faithful to the physical world. This raises a key question: do VLMs have the similar kind of illusions as humans do, or do they faithfully learn to represent reality? To investigate this questio…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023 main conference

  23. arXiv:2309.06256  [pdf, other]

    cs.LG

    Mitigating the Alignment Tax of RLHF

    Authors: Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang

    Abstract: LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite var…

    Submitted 5 February, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 28 Pages

  24. Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code

    Authors: Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, Reyhaneh Jabbarvand

    Abstract: Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of large language models (LLMs) in code synthesis, researchers are exploring their potential to automate code translation. The prerequisite for advancing the state of LLM-based code translation is to understand their promises and limitations over existing techniques. To that en…

    Submitted 16 January, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Published in ICSE 2024

  25. arXiv:2306.12420  [pdf, other]

    cs.CL cs.AI

    LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

    Authors: Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang

    Abstract: Foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, an increasing number of foundation models are becoming publicly accessible. However, a significant shortcoming of most of these models lies in their performance in specialized-domain and task-specific a…

    Submitted 5 May, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Published in NAACL 2024 Demo Track

  26. Extending adjacency matrices to 3D with triangles

    Authors: Rusheng Pan, Helen C. Purchase, Tim Dwyer, Wei Chen

    Abstract: Social networks are the fabric of society and the subject of frequent visual analysis. Closed triads represent triangular relationships between three people in a social network and are significant for understanding inherent interconnections and influence within the network. The most common methods for representing social networks (node-link diagrams and adjacency matrices) are not optimal for unde…

    Submitted 19 February, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: to be published in PacificVis 2024

    Journal ref: PacificVis 2024

  27. arXiv:2305.14167  [pdf, other]

    cs.CV cs.AI

    DetGPT: Detect What You Need via Reasoning

    Authors: Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

    Abstract: In recent years, the field of computer vision has seen significant advancements thanks to the development of large language models (LLMs). These models have enabled more effective and sophisticated interactions between humans and machines, paving the way for novel techniques that blur the lines between human and machine intelligence. In this paper, we introduce a new paradigm for object detection…

    Submitted 23 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  28. arXiv:2305.13153   

    cs.LG math.OC

    Effective Bilevel Optimization via Minimax Reformulation

    Authors: Xiaoyu Wang, Rui Pan, Renjie Pi, Tong Zhang

    Abstract: Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation ne…

    Submitted 19 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Typos and intended inclusion of additional experiments

  29. arXiv:2305.00538  [pdf, other]

    cs.NI

    SFC: Near-Source Congestion Signaling and Flow Control

    Authors: Yanfang Le, Jeongkeun Lee, Jeremias Blendin, Jiayi Chen, Georgios Nikolaidis, Rong Pan, Robert Soule, Aditya Akella, Pedro Yebenes Segura, Arjun Singhvi, Yuliang Li, Qingkai Meng, Changhoon Kim, Serhat Arslan

    Abstract: State-of-the-art congestion control algorithms for data centers alone do not cope well with transient congestion and high traffic bursts. To help with these, we revisit the concept of direct backward feedback from switches and propose Back-to-Sender (BTS) signaling to many concurrent incast senders. Combining it with our novel approach to in-network caching, we achieve near-source sub-RTT c…

    Submitted 30 April, 2023; originally announced May 2023.

  30. arXiv:2304.11524  [pdf, other]

    cs.AI

    Personalized Federated Learning via Gradient Modulation for Heterogeneous Text Summarization

    Authors: Rongfeng Pan, Jianzong Wang, Lingwei Kong, Zhangcheng Huang, Jing Xiao

    Abstract: Text summarization is essential for information aggregation and demands large amounts of training data. However, concerns about data privacy and security limit data collection and model training. To eliminate this concern, we propose a federated learning text summarization scheme, which allows users to share the global model in a cooperative learning manner without sharing raw data. Personalized f…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: Accepted by IJCNN2023. 2023 IEEE International Joint Conference on Neural Network (IJCNN2023)

  31. arXiv:2304.06767  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

    Authors: Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

    Abstract: Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-worl…

    Submitted 1 December, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 29 pages, 12 figures, Published in Transactions on Machine Learning Research (TMLR)

  32. arXiv:2304.01397  [pdf, other]

    cs.SE

    LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models

    Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand

    Abstract: Test suites tend to grow when software evolves, making it often infeasible to execute all test cases with the allocated testing budgets, especially for large software systems. Test suite minimization (TSM) is employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources, while maintaining the fault detection capability of the test…

    Submitted 2 May, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

  33. arXiv:2302.09501  [pdf, other]

    cs.DC

    Meta Computing

    Authors: Xiuzhen Cheng, Minghui Xu, Runyu Pan, Dongxiao Yu, Chenxu Wang, Xue Xiao, Weifeng Lyu

    Abstract: With the continuous improvement of information infrastructures, academia and industry have been constantly exploring new computing paradigms to fully exploit computing powers. In this paper, we propose Meta Computing, a new computing paradigm that aims to utilize all available computing resources hooked on the Internet, provide efficient, fault-tolerant, and personalized services with strong secur…

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: 10 pages, 4 figures

  34. Towards Efficient Visual Simplification of Computational Graphs in Deep Neural Networks

    Authors: Rusheng Pan, Zhiyong Wang, Yating Wei, Han Gao, Gongchang Ou, Caleb Chen Cao, Jingli Xu, Tong Xu, Wei Chen

    Abstract: A computational graph in a deep neural network (DNN) denotes a specific data flow diagram (DFD) composed of many tensors and operators. Existing toolkits for visualizing computational graphs are not applicable when the structure is highly complicated and large-scale (e.g., BERT [1]). To address this problem, we propose leveraging a suite of visual simplification techniques, including a cycle-remov…

    Submitted 21 December, 2022; originally announced December 2022.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 1 (2022) 1-14

  35. arXiv:2212.05970  [pdf, other]

    cs.SE cs.CL cs.LG

    Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement

    Authors: Sayem Mohammad Imtiaz, Fraol Batole, Astha Singh, Rangeet Pan, Breno Dantas Cruz, Hridesh Rajan

    Abstract: Can we take a recurrent neural network (RNN) trained to translate between languages and augment it to support a new natural language without retraining the model from scratch? Can we fix the faulty behavior of the RNN by replacing portions associated with the faulty behavior? Recent works on decomposing a fully connected neural network (FCNN) and convolutional neural network (CNN) into modules hav…

    Submitted 9 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted at 45th international conference on software engineering (ICSE'2023)

  36. arXiv:2212.00105  [pdf, other]

    cs.SE

    An Empirical Study on the Bugs Found while Reusing Pre-trained Natural Language Processing Models

    Authors: Rangeet Pan, Sumon Biswas, Mohna Chakraborty, Breno Dantas Cruz, Hridesh Rajan

    Abstract: In NLP, reusing pre-trained models instead of training from scratch has gained popularity; however, NLP models are mostly black boxes, very large, and often require significant resources. To ease, models trained with large corpora are made available, and developers reuse them for different problems. In contrast, developers mostly build their models from scratch for traditional DL-related problems.…

    Submitted 30 November, 2022; originally announced December 2022.

    Comments: 12 pages, 12 figures

    MSC Class: 68T50 ACM Class: D.6; D.2.5; D.2.13

  37. arXiv:2211.17201  [pdf, other]

    cs.CL cs.LG math.OC

    ExtremeBERT: A Toolkit for Accelerating Pretraining of Customized BERT

    Authors: Rui Pan, Shizhe Diao, Jianlin Chen, Tong Zhang

    Abstract: In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining. Our goal is to provide an easy-to-use BERT pretraining toolkit for the research community and industry. Thus, the pretraining of popular language models on customized datasets is affordable with limited resources. Experiments show that, to achieve the same or better GLUE scores, the time cost of our…

    Submitted 30 November, 2022; originally announced November 2022.

  38. arXiv:2210.16269  [pdf, other]

    cs.SE

    ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolutionary Search

    Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand

    Abstract: Executing large test suites is time and resource consuming, sometimes impossible, and such test suites typically contain many redundant test cases. Hence, test case minimization is used to remove redundant test cases that are unlikely to detect new faults. However, most test case (suite) minimization techniques rely on code coverage (white-box), model-based features, or requirements specifications…

    Submitted 20 December, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted at the 45th IEEE/ACM International Conference on Software Engineering

  39. arXiv:2210.04623  [pdf, other]

    cs.DC

    DeltaFS: Pursuing Zero Update Overhead via Metadata-Enabled Delta Compression for Log-structured File System on Mobile Devices

    Authors: Chao Wu, Cheng Ji, Geng Yuan, Riwei Pan, Weichao Guo, Chao Yu, Zongwei Zhu, Yanzhi Wang

    Abstract: Data compression has been widely adopted to release mobile devices from intensive write pressure. Delta compression is particularly promising for its high compression efficacy over conventional compression methods. However, this method suffers from non-trivial system overheads incurred by delta maintenance and read penalty, which prevents its applicability on mobile devices. To this end, this pape…

    Submitted 6 October, 2022; originally announced October 2022.

  40. arXiv:2210.00093  [pdf, other]

    cs.DC

    Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning

    Authors: Pengfei Zheng, Rui Pan, Tarannum Khan, Shivaram Venkataraman, Aditya Akella

    Abstract: Dynamic adaptation has become an essential technique in accelerating distributed machine learning (ML) training. Recent studies have shown that dynamically adjusting model structure (e.g., lottery ticket hypothesis) or hyperparameters (e.g., batch size) can significantly accelerate training without sacrificing accuracy. However, existing ML cluster schedulers are not designed to handle dynamic ada…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted at the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23)

  41. arXiv:2201.09047  [pdf, other]

    cs.GT cs.AI

    Online Auction-Based Incentive Mechanism Design for Horizontal Federated Learning with Budget Constraint

    Authors: Jingwen Zhang, Yuezhou Wu, Rong Pan

    Abstract: Federated learning makes it possible for all parties with data isolation to train the model collaboratively and efficiently while satisfying privacy protection. To obtain a high-quality model, an incentive mechanism is necessary to motivate more high-quality workers with data and computing power. The existing incentive mechanisms are applied in offline scenarios, where the task publisher collects…

    Submitted 17 May, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

  42. arXiv:2201.02410  [pdf, other]

    cs.AI cs.LG cs.MA

    Auction-Based Ex-Post-Payment Incentive Mechanism Design for Horizontal Federated Learning with Reputation and Contribution Measurement

    Authors: Jingwen Zhang, Yuezhou Wu, Rong Pan

    Abstract: Federated learning trains models across devices with distributed data, while protecting the privacy and obtaining a model similar to that of centralized ML. A large number of workers with data and computing power are the foundation of federated learning. However, the inevitable costs prevent self-interested workers from serving for free. Moreover, due to data isolation, task publishers lack effectiv…

    Submitted 15 March, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  43. Manas: Mining Software Repositories to Assist AutoML

    Authors: Giang Nguyen, Md Johir Islam, Rangeet Pan, Hridesh Rajan

    Abstract: Today deep learning is widely used for building software. A software engineering problem with deep learning is that finding an appropriate convolutional neural network (CNN) model for the task can be a challenge for developers. Recent work on AutoML, more precisely neural architecture search (NAS), embodied by tools like Auto-Keras aims to solve this problem by essentially viewing it as a search p…

    Submitted 13 February, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

  44. arXiv:2111.12965  [pdf, other]

    cs.CR cs.CV

    Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

    Authors: Xiangyu Qi, Tinghao Xie, Ruizhe Pan, Jifeng Zhu, Yong Yang, Kai Bu

    Abstract: One major goal of the AI security community is to securely and reliably produce and deploy deep learning models for real-world applications. To this end, data poisoning based backdoor attacks on deep neural networks (DNNs) in the production stage (or training stage) and corresponding defenses are extensively explored in recent years. Ironically, backdoor attacks in the deployment stage, which can…

    Submitted 26 May, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  45. arXiv:2110.14109  [pdf, other]

    cs.LG math.OC

    Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums

    Authors: Rui Pan, Haishan Ye, Tong Zhang

    Abstract: Learning rate schedulers have been widely adopted in training deep neural networks. Despite their practical importance, there is a discrepancy between its practice and its theoretical analysis. For instance, it is not known what schedules of SGD achieve best convergence, even for simple problems such as optimizing quadratic objectives. In this paper, we propose Eigencurve, the first family of lear…

    Submitted 14 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Published at ICLR 2022

  46. arXiv:2110.07720  [pdf, other]

    cs.CV

    Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules

    Authors: Rangeet Pan, Hridesh Rajan

    Abstract: Training from scratch is the most common way to build a Convolutional Neural Network (CNN) based model. What if we can build new CNN models by reusing parts from previously built CNN models? What if we can improve a CNN model by replacing (possibly faulty) parts with other parts? In both cases, instead of training, can we identify the part responsible for each output class (module) in the model(s)…

    Submitted 20 December, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted at ICSE'22

  47. arXiv:2107.08784  [pdf, other]

    cs.LG stat.ML

    Boost-R: Gradient Boosted Trees for Recurrence Data

    Authors: Xiao Liu, Rong Pan

    Abstract: Recurrence data arise from multi-disciplinary domains spanning reliability, cyber security, healthcare, online retailing, etc. This paper investigates an additive-tree-based approach, known as Boost-R (Boosting for Recurrence Data), for recurrent event data with both static and dynamic features. Boost-R constructs an ensemble of gradient boosted additive trees to estimate the cumulative intensity…

    Submitted 2 July, 2021; originally announced July 2021.

  48. Test Case Selection and Prioritization Using Machine Learning: A Systematic Literature Review

    Authors: Rongqi Pan, Mojtaba Bagherzadeh, Taher A. Ghaleb, Lionel Briand

    Abstract: Regression testing is an essential activity to assure that software code changes do not adversely affect existing functionalities. With the wide adoption of Continuous Integration (CI) in software projects, which increases the frequency of running software builds, running all tests can be time-consuming and resource-intensive. To alleviate that problem, Test case Selection and Prioritization (TSP)…

    Submitted 5 October, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Journal ref: Empirical Software Engineering (EMSE). (2021) 1-34

  49. An Adversarial Transfer Network for Knowledge Representation Learning

    Authors: Huijuan Wang, Shuangyin Li, Rong Pan

    Abstract: Knowledge representation learning has received a lot of attention in the past few years. The success of existing methods heavily relies on the quality of knowledge graphs. The entities with few triplets tend to be learned with less expressive power. Fortunately, there are many knowledge graphs constructed from various sources, the representations of which could contain much information. We propose…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: Accepted by TheWebConf 2021

  50. arXiv:2103.02004  [pdf, other]

    cs.SE

    Can Program Synthesis be Used to Learn Merge Conflict Resolutions? An Empirical Analysis

    Authors: Rangeet Pan, Vu Le, Nachiappan Nagappan, Sumit Gulwani, Shuvendu Lahiri, Mike Kaufman

    Abstract: Forking structure is widespread in the open-source repositories and that causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divided into two sections. First, we empirically evalua…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: Accepted at ICSE 2021