
Showing 1–11 of 11 results for author: Ruhle, V

  1. arXiv:2405.10480  [pdf, other]

    cs.AR cs.LG

    Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

    Authors: Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

    Abstract: Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of state-of-the-art models has increased steadily, reaching billions of parameters. These huge models are memory-hungry and incur significant inference latency even on cutting-edge AI accelerators, such as GPUs. Specifica…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 13 pages, 10 figures

    ACM Class: I.2.7; C.1.4

  2. arXiv:2404.14618  [pdf, other]

    cs.LG cs.AI cs.CL

    Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

    Authors: Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah

    Abstract: Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower-cost (e.g., edge) devices tend to lag behind in terms of response quality. Therefore, in this work we propose a hybrid inference approach which combines their respective strengths to save cost and maintain quality. Our ap…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to ICLR 2024 (main conference)

  3. arXiv:2403.12968  [pdf, other]

    cs.CL cs.LG

    LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

    Authors: Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang

    Abstract: This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. Considering the redundancy in natural language, existing approaches compress prompts by removing tokens or lexical units according to their information entropy obtained from a causal language model such as LLaMa-7B. The challenge is that information entropy may be a suboptimal compression metric: (i)…

    Submitted 19 March, 2024; originally announced March 2024.

  4. arXiv:2401.07033  [pdf, other]

    cs.HC

    Risk-aware Adaptive Virtual CPU Oversubscription in Microsoft Cloud via Prototypical Human-in-the-loop Imitation Learning

    Authors: Lu Wang, Mayukh Das, Fangkai Yang, Junjie Sheng, Bo Qiao, Hang Dong, Si Qin, Victor Rühle, Chetan Bansal, Eli Cortez, Íñigo Goiri, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

    Abstract: Oversubscription is a prevalent practice in cloud services where the system offers more virtual resources, such as virtual cores in virtual machines, to users or applications than its available physical capacity, in order to reduce revenue loss due to unused/redundant capacity. While oversubscription can potentially lead to significantly more efficient resource utilization, the caveat is that it…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures

  5. arXiv:2311.17937  [pdf, other]

    cs.CV

    Unlocking Spatial Comprehension in Text-to-Image Diffusion Models

    Authors: Mohammad Mahdi Derakhshani, Menglin Xia, Harkirat Behl, Cees G. M. Snoek, Victor Rühle

    Abstract: We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models. Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene, such as "An image of a gray cat on the left of an orange dog", and the generation of corresponding images. This is especially important in order t…

    Submitted 28 November, 2023; originally announced November 2023.

  6. arXiv:2311.15792  [pdf, other]

    cs.LG cs.CR

    Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

    Authors: Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

    Abstract: Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover…

    Submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2308.04215  [pdf, other]

    cs.CL cs.AI cs.DC

    Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance

    Authors: Menglin Xia, Xuchao Zhang, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle

    Abstract: Retrieval augmentation enhances performance of traditional language models by incorporating additional context. However, the computational demands for retrieval augmented large language models (LLMs) pose a challenge when applying them to real-time tasks, such as composition assistance. To address this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework, a novel…

    Submitted 5 February, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

  8. arXiv:2206.05199  [pdf, other]

    cs.LG cs.CR

    Bayesian Estimation of Differential Privacy

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

    Abstract: Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the p…

    Submitted 15 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 17 pages, 8 figures. Joint main authors: Santiago Zanella-Béguelin, Lukas Wutschitz, and Shruti Tople

  9. arXiv:2103.07567  [pdf, other]

    cs.LG cs.CL cs.CR

    Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

    Authors: Fatemehsadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

    Abstract: Neural language models are known to have a high capacity for memorization of training samples. This may have serious privacy implications when training models on user content such as email correspondence. Differential privacy (DP), a popular choice to train models with privacy guarantees, comes with significant costs in terms of utility degradation and disparate impact on subgroups of users. In th…

    Submitted 15 April, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

    Comments: NAACL-HLT 2021 Paper

  10. arXiv:2101.05405  [pdf, other]

    cs.CR cs.CL cs.LG

    Training Data Leakage Analysis in Language Models

    Authors: Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim

    Abstract: Recent advances in neural-network-based language models have led to successful deployments of such models, improving user experience in various applications. It has been demonstrated that strong performance of language models comes along with the ability to memorize rare training samples, which poses serious privacy threats if the model is trained on confidential user content. In this work, we in…

    Submitted 22 February, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

  11. arXiv:1912.07942  [pdf, other]

    cs.LG cs.CL cs.CR stat.ML

    Analyzing Information Leakage of Updates to Natural Language Models

    Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Victor Rühle, Andrew Paverd, Olga Ohrimenko, Boris Köpf, Marc Brockschmidt

    Abstract: To continuously improve quality and reflect changes in data, machine learning applications have to regularly retrain and update their core models. We show that a differential analysis of language model snapshots before and after an update can reveal a surprising amount of detailed information about changes in the training data. We propose two new metrics: differential score and diffe…

    Submitted 5 August, 2021; v1 submitted 17 December, 2019; originally announced December 2019.
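
The LLMLingua-2 entry (arXiv:2403.12968) describes the baseline it argues against: prompts are compressed by dropping tokens whose information content, as scored by a causal language model, is low. The sketch below illustrates only that entropy-style baseline, not LLMLingua-2's own data-distillation method. It assumes per-token log-probabilities have already been obtained from some causal LM; the function name compress_by_self_information and the log-probability values in the usage example are hypothetical.

```python
from typing import List, Tuple


def compress_by_self_information(
    tokens: List[Tuple[str, float]], keep_ratio: float
) -> List[str]:
    """Keep the tokens with the highest self-information, -log p(token | prefix).

    `tokens` is a list of (token, log_prob) pairs; log_prob is assumed to come
    from a causal language model scoring each token given its prefix.
    Predictable tokens (high log_prob) carry little information and are dropped
    first. The surviving tokens keep their original order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Lowest log-probability means highest self-information. sorted() is
    # stable, so on ties the earlier token wins.
    by_information = sorted(range(len(tokens)), key=lambda i: tokens[i][1])
    kept = sorted(by_information[:n_keep])
    return [tokens[i][0] for i in kept]


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    prompt = [("The", -0.5), ("cat", -4.0), ("sat", -3.0),
              ("on", -0.2), ("the", -0.3), ("mat", -5.0)]
    print(compress_by_self_information(prompt, keep_ratio=0.5))
    # → ['cat', 'sat', 'mat']
```

Function words that the model predicts easily ("The", "on", "the") are the first to go. This is exactly the behaviour whose suitability as a compression metric the LLMLingua-2 abstract questions.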
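
The Bayesian Estimation of Differential Privacy entry (arXiv:2206.05199) refers to work that empirically estimates the protection afforded by differentially private training. A standard fact underlies such estimates: for an (ε, δ)-DP mechanism, every membership-inference test satisfies FPR + e^ε · FNR ≥ 1 − δ, and the same holds with FPR and FNR swapped. Observed attack error rates therefore imply a lower bound on ε. The sketch below computes only that frequentist point estimate from given error rates. It is not the Bayesian estimator proposed in the paper, and it ignores the sampling uncertainty in FPR and FNR that a confidence interval would capture. The function names are hypothetical.

```python
import math


def _log_ratio_bound(numerator: float, denominator: float) -> float:
    """ln(numerator / denominator), with the degenerate cases handled."""
    if numerator <= 0.0:
        return 0.0  # the constraint is vacuous
    if denominator == 0.0:
        return math.inf  # an error-free side rules out any finite epsilon
    return math.log(numerator / denominator)


def epsilon_lower_bound(fpr: float, fnr: float, delta: float = 0.0) -> float:
    """Smallest epsilon consistent with an attack's false positive/negative rates.

    For an (epsilon, delta)-DP mechanism every test satisfies
        FPR + exp(epsilon) * FNR >= 1 - delta
        FNR + exp(epsilon) * FPR >= 1 - delta,
    hence epsilon >= ln((1 - delta - FPR) / FNR)
    and   epsilon >= ln((1 - delta - FNR) / FPR).
    """
    for value in (fpr, fnr, delta):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fpr, fnr and delta must lie in [0, 1]")
    return max(
        0.0,
        _log_ratio_bound(1.0 - delta - fpr, fnr),
        _log_ratio_bound(1.0 - delta - fnr, fpr),
    )


if __name__ == "__main__":
    print(round(epsilon_lower_bound(0.1, 0.1), 3))  # → 2.197 (ln 9)
    print(epsilon_lower_bound(0.5, 0.5))            # → 0.0 (chance-level attack)
```

A stronger attack (lower error rates) pushes the bound up. An attack at chance level certifies nothing beyond ε ≥ 0.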
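
The entry Analyzing Information Leakage of Updates to Natural Language Models (arXiv:1912.07942) describes a differential analysis of language model snapshots taken before and after an update, and it names a "differential score" metric. The following is a minimal sketch under an assumption: the score of a token sequence is taken to be the sum, over its tokens, of the increase in each token's conditional probability from the old snapshot to the updated one. The per-token probabilities are supplied directly rather than computed from real models, and the function names and numbers are hypothetical. Under this assumption, sequences whose score rose most are the most plausible candidates for data the update introduced.

```python
from typing import Dict, List, Sequence, Tuple


def differential_score(p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """Sum of per-token probability changes from the old snapshot to the new one.

    p_old[i] and p_new[i] are assumed to be each model's probability of token i
    given the preceding tokens of the same sequence.
    """
    if len(p_old) != len(p_new):
        raise ValueError("both snapshots must score the same token sequence")
    return sum(new - old for old, new in zip(p_old, p_new))


def rank_by_differential_score(
    candidates: Dict[str, Tuple[List[float], List[float]]]
) -> List[str]:
    """Order candidate sequences from largest to smallest differential score."""
    return sorted(
        candidates,
        key=lambda name: differential_score(*candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical (old, new) per-token probabilities for two candidate sequences.
    candidates = {
        "seq A": ([0.1, 0.2, 0.3], [0.4, 0.5, 0.3]),  # score ~ +0.6
        "seq B": ([0.5, 0.5], [0.5, 0.4]),            # score ~ -0.1
    }
    print(rank_by_differential_score(candidates))  # → ['seq A', 'seq B']
```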