Showing 1–17 of 17 results for author: Musuvathi, M

  1. arXiv:2406.04693  [pdf, other]

    cs.SE cs.AI cs.LG cs.PF

    LLM-Vectorizer: LLM-based Verified Loop Vectorizer

    Authors: Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, Shuvendu K. Lahiri

    Abstract: Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that de…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2310.09342  [pdf, other]

    cs.PL cs.AI cs.CL cs.SE

    Ranking LLM-Generated Loop Invariants for Program Verification

    Authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy

    Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an…

    Submitted 12 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-findings 2023)

  3. arXiv:2305.13450  [pdf, other]

    cs.DC

    A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

    Authors: Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi

    Abstract: Machine Learning (ML) models execute several parallel computations including Generalized Matrix Multiplication, Convolution, Dropout, etc. These computations are commonly executed on Graphics Processing Units (GPUs), by dividing the computation into independent processing blocks, known as tiles. Since the number of tiles are usually higher than the execution units of a GPU, tiles are executed on a…

    Submitted 14 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at CGO 2024

  4. arXiv:2304.03816  [pdf, other]

    cs.SE cs.LG

    Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions

    Authors: Sarah Fakhoury, Saikat Chakraborty, Madan Musuvathi, Shuvendu K. Lahiri

    Abstract: Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hidden test cases. This has enabled the research comm…

    Submitted 7 April, 2023; originally announced April 2023.

  5. arXiv:2208.05950  [pdf, other]

    cs.SE cs.LG cs.PL

    Interactive Code Generation via Test-Driven User-Intent Formalization

    Authors: Shuvendu K. Lahiri, Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Madanlal Musuvathi, Piali Choudhury, Curtis von Veh, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao

    Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, when interacting with LLMs, users have no guarantees that the code suggestions produced correctly satisfy the intent they provided. In fact, it is hard to define a notion of correctness since natural language can be ambig…

    Submitted 3 October, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: 18 pages

  6. arXiv:2206.03865  [pdf, other]

    cs.PL cs.AI cs.SE

    Fault-Aware Neural Code Rankers

    Authors: Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu K Lahiri, Madanlal Musuvathi, Jianfeng Gao

    Abstract: Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to do large scale sampling of programs using a model and then filtering/ranking the programs based on the program execution on a small number of known unit t…

    Submitted 9 December, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

    Comments: In the proceedings of Advances in Neural Information Processing Systems, 2022

  7. arXiv:2201.11840  [pdf, other]

    cs.DC

    GC3: An Optimizing Compiler for GPU Collective Communication

    Authors: Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong

    Abstract: Machine learning models made up of millions or billions of parameters are trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application specific communication patterns can alleviate this bottlen…

    Submitted 19 July, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  8. arXiv:2111.04867  [pdf, other]

    cs.DC cs.LG

    TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

    Authors: Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

    Abstract: Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm d…

    Submitted 5 October, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Accepted at NSDI'23. Contains 20 pages, 11 figures, including Appendix

  9. arXiv:2105.05720  [pdf, other]

    cs.DC cs.LG cs.PL

    Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

    Authors: Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi

    Abstract: Recent trend towards increasing large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, current logical separation between computation and communication kernels in deep learning frameworks misses the op…

    Submitted 26 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

  10. arXiv:2011.10472  [pdf, ps, other]

    cs.CV

    GenderRobustness: Robustness of Gender Detection in Facial Recognition Systems with variation in Image Properties

    Authors: Sharadha Srinivasan, Madan Musuvathi

    Abstract: In recent times, there have been increasing accusations on artificial intelligence systems and algorithms of computer vision of possessing implicit biases. Even though these conversations are more prevalent now and systems are improving by performing extensive testing and broadening their horizon, biases still do exist. One such class of systems where bias is said to exist is facial recognition sy…

    Submitted 26 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

  11. Synthesizing Optimal Collective Algorithms

    Authors: Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi

    Abstract: Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep-learning, collective communication is the Amdahl's bottleneck of data-parallel training. This paper introduces SCCL (for Synthesized Collective Communication Library), a systematic approach to synthesize collective communication algorithms that are explicitly tailored to a parti…

    Submitted 4 January, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

    Comments: Both Zixian Cai and Zhengyang Liu contributed equally to the paper. The work was done during internships at Microsoft Research. To appear at PPoPP 2021

  12. arXiv:2006.02924  [pdf, other]

    cs.DC cs.LG

    Scaling Distributed Training with Adaptive Summation

    Authors: Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

    Abstract: Stochastic gradient descent (SGD) is an inherently sequential training algorithm--computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor them (e.g., sum the gradients for each batch, which is not what sequential SGD would do) and thus potentially suffer from poor convergence. This paper introduces a nov…

    Submitted 4 June, 2020; originally announced June 2020.

  13. arXiv:1912.11951  [pdf, other]

    cs.CR cs.LG cs.PL

    EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation

    Authors: Roshan Dathathri, Blagovesta Kostova, Olli Saarikivi, Wei Dai, Kim Laine, Madanlal Musuvathi

    Abstract: Fully-Homomorphic Encryption (FHE) offers powerful capabilities by enabling secure offloading of both storage and computation, and recent innovations in schemes and implementations have made it all the more attractive. At the same time, FHE is notoriously hard to use with a very constrained programming model, a very unusual performance profile, and many cryptographic constraints. Existing compiler…

    Submitted 26 June, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

    ACM Class: D.3.3; D.3.4

    Journal ref: Programming Language Design and Implementation (PLDI 2020) 546-561

  14. arXiv:1909.03359  [pdf, other]

    cs.LG cs.CL cs.DC stat.ML

    Distributed Training of Embeddings using Graph Analytics

    Authors: Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi

    Abstract: Many applications today, such as NLP, network analysis, and code analysis, rely on semantically embedding objects into low-dimensional fixed-length vectors. Such embeddings naturally provide a way to perform useful downstream tasks, such as identifying relations among objects or predicting objects for a given context, etc. Unfortunately, the training necessary for accurate embeddings is usually co…

    Submitted 23 February, 2020; v1 submitted 7 September, 2019; originally announced September 2019.

  15. arXiv:1810.00845  [pdf, other]

    cs.LG cs.CR cs.PL stat.ML

    CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

    Authors: Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

    Abstract: Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations to be applied directly on encrypted data without requiring a secret key. This enables novel application scenarios where a client can safely offload storage and computation to a third-party cloud provider without having to trust the software and the hardware vendors with the decryption keys. Recent adva…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: Submitted to ASPLOS2019

  16. arXiv:1705.08030  [pdf, other]

    cs.LG stat.ML

    Parallel Stochastic Gradient Descent with Sound Combiners

    Authors: Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

    Abstract: Stochastic gradient descent (SGD) is a well known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as HOGWILD! and ALLREDUCE, do not honor these dependencies across thread…

    Submitted 22 May, 2017; originally announced May 2017.

    Comments: 16 pages, 4 figures

  17. arXiv:0811.0987  [pdf, ps, other]

    cs.CC

    Modular difference logic is hard

    Authors: Nikolaj Bjørner, Andreas Blass, Yuri Gurevich, Madan Musuvathi

    Abstract: In connection with machine arithmetic, we are interested in systems of constraints of the form $x + k \leq y + k'$. Over integers, the satisfiability problem for such systems is polynomial time. The problem becomes NP-complete if we restrict attention to the residues for a fixed modulus $N$.

    Submitted 6 November, 2008; originally announced November 2008.