
Showing 1–14 of 14 results for author: Saarikivi, O

  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2311.15269  [pdf, other]

    cs.DC cs.AI

    Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

    Authors: Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

    Abstract: Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcraft…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: The paper is accepted by HPCA 2024

  3. arXiv:2310.02393  [pdf, ps, other]

    cs.FL cs.DS

    Symbolic Automata: $ω$-Regularity Modulo Theories

    Authors: Margus Veanes, Thomas Ball, Gabriel Ebner, Olli Saarikivi

    Abstract: Symbolic automata are finite state automata that support potentially infinite alphabets, such as the set of rational numbers, generally applied to regular expressions/languages over finite words. In symbolic automata (or automata modulo theories), an alphabet is represented by an effective Boolean algebra, supported by a decision procedure for satisfiability. Regular languages over infinite words…

    Submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2306.11644  [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot. fixed minor typos and added couple clarifications

  5. arXiv:2305.13450  [pdf, other]

    cs.DC

    A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

    Authors: Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi

    Abstract: Machine Learning (ML) models execute several parallel computations including Generalized Matrix Multiplication, Convolution, Dropout, etc. These computations are commonly executed on Graphics Processing Units (GPUs), by dividing the computation into independent processing blocks, known as tiles. Since the number of tiles are usually higher than the execution units of a GPU, tiles are executed on a…

    Submitted 14 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at CGO 2024

  6. arXiv:2201.11840  [pdf, other]

    cs.DC

    GC3: An Optimizing Compiler for GPU Collective Communication

    Authors: Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong

    Abstract: Machine learning models made up of millions or billions of parameters are trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application specific communication patterns can alleviate this bottlen…

    Submitted 19 July, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  7. arXiv:2111.04867  [pdf, other]

    cs.DC cs.LG

    TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

    Authors: Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

    Abstract: Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm d…

    Submitted 5 October, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Accepted at NSDI'23. Contains 20 pages, 11 figures, including Appendix

  8. arXiv:2105.05720  [pdf, other]

    cs.DC cs.LG cs.PL

    Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

    Authors: Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi

    Abstract: Recent trend towards increasing large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, current logical separation between computation and communication kernels in deep learning frameworks misses the op…

    Submitted 26 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

  9. Synthesizing Optimal Collective Algorithms

    Authors: Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi

    Abstract: Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep-learning, collective communication is the Amdahl's bottleneck of data-parallel training. This paper introduces SCCL (for Synthesized Collective Communication Library), a systematic approach to synthesize collective communication algorithms that are explicitly tailored to a parti…

    Submitted 4 January, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

    Comments: Both Zixian Cai and Zhengyang Liu contributed equally to the paper. The work was done during internships at Microsoft Research. To appear at PPoPP 2021

  10. arXiv:2006.02924  [pdf, other]

    cs.DC cs.LG

    Scaling Distributed Training with Adaptive Summation

    Authors: Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

    Abstract: Stochastic gradient descent (SGD) is an inherently sequential training algorithm--computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor them (e.g., sum the gradients for each batch, which is not what sequential SGD would do) and thus potentially suffer from poor convergence. This paper introduces a nov…

    Submitted 4 June, 2020; originally announced June 2020.

  11. arXiv:1912.11951  [pdf, other]

    cs.CR cs.LG cs.PL

    EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation

    Authors: Roshan Dathathri, Blagovesta Kostova, Olli Saarikivi, Wei Dai, Kim Laine, Madanlal Musuvathi

    Abstract: Fully-Homomorphic Encryption (FHE) offers powerful capabilities by enabling secure offloading of both storage and computation, and recent innovations in schemes and implementations have made it all the more attractive. At the same time, FHE is notoriously hard to use with a very constrained programming model, a very unusual performance profile, and many cryptographic constraints. Existing compiler…

    Submitted 26 June, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

    ACM Class: D.3.3; D.3.4

    Journal ref: Programming Language Design and Implementation (PLDI 2020) 546-561

  12. arXiv:1910.01996  [pdf, ps, other]

    cs.FL cs.LO

    Succinct Determinisation of Counting Automata via Sphere Construction (Technical Report)

    Authors: Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Lenka Turoňová, Margus Veanes, Tomáš Vojnar

    Abstract: We propose an efficient algorithm for determinising counting automata (CAs), i.e., finite automata extended with bounded counters. The algorithm avoids unfolding counters into control states, unlike the naïve approach, and thus produces much smaller deterministic automata. We also develop a simplified and faster version of the general algorithm for the sub-class of so-called monadic CAs (MCAs), i.…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: An extended version of a paper accepted at APLAS'19

  13. arXiv:1909.03359  [pdf, other]

    cs.LG cs.CL cs.DC stat.ML

    Distributed Training of Embeddings using Graph Analytics

    Authors: Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi

    Abstract: Many applications today, such as NLP, network analysis, and code analysis, rely on semantically embedding objects into low-dimensional fixed-length vectors. Such embeddings naturally provide a way to perform useful downstream tasks, such as identifying relations among objects or predicting objects for a given context, etc. Unfortunately, the training necessary for accurate embeddings is usually co…

    Submitted 23 February, 2020; v1 submitted 7 September, 2019; originally announced September 2019.

  14. arXiv:1810.00845  [pdf, other]

    cs.LG cs.CR cs.PL stat.ML

    CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

    Authors: Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

    Abstract: Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations to be applied directly on encrypted data without requiring a secret key. This enables novel application scenarios where a client can safely offload storage and computation to a third-party cloud provider without having to trust the software and the hardware vendors with the decryption keys. Recent adva…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: Submitted to ASPLOS2019