Showing 1–9 of 9 results for author: Phothilimthana, P M

  1. arXiv:2401.14021  [pdf, other]

    cs.LG cs.CL cs.IR

    Accelerating Retrieval-Augmented Language Model Serving with Speculation

    Authors: Zhihao Zhang, Alan Zhu, Lijie Yang, Yihua Xu, Lanting Li, Phitchaya Mangpo Phothilimthana, Zhihao Jia

    Abstract: Retrieval-augmented language models (RaLM) have demonstrated the potential to solve knowledge-intensive natural language processing (NLP) tasks by combining a non-parametric knowledge base with a parametric language model. Instead of fine-tuning a fully parametric model, RaLM excels at its low-cost adaptation to the latest data and better source attribution mechanisms. Among various RaLM approache…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Preprint

  2. arXiv:2308.13490  [pdf, other]

    cs.LG cs.AR cs.SI

    TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

    Authors: Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi

    Abstract: Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there ex…

    Submitted 5 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  3. arXiv:2305.12322  [pdf, other]

    cs.LG cs.SI

    Learning Large Graph Property Prediction via Graph Segment Training

    Authors: Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi

    Abstract: Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first di…

    Submitted 5 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  4. arXiv:2305.07440  [pdf, other]

    cs.PF cs.AI cs.LG

    Optimizing Memory Mapping Using Deep Reinforcement Learning

    Authors: Pengming Wang, Mikita Sazanovich, Berkin Ilbeyi, Phitchaya Mangpo Phothilimthana, Manish Purohit, Han Yang Tay, Ngân Vũ, Miaosen Wang, Cosmin Paduraru, Edouard Leurent, Anton Zhernov, Po-Sen Huang, Julian Schrittwieser, Thomas Hubert, Robert Tung, Paula Kurylowicz, Kieran Milan, Oriol Vinyals, Daniel J. Mankowitz

    Abstract: Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, reducing device wear-and-tear, and even potentially improving carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, n…

    Submitted 17 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  5. arXiv:2210.03894  [pdf, other]

    cs.LG cs.AR cs.PF

    GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation

    Authors: Ondrej Sykora, Phitchaya Mangpo Phothilimthana, Charith Mendis, Amir Yazdanbakhsh

    Abstract: Analytical hardware performance models yield swift estimation of desired hardware performance metrics. However, developing these analytical models for modern processors with sophisticated microarchitectures is an extremely laborious task and requires a firm understanding of target microarchitecture's internal structure. In this paper, we introduce GRANITE, a new machine learning model that estimat…

    Submitted 10 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 13 pages; 5 figures; published at IISWC 2022; Included IEEE copyright;

  6. arXiv:2205.03960  [pdf, other]

    cs.LG cs.PL

    Neural Architecture Search using Property Guided Synthesis

    Authors: Charles Jin, Phitchaya Mangpo Phothilimthana, Sudip Roy

    Abstract: In the past few years, neural architecture search (NAS) has become an increasingly important tool within the deep learning community. Despite the many recent successes of NAS, however, most existing approaches operate within highly structured design spaces, and hence explore only a small fraction of the full search space of neural architectures while also requiring significant manual effort from d…

    Submitted 10 November, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: Our code is available at https://github.com/google-research/google-research/tree/master/abstract_nas

    Journal ref: Proc. ACM Program. Lang., Vol. 6, No. OOPSLA2, Article 166. Publication date: October 2022

  7. arXiv:2112.04041  [pdf, other]

    cs.LG cs.AR

    A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

    Authors: Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou

    Abstract: Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. However, ML compilers targeting MCMs need to solve complex optimization problems optimally and efficiently to achieve this high performance. One such problem is the multi-chip partitioning problem where compil…

    Submitted 7 December, 2021; originally announced December 2021.

  8. arXiv:2010.12438  [pdf, other]

    cs.LG cs.DC

    Transferable Graph Optimizers for ML Compilers

    Authors: Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon

    Abstract: Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions especially for newer model architectures. Existing learnin…

    Submitted 19 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1910.01578

    Journal ref: NeurIPS 2020

  9. arXiv:2008.01040  [pdf, other]

    cs.PF cs.LG

    A Learned Performance Model for Tensor Processing Units

    Authors: Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows

    Abstract: Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerat…

    Submitted 18 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: A version will appear in the Proceedings of the 4th MLSys Conference, San Jose, CA, USA, 2021