Showing 1–50 of 188 results for author: Mahoney, M W

Searching in archive cs.

  1. arXiv:2406.19522  [pdf, other]

    cs.LG

    Reliable edge machine learning hardware for scientific applications

    Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

    Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: IEEE VLSI Test Symposium 2024 (VTS)

    Report number: FERMILAB-CONF-24-0116-CSAID

  2. arXiv:2406.11151  [pdf, other]

    cs.LG math.NA stat.ML

    Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning

    Authors: Michał Dereziński, Michael W. Mahoney

    Abstract: Large matrices arise in many machine learning and data analysis applications, including as representations of datasets, graphs, model weights, and first and second-order derivatives. Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity; but recent hardware trend…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2406.09997  [pdf, other]

    cs.LG

    Towards Scalable and Versatile Weight Space Learning

    Authors: Konstantin Schürholt, Michael W. Mahoney, Damian Borth

    Abstract: Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  4. arXiv:2405.20516  [pdf, other]

    cs.LG physics.geo-ph

    WaveCastNet: An AI-enabled Wavefield Forecasting Framework for Earthquake Early Warning

    Authors: Dongwei Lyu, Rie Nakata, Pu Ren, Michael W. Mahoney, Arben Pitarka, Nori Nakata, N. Benjamin Erichson

    Abstract: Large earthquakes can be destructive and quickly wreak havoc on a landscape. To mitigate immediate threats, early warning systems have been developed to alert residents, emergency responders, and critical infrastructure operators seconds to a minute before seismic waves arrive. These warnings provide time to take precautions and prevent damage. The success of these systems relies on fast, accurate…

    Submitted 30 May, 2024; originally announced May 2024.

  5. arXiv:2405.13975  [pdf, other]

    cs.LG stat.ML

    There is HOPE to Avoid HiPPOs for Long-memory State Space Models

    Authors: Annan Yu, Michael W. Mahoney, N. Benjamin Erichson

    Abstract: State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences. However, these models typically face several challenges: (i) they require specifically designed initializations of the system matrices to achieve state-of-the-art performance, (ii) they require training of state matrices on a logarithmic scale with very small le…

    Submitted 22 May, 2024; originally announced May 2024.

  6. arXiv:2403.15042  [pdf, other]

    cs.CL

    LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

    Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipali, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation st…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Our code is available at https://github.com/SqueezeAILab/LLM2LLM

  7. arXiv:2403.14123  [pdf, other]

    cs.LG cs.AR cs.DC

    AI and Memory Wall

    Authors: Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer

    Abstract: The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Published in IEEE Micro Journal

  8. arXiv:2403.10642  [pdf, other]

    cs.LG math.NA

    Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs

    Authors: S. Chandra Mouli, Danielle C. Maddix, Shima Alizadeh, Gaurav Gupta, Andrew Stuart, Michael W. Mahoney, Yuyang Wang

    Abstract: Existing work in scientific machine learning (SciML) has shown that data-driven learning of solution operators can provide a fast approximate alternative to classical numerical partial differential equation (PDE) solvers. Of these, Neural Operators (NOs) have emerged as particularly promising. We observe that several uncertainty quantification (UQ) methods for NOs fail for test inputs that are eve…

    Submitted 12 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  9. arXiv:2403.07815  [pdf, other]

    cs.LG cs.AI

    Chronos: Learning the Language of Time Series

    Authors: Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang

    Abstract: We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M…

    Submitted 2 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Code and model checkpoints available at https://github.com/amazon-science/chronos-forecasting

  10. arXiv:2402.15734  [pdf, other]

    cs.LG stat.ML

    Data-Efficient Operator Learning via Unsupervised Pretraining and In-Context Learning

    Authors: Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney

    Abstract: Recent years have witnessed the promise of coupling machine learning methods and physical domain-specific insights for solving scientific problems based on partial differential equations (PDEs). However, being data-intensive, these methods still require a large amount of PDE data. This reintroduces the need for expensive numerical PDE solutions, partially undermining the original goal of avoiding t…

    Submitted 13 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  11. arXiv:2401.18079  [pdf, other]

    cs.LG

    KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

    Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurat…

    Submitted 4 April, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  12. arXiv:2401.00122  [pdf, other]

    stat.ML cs.LG

    SALSA: Sequential Approximate Leverage-Score Algorithm with Application in Analyzing Big Time Series Data

    Authors: Ali Eshragh, Luke Yerbury, Asef Nazari, Fred Roosta, Michael W. Mahoney

    Abstract: We develop a new efficient sequential approximate leverage score algorithm, SALSA, using methods from randomized numerical linear algebra (RandNLA) for large matrices. We demonstrate that, with high probability, the accuracy of SALSA's approximations is within $(1 + O({\varepsilon}))$ of the true leverage scores. In addition, we show that the theoretical computational complexity and numerical accu…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 42 pages, 7 figures

    MSC Class: 62M10

  13. arXiv:2312.17351  [pdf, other]

    cs.SI

    Multi-scale Local Network Structure Critically Impacts Epidemic Spread and Interventions

    Authors: Omar Eldaghar, Michael W. Mahoney, David F. Gleich

    Abstract: Network epidemic simulation holds the promise of enabling fine-grained understanding of epidemic behavior, beyond that which is possible with coarse-grained compartmental models. Key inputs to these epidemic simulations are the networks themselves. However, empirical measurements and samples of realistic interaction networks typically display properties that are challenging to capture with popular…

    Submitted 28 December, 2023; originally announced December 2023.

  14. arXiv:2312.04511  [pdf, other]

    cs.CL

    An LLM Compiler for Parallel Function Calling

    Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling oft…

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  15. arXiv:2312.00359  [pdf, other]

    cs.LG stat.ML

    Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

    Authors: Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

    Abstract: Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely ad…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Spotlight, first two authors contributed equally

  16. arXiv:2311.13028  [pdf, other]

    cs.LG cs.AI cs.DC eess.SP

    DMLR: Data-centric Machine Learning Research -- Past, Present and Future

    Authors: Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, et al. (13 additional authors not shown)

    Abstract: Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods tow…

    Submitted 1 June, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Published in the Journal of Data-centric Machine Learning Research (DMLR) at https://data.mlr.press/assets/pdf/v01-5.pdf

  17. arXiv:2311.07013  [pdf, ps, other]

    stat.ML cs.LG

    A PAC-Bayesian Perspective on the Interpolating Information Criterion

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 9 pages

  18. arXiv:2310.05387  [pdf, other]

    cs.LG stat.ML

    Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

    Authors: Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney

    Abstract: Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity and noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equa…

    Submitted 21 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  19. arXiv:2310.02926  [pdf, other]

    cs.DC

    Extensions to the SENSEI In situ Framework for Heterogeneous Architectures

    Authors: Burlen Loring, E. Wes Bethel, Gunther H. Weber, Michael W. Mahoney

    Abstract: The proliferation of GPUs and accelerators in recent supercomputing systems, so called heterogeneous architectures, has led to increased complexity in execution environments and programming models as well as to deeper memory hierarchies on these systems. In this work, we discuss challenges that arise in in situ code coupling on these heterogeneous architectures. In particular, we present data and…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: To appear in: ISAV 2023: In Situ Infrastructures for Enabling Extreme-scale Analysis and Visualization, November 13 2023

    ACM Class: I.6.6; E.1

  20. arXiv:2310.02619  [pdf, other]

    cs.LG

    Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

    Authors: Ilan Naiman, N. Benjamin Erichson, Pu Ren, Michael W. Mahoney, Omri Azencot

    Abstract: Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to these issues, they are (surprisingly) less considered for tim…

    Submitted 13 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted to The Twelfth International Conference on Learning Representations, ICLR 2024

  21. arXiv:2310.01698  [pdf, other]

    cs.LG stat.ML

    Robustifying State-space Models for Long Sequences via Approximate Diagonalization

    Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson

    Abstract: State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have c…

    Submitted 2 October, 2023; originally announced October 2023.

  22. arXiv:2308.15720  [pdf, other]

    cs.LG cs.AI

    Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

    Authors: Younghyun Cho, James W. Demmel, Michał Dereziński, Haoyun Li, Hengrui Luo, Michael W. Mahoney, Riley J. Murray

    Abstract: Algorithms from Randomized Numerical Linear Algebra (RandNLA) are known to be effective in handling high-dimensional computational problems, providing high-quality empirical performance as well as strong probabilistic guarantees. However, their practical application is complicated by the fact that the user needs to set various algorithm-specific tuning parameters which are different than those use…

    Submitted 29 August, 2023; originally announced August 2023.

    MSC Class: 68W20; 65F20; 65Y20

  23. arXiv:2307.09797  [pdf, other]

    cs.LG cs.AI

    Probabilistic Forecasting with Coherent Aggregation

    Authors: Geoffrey Négiar, Ruijun Ma, O. Nangba Meetei, Mengfei Cao, Michael W. Mahoney

    Abstract: Obtaining accurate probabilistic forecasts while respecting hierarchical information is an important operational challenge in many applications, perhaps most obviously in energy management, supply chain planning, and resource allocation. The basic challenge, especially for multivariate forecasting, is that forecasts are often required to be coherent with respect to the hierarchical structure. In t…

    Submitted 19 July, 2023; originally announced July 2023.

  24. arXiv:2307.07785  [pdf, other]

    stat.ML cs.LG

    The Interpolating Information Criterion for Overparameterized Models

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized mod…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 23 pages, 2 figures

  25. arXiv:2307.03595  [pdf, other]

    cs.LG cs.AI

    GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting

    Authors: Sitan Yang, Malcolm Wolff, Shankar Ramasubramanian, Vincent Quenneville-Belair, Ronak Metha, Michael W. Mahoney

    Abstract: Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting, especially in real-world applications. However, to forecast accurately, these sophisticated models typically rely on a large number of time series examples with substantial history. A rapidly growing topic of interest is forecasting time series which lack sufficient historical data -- oft…

    Submitted 7 July, 2023; originally announced July 2023.

  26. arXiv:2306.14070  [pdf, other]

    cs.CV eess.IV physics.comp-ph

    SuperBench: A Super-Resolution Benchmark Dataset for Scientific Machine Learning

    Authors: Pu Ren, N. Benjamin Erichson, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney

    Abstract: Super-Resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation. There is growing interest in applying SR methods to complex spatiotemporal systems within the Scientific Machine Learning (SciML) community, with the hope of accelerating numerical simulations and/or improving forecasts…

    Submitted 24 June, 2023; originally announced June 2023.

  27. arXiv:2306.09262  [pdf, other]

    stat.ML cs.LG cs.PL

    A Heavy-Tailed Algebra for Probabilistic Programming

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approac…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 21 pages, 6 figures

  28. arXiv:2306.07629  [pdf, other]

    cs.CL cs.LG

    SqueezeLLM: Dense-and-Sparse Quantization

    Authors: Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer

    Abstract: Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models.…

    Submitted 4 June, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: ICML 2024

  29. arXiv:2305.18383  [pdf, other]

    stat.ML cs.LG

    A Three-regime Model of Network Pruning

    Authors: Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney

    Abstract: Recent work has highlighted the complex influence training hyperparameters, e.g., the number of training epochs, can have on the prunability of machine learning models. Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistica…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42790-42809, 2023

  30. arXiv:2305.18379  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching

    Authors: Ilgee Hong, Sen Na, Michael W. Mahoney, Mladen Kolar

    Abstract: We consider solving equality-constrained nonlinear, nonconvex optimization problems. This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. In each iteration, we solve the Lagrangian…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: 25 pages, 4 figures

    Journal ref: ICML 2023

  31. arXiv:2305.12313  [pdf, other]

    stat.ML cs.LG

    When are ensembles really effective?

    Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Ensembling has a long history in statistical data analysis, with many impactful applications. However, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious. We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new res…

    Submitted 20 May, 2023; originally announced May 2023.

  32. arXiv:2304.06745  [pdf, other]

    cs.LG cs.AR hep-ex physics.ins-det

    End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

    Authors: Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran

    Abstract: We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpi…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 19 pages, 6 figures, 2 tables

    Report number: FERMILAB-PUB-23-150-CSAID-ETD

  33. arXiv:2302.14017  [pdf, other]

    cs.CL cs.LG

    Full Stack Optimization of Transformer Inference: a Survey

    Authors: Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

    Abstract: Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing…

    Submitted 27 February, 2023; originally announced February 2023.

    Journal ref: Presented in Workshop on Architecture and System Support for Transformer Models (ASSYST) at ISCA 2023

  34. arXiv:2302.11474  [pdf, other]

    math.NA cs.MS math.OC

    Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software

    Authors: Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E. Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra

    Abstract: Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more ef…

    Submitted 12 April, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: v1: this is the first arXiv release of LAPACK Working Note 299. v2: complete rewrite of the subsection on trace estimation, among other changes. See frontmatter page ii (pdf page 5) for revision history

  35. arXiv:2302.11002  [pdf, other]

    cs.LG math.AP math.NA

    Learning Physical Models that Can Respect Conservation Laws

    Authors: Derek Hansen, Danielle C. Maddix, Shima Alizadeh, Gaurav Gupta, Michael W. Mahoney

    Abstract: Recent work in scientific machine learning (SciML) has focused on incorporating partial differential equation (PDE) information into the learning process. Much of this work has focused on relatively "easy" PDE operators (e.g., elliptic and parabolic), with less emphasis on relatively "hard" PDE operators (e.g., hyperbolic). Within numerical PDEs, the latter problem class requires control of a type…

    Submitted 10 October, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: ICML 2023, Physica D: Nonlinear Phenomena, Accepted

    Journal ref: Physica D: Nonlinear Phenomena, 457 (2024) 133952

  36. arXiv:2302.07863  [pdf, other]

    cs.CL

    Speculative Decoding with Big Little Decoder

    Authors: Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

    Abstract: The recent emergence of Large Language Models based on the Transformer architecture has enabled dramatic advancements in the field of Natural Language Processing. However, these models have long inference latency, which limits their deployment and makes them prohibitively expensive for various real-time applications. The inference latency is further exacerbated by autoregressive generative tasks,…

    Submitted 12 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  37. arXiv:2212.00228  [pdf, other]

    cs.LG cs.NE stat.ML

    Gated Recurrent Neural Networks with Weighted Time-Delay Feedback

    Authors: N. Benjamin Erichson, Soon Hoe Lim, Michael W. Mahoney

    Abstract: We introduce a novel gated recurrent unit (GRU) with a weighted time-delay feedback mechanism in order to improve the modeling of long-term dependencies in sequential data. This model is a discretized version of a continuous-time formulation of a recurrent unit, where the dynamics are governed by delay differential equations (DDEs). By considering a suitable time-discretization scheme, we propose…

    Submitted 30 November, 2022; originally announced December 2022.

  38. arXiv:2210.07612  [pdf, other]

    stat.ML cs.LG

    Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: Despite their importance for assessing reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input d…

    Submitted 25 July, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 33 pages, 21 figures

  39. arXiv:2210.00513  [pdf, other]

    cs.LG stat.ML

    Gradient Gating for Deep Multi-Rate Learning on Graphs

    Authors: T. Konstantin Rusch, Benjamin P. Chamberlain, Michael W. Mahoney, Michael M. Bronstein, Siddhartha Mishra

    Abstract: We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). Our framework is based on gating the output of GNN layers with a mechanism for multi-rate flow of message passing information across nodes of the underlying graph. Local gradients are harnessed to further modulate message passing updates. Our framework flexibly allows one to use any…

    Submitted 15 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

  40. arXiv:2207.08675  [pdf, other]

    cs.LG

    Learning differentiable solvers for systems with hard constraints

    Authors: Geoffrey Négiar, Michael W. Mahoney, Aditi S. Krishnapriyan

    Abstract: We introduce a practical method to enforce partial differential equation (PDE) constraints for functions defined by neural networks (NNs), with a high degree of accuracy and up to a desired tolerance. We develop a differentiable PDE-constrained layer that can be incorporated into any NN architecture. Our method leverages differentiable optimization and the implicit function theorem to effectively…

    Submitted 18 April, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: Paper accepted to the 11th International Conference on Learning Representations (ICLR 2023). 9 pages + references + appendix. 5 figures in main text

  41. arXiv:2207.04084  [pdf, other]

    cs.LG physics.comp-ph

    Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

    Authors: Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami

    Abstract: Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adaptin…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 15 pages

  42. arXiv:2206.10341  [pdf, other]

    cs.CR cs.AI cs.LG

    Neurotoxin: Durable Backdoors in Federated Learning

    Authors: Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal

    Abstract: Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs. (As a simple toy exam…

    Submitted 12 June, 2022; originally announced June 2022.

    Comments: Appears in ICML 2022

  43. arXiv:2206.00888  [pdf, other]

    eess.AS cs.CL cs.SD

    Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

    Authors: Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

    Abstract: The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the Conformer architecture's design choices are not optimal. After re-examining the design choices for both the macro and mi…

    Submitted 15 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  44. arXiv:2205.13687  [pdf, other]

    math.OC cs.LG stat.ML

    Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming

    Authors: Sen Na, Michael W. Mahoney

    Abstract: We consider online statistical inference of constrained stochastic nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems, which can be regarded as applying second-order Newton's method to the Karush-Kuhn-Tucker (KKT) conditions. In each iteration, the StoSQP method computes the Newton direction by solving a quadratic progr…

    Submitted 13 April, 2024; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 59 pages, 3 figures, 11 tables

  45. arXiv:2205.07918  [pdf, other]

    stat.ML cs.LG

    Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial…

    Submitted 16 May, 2022; originally announced May 2022.

  46. arXiv:2205.07147  [pdf]

    cs.DC

    The Sky Above The Clouds

    Authors: Sarah Chasins, Alvin Cheung, Natacha Crooks, Ali Ghodsi, Ken Goldberg, Joseph E. Gonzalez, Joseph M. Hellerstein, Michael I. Jordan, Anthony D. Joseph, Michael W. Mahoney, Aditya Parameswaran, David Patterson, Raluca Ada Popa, Koushik Sen, Scott Shenker, Dawn Song, Ion Stoica

    Abstract: Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen ye…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 35 pages

  47. arXiv:2204.09656  [pdf, other]

    cs.CL cs.LG

    A Fast Post-Training Pruning Framework for Transformers

    Authors: Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami

    Abstract: Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models. This can add high training cost and high complexity to model deployment, making it difficult to use in many practical situations. To address this, we propose a fast post-training pruning framework for Transformers that does not require any…

    Submitted 17 October, 2022; v1 submitted 29 March, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  48. arXiv:2204.09266  [pdf, other]

    math.OC cs.LG stat.ML

    Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

    Authors: Sen Na, Michał Dereziński, Michael W. Mahoney

    Abstract: We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear converge…

    Submitted 28 November, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: 43 pages, 16 figures

  49. arXiv:2202.13718  [pdf, other]

    cs.LG cs.CY

    Fast Feature Selection with Fairness Constraints

    Authors: Francesco Quinzan, Rajiv Khanna, Moshik Hershcovitch, Sarel Cohen, Daniel G. Waddington, Tobias Friedrich, Michael W. Mahoney

    Abstract: We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-…

    Submitted 3 February, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

  50. arXiv:2202.08494  [pdf, other]

    cs.LG

    Learning continuous models for continuous physics

    Authors: Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney

    Abstract: Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models…

    Submitted 21 November, 2023; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: 39 pages