Showing 1–50 of 65 results for author: Gholami, A

  1. arXiv:2406.19522  [pdf, other]

    cs.LG

    Reliable edge machine learning hardware for scientific applications

    Authors: Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin

    Abstract: Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing. This leads to unique validation challenges for VLSI implementations of ML algorithms: enabling bit-accurate functional simulations for performance validation in experimental software frameworks, verifying those ML models are robust under extreme quantization and pruning, and enabling…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: IEEE VLSI Test Symposium 2024 (VTS)

    Report number: FERMILAB-CONF-24-0116-CSAID

  2. arXiv:2403.15042  [pdf, other]

    cs.CL

    LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

    Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation st…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Our code is available at https://github.com/SqueezeAILab/LLM2LLM

  3. arXiv:2403.14123  [pdf, other]

    cs.LG cs.AR cs.DC

    AI and Memory Wall

    Authors: Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer

    Abstract: The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Published in IEEE Micro Journal

  4. arXiv:2401.18079  [pdf, other]

    cs.LG

    KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

    Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurat…

    Submitted 4 April, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  5. arXiv:2312.04511  [pdf, other]

    cs.CL

    An LLM Compiler for Parallel Function Calling

    Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling oft…

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  6. arXiv:2310.12072  [pdf, other]

    cs.CL

    SPEED: Speculative Pipelined Execution for Efficient Decoding

    Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

    Abstract: Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios has been highly restricted due to the significant inference latency associated with these models. This is particularly pronounced due to the autoregressive nat…

    Submitted 2 January, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS Workshop on Efficient Natural Language and Speech Processing (2023)

  7. arXiv:2306.07629  [pdf, other]

    cs.CL cs.LG

    SqueezeLLM: Dense-and-Sparse Quantization

    Authors: Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer

    Abstract: Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models.…

    Submitted 4 June, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: ICML 2024

  8. arXiv:2306.00258  [pdf, other]

    cs.LG math.NA

    Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

    Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael Mahoney, Amir Gholami

    Abstract: Pre-trained machine learning (ML) models have shown great performance for a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 11 figures

    Journal ref: NeurIPS 2023

  9. arXiv:2305.05332  [pdf, ps, other]

    cs.IT

    On Multi-Message Private Computation

    Authors: Ali Gholami, Kai Wan, Tayyebeh Jahani-Nezhad, Hua Sun, Mingyue Ji, Giuseppe Caire

    Abstract: In a typical formulation of the private information retrieval (PIR) problem, a single user wishes to retrieve one out of K files from N servers without revealing the demanded file index to any server. This paper formulates an extended model of PIR, referred to as multi-message private computation (MM-PC), where instead of retrieving a single file, the user wishes to retrieve P > 1 linear combinati…

    Submitted 22 February, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

  10. arXiv:2304.06745  [pdf, other]

    cs.LG cs.AR hep-ex physics.ins-det

    End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

    Authors: Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran

    Abstract: We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpi…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 19 pages, 6 figures, 2 tables

    Report number: FERMILAB-PUB-23-150-CSAID-ETD

  11. arXiv:2302.14017  [pdf, other]

    cs.CL cs.LG

    Full Stack Optimization of Transformer Inference: a Survey

    Authors: Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

    Abstract: Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing…

    Submitted 27 February, 2023; originally announced February 2023.

    Journal ref: Presented in Workshop on Architecture and System Support for Transformer Models (ASSYST) at ISCA 2023

  12. arXiv:2302.07863  [pdf, other]

    cs.CL

    Speculative Decoding with Big Little Decoder

    Authors: Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

    Abstract: The recent emergence of Large Language Models based on the Transformer architecture has enabled dramatic advancements in the field of Natural Language Processing. However, these models have long inference latency, which limits their deployment and makes them prohibitively expensive for various real-time applications. The inference latency is further exacerbated by autoregressive generative tasks,…

    Submitted 12 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  13. arXiv:2211.09120  [pdf, other]

    cs.CV cs.AI

    AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

    Authors: Wele Gedara Chaminda Bandara, Naman Patel, Ali Gholami, Mehdi Nikkhah, Motilal Agrawal, Vishal M. Patel

    Abstract: Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch, tube, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive ma…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Code available at: https://github.com/wgcban/adamae

  14. arXiv:2210.00055  [pdf, other]

    cs.LG cs.CV

    MaskTune: Mitigating Spurious Correlations by Forcing to Explore

    Authors: Saeid Asgari Taghanaki, Aliasghar Khani, Fereshte Khani, Ali Gholami, Linh Tran, Ali Mahdavi-Amiri, Ghassan Hamarneh

    Abstract: A fundamental challenge of over-parameterized deep learning models is learning meaningful data representations that yield good performance on a downstream task without over-fitting spurious input features. This work proposes MaskTune, a masking strategy that prevents over-reliance on spurious (or a limited number of) features. MaskTune forces the trained model to explore new features during a sing…

    Submitted 8 October, 2022; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  15. arXiv:2207.04084  [pdf, other]

    cs.LG physics.comp-ph

    Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

    Authors: Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami

    Abstract: Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adaptin…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 15 pages

  16. arXiv:2207.01548  [pdf, other]

    cs.LG cs.CV

    Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness

    Authors: Saeid Asgari Taghanaki, Ali Gholami, Fereshte Khani, Kristy Choi, Linh Tran, Ran Zhang, Aliasghar Khani

    Abstract: Batch normalization (BN) is a ubiquitous technique for training deep neural networks that accelerates their convergence to reach higher accuracy. However, we demonstrate that BN comes with a fundamental drawback: it incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data, hurting generalization performance on out-of-domain examples. In this…

    Submitted 4 July, 2022; originally announced July 2022.

  17. arXiv:2206.00888  [pdf, other]

    eess.AS cs.CL cs.SD

    Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

    Authors: Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

    Abstract: The recently proposed Conformer model has become the de facto backbone model for various downstream speech tasks based on its hybrid attention-convolution architecture that captures both local and global features. However, through a series of systematic studies, we find that the Conformer architecture's design choices are not optimal. After re-examining the design choices for both the macro and mi…

    Submitted 15 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  18. arXiv:2204.09656  [pdf, other]

    cs.CL cs.LG

    A Fast Post-Training Pruning Framework for Transformers

    Authors: Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami

    Abstract: Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models. This can add high training cost and high complexity to model deployment, making it difficult to use in many practical situations. To address this, we propose a fast post-training pruning framework for Transformers that does not require any…

    Submitted 17 October, 2022; v1 submitted 29 March, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  19. arXiv:2201.11539  [pdf, ps, other]

    cs.IT

    Coded Caching with Private Demands and Caches

    Authors: Ali Gholami, Kai Wan, Hua Sun, Mingyue Ji, Giuseppe Caire

    Abstract: Recently it was shown that the seminal Maddah-Ali and Niesen (MAN) coded caching scheme leaks the demand information of each user to the others. Many works have considered coded caching with demand privacy, while each non-trivial existing coded caching scheme with private demands was built on the fact that the cache information of each user is private to the others. However, most of these schemes…

    Submitted 2 November, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: 46 pages

  20. arXiv:2201.11067  [pdf, other]

    cs.NI

    ROMA: Resource Orchestration for Microservices-based 5G Applications

    Authors: Anousheh Gholami, Kunal Rao, Wang-Pin Hsiung, Oliver Po, Murugan Sankaradas, Srimat Chakradhar

    Abstract: With the growth of 5G, Internet of Things (IoT), edge computing and cloud computing technologies, the infrastructure (compute and network) available to emerging applications (AR/VR, autonomous driving, industry 4.0, etc.) has become quite complex. There are multiple tiers of computing (IoT devices, near edge, far edge, cloud, etc.) that are connected with different types of networking technologies…

    Submitted 25 February, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

    Comments: Accepted at 2022 IEEE/IFIP Network Operations and Management Symposium

  21. arXiv:2110.13041  [pdf, other]

    cs.LG cs.AR physics.data-an physics.ins-det

    Applications and Techniques for Fast Machine Learning in Science

    Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

    Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 66 pages, 13 figures, 5 tables

    Report number: FERMILAB-PUB-21-502-AD-E-SCD

    Journal ref: Front. Big Data 5, 787421 (2022)

  22. arXiv:2109.01050  [pdf, other]

    cs.LG cs.AI math.NA physics.comp-ph

    Characterizing possible failure modes in physics-informed neural networks

    Authors: Aditi S. Krishnapriyan, Amir Gholami, Shandian Zhe, Robert M. Kirby, Michael W. Mahoney

    Abstract: Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial pro…

    Submitted 11 November, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: 22 pages

    Journal ref: NeurIPS 2021

  23. arXiv:2108.08730  [pdf, other]

    cs.CE math.AP

    Accurate 3D frequency-domain seismic wave modeling with the wavelength-adaptive 27-point finite-difference stencil: a tool for full waveform inversion

    Authors: Hossein S. Aghamiry, Ali Gholami, Laure Combe, Stéphane Operto

    Abstract: Efficient frequency-domain Full Waveform Inversion (FWI) of long-offset/wide-azimuth node data can be designed with a few discrete frequencies. However, 3D frequency-domain seismic modeling remains challenging since it requires solving a large and sparse linear indefinite system per frequency. When such systems are solved with direct methods or hybrid direct/iterative solvers, based upon domain de…

    Submitted 19 August, 2021; originally announced August 2021.

  24. arXiv:2107.00910  [pdf, other]

    cs.CL

    Learned Token Pruning for Transformers

    Authors: Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer

    Abstract: Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method which adaptively removes unimportant tokens as an input sequence passes through transformer layers. In particular, LTP prunes tokens with an attention score below a threshold value which is…

    Submitted 2 June, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: KDD 2022 (Research Track)

  25. arXiv:2104.07853  [pdf, other]

    eess.SY cs.AI cs.LG

    On the Importance of Trust in Next-Generation Networked CPS Systems: An AI Perspective

    Authors: Anousheh Gholami, Nariman Torkzaban, John S. Baras

    Abstract: With the increasing scale, complexity, and heterogeneity of the next generation networked systems, seamless control, management, and security of such systems becomes increasingly challenging. Many diverse applications have driven interest in networked systems, including large-scale distributed learning, multi-agent optimization, 5G service provisioning, and network slicing, etc. In this paper, we…

    Submitted 15 April, 2021; originally announced April 2021.

  26. arXiv:2103.16827  [pdf, other]

    eess.AS cs.CL cs.SD

    Integer-only Zero-shot Quantization for Efficient Speech Recognition

    Authors: Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer

    Abstract: End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks. However, these models perform poorly on edge hardware due to large memory and computation requirements. While quantizing model weights and/or activations to low-precision can be a promising solution, previous research on quantizing ASR models is limited. In particular, the previous ap…

    Submitted 30 January, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2022

  27. arXiv:2103.13630  [pdf, other]

    cs.CV

    A Survey of Quantization Methods for Efficient Neural Network Inference

    Authors: Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

    Abstract: As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fi…

    Submitted 21 June, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence

  28. arXiv:2101.08940  [pdf, other]

    cs.CV

    Hessian-Aware Pruning and Optimal Neural Implant

    Authors: Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W Mahoney, Kurt Keutzer

    Abstract: Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models. However, existing structured-pruning methods often result in significant accuracy degradation for moderate pruning levels. To address this problem, we introduce a new Hessian Aware Pruning (HAP) method coupled with a Neural Implant approach that uses second-order sensitivity as a metric f…

    Submitted 21 June, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  29. arXiv:2101.01321  [pdf, other]

    cs.CL

    I-BERT: Integer-only BERT Quantization

    Authors: Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer

    Abstract: Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this, previous work on quantizing Transformer based models use floati…

    Submitted 8 June, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Journal ref: ICML 2021 (Oral)

  30. arXiv:2012.02206  [pdf, other]

    cs.CV cs.LG eess.IV

    Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

    Authors: Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang

    Abstract: We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is the bounding boxes along with the descriptions for the underlying objects. To address the 3D object detection and description problems, we propose Scan2Cap, an end-to-end trained method, to detect objects in the input scene and describe them in…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: Video: https://youtu.be/AgmIpDbwTCY

  31. arXiv:2011.10680  [pdf, other]

    cs.CV

    HAWQV3: Dyadic Neural Network Quantization

    Authors: Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer

    Abstract: Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values. This hidden cost limits the latency improvement realized by quantizing Neural Networks. To address this, we present HAWQV3, a novel mixed-precision integer-only quantization framework. The contributions of HAWQV3 are the following: (i) An integer-on…

    Submitted 23 June, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Journal ref: ICML 2021

  32. arXiv:2009.14446  [pdf, other]

    cs.NI eess.SY math.OC

    Joint Mobility-Aware UAV Placement and Routing in Multi-Hop UAV Relaying Systems

    Authors: Anousheh Gholami, Nariman Torkzaban, John S. Baras, Chrysa Papagianni

    Abstract: Unmanned Aerial Vehicles (UAVs) have been extensively utilized to provide wireless connectivity in rural and under-developed areas, enhance network capacity and provide support for peaks or unexpected surges in user demand, mainly due to their fast deployment, cost-efficiency and superior communication performance resulting from Line of Sight (LoS)-dominated wireless channels. In order to exploit…

    Submitted 30 September, 2020; originally announced September 2020.

    Comments: 15 Pages, Accepted at ADHOCNETS2020

  33. arXiv:2007.05086  [pdf, other]

    cs.LG stat.ML

    Boundary thickness and robustness in learning models

    Authors: Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

    Abstract: Robustness of machine learning models to various adversarial and non-adversarial corruptions continues to be of interest. In this paper, we introduce the notion of the boundary thickness of a classifier, and we describe its connection with and usefulness for model robustness. Thick decision boundaries lead to improved performance, while thin decision boundaries lead to overfitting (e.g., measured…

    Submitted 12 January, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Journal ref: NeurIPS 2020

  34. arXiv:2006.00719  [pdf, other]

    cs.LG math.NA stat.ML

    ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

    Authors: Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney

    Abstract: We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second order algorithms are among the most powerful optimization algorithms with superior convergence properties as compared to first order methods such as SGD and Adam. The main disadvantage of traditional second order m…

    Submitted 28 April, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

    Journal ref: AAAI 2021

  35. arXiv:2003.07845  [pdf, other]

    cs.CL cs.LG

    PowerNorm: Rethinking Batch Normalization in Transformers

    Authors: Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

    Abstract: The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). This is different than batch normalization (BN), which is widely-adopted in Computer Vision. The preferred use of LN in NLP is principally due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks;…

    Submitted 28 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Journal ref: ICML 2020

  36. arXiv:2002.03071  [pdf, other]

    cs.NI

    Joint Satellite Gateway Placement and Routing for Integrated Satellite-Terrestrial Networks

    Authors: Nariman Torkzaban, Anousheh Gholami, Chrysa Papagianni, John S. Baras

    Abstract: With the increasing attention to the integrated satellite-terrestrial networks (ISTNs), the satellite gateway placement problem becomes of paramount importance. The resulting network performance may vary depending on the different design strategies. In this paper, a joint satellite gateway placement and routing strategy for the terrestrial network is proposed to minimize the overall cost of gatewa…

    Submitted 5 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: 6 pages, In Proceedings of IEEE ICC 2020. https://ieeexplore.ieee.org/document/9149175 N. Torkzaban, A. Gholami, J. S. Baras and C. Papagianni, "Joint Satellite Gateway Placement and Routing for Integrated Satellite-Terrestrial Networks," ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 2020, pp. 1-6. doi: 10.1109/ICC40277.2020.9149175

  37. arXiv:2001.04802  [pdf]

    cs.LG math.NA stat.CO stat.ML

    A Bayesian Monte-Carlo Uncertainty Model for Assessment of Shear Stress Entropy

    Authors: Amin Kazemian-Kale-Kale, Azadeh Gholami, Mohammad Rezaie-Balf, Amir Mosavi, Ahmed A Sattar, Bahram Gharabaghi, Hossein Bonakdari

    Abstract: The entropy models have been recently adopted in many studies to evaluate the distribution of the shear stress in circular channels. However, the uncertainty in their predictions and their reliability remains an open question. We present a novel method to evaluate the uncertainty of four popular entropy models, including Shannon, Shannon-Power Law (PL), Tsallis, and Renyi, in shear stress estimati…

    Submitted 10 January, 2020; originally announced January 2020.

    Comments: 48 pages, 7 figures

    MSC Class: 65Z05

  38. arXiv:2001.00281  [pdf, other]

    cs.CV

    ZeroQ: A Novel Zero Shot Quantization Framework

    Authors: Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

    Abstract: Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization method…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: CVPR 2020

  39. arXiv:1912.07145  [pdf, other]

    cs.LG math.NA

    PyHessian: Neural Networks Through the Lens of the Hessian

    Authors: Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

    Abstract: We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open sour…

    Submitted 5 March, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

    Journal ref: IEEE BigData 2020 (and ICML Workshop 2020)

  40. arXiv:1911.03852  [pdf, other]

    cs.CV

    HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks

    Authors: Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

    Abstract: Quantization is an effective method for reducing memory footprint and inference time of Neural Networks, e.g., for efficient inference in the cloud, especially at the edge. However, ultra low precision quantization could lead to significant degradation in model generalization. A promising method to address this is to perform mixed-precision quantization, where more sensitive layers are kept at hig…

    Submitted 9 November, 2019; originally announced November 2019.

    Journal ref: NeurIPS 2020 paper, link: https://proceedings.neurips.cc/paper/2020/file/d77c703536718b95308130ff2e5cf9ee-Supplemental.pdf

  41. arXiv:1910.02653  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

    Authors: Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

    Abstract: We formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal rematerialization schedules in reasonable times (under an hour) using off-the-shelf MILP solvers or near-optimal schedules with an approximation algorithm,…

    Submitted 14 May, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: In Proceedings of 3rd Conference Machine Learning and Systems 2020 (MLSys 2020)

  42. arXiv:1909.05840  [pdf, other]

    cs.CL cs.LG

    Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

    Authors: Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

    Abstract: Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT based models have a prohibitive memory footprint and latency. As a result, deploying BERT based models in resource constrained environments has become a challengin…

    Submitted 24 September, 2019; v1 submitted 12 September, 2019; originally announced September 2019.

    Journal ref: AAAI 2020

  43. arXiv:1909.02150  [pdf, other]

    eess.SP cs.RO eess.SY

    Drone-Assisted Communications for Remote Areas and Disaster Relief

    Authors: Anousheh Gholami, Usman A. Fiaz, John S. Baras

    Abstract: We explore an end-to-end (including access and backhaul links) UAV-assisted wireless communication system, considering both uplink and downlink traffic, with the goal of supporting the demand of the Ground Users (GUs) using the minimum number of UAVs. Moreover, in order to extend the operational (flight) time of UAVs, we exploit an energy-aware routing scheme. Our intention is to design and analyze t…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Accepted at DGRS 2019

  44. arXiv:1906.04596  [pdf, other]

    cs.LG stat.ML

    ANODEV2: A Coupled Neural ODE Evolution Framework

    Authors: Tianjun Zhang, Zhewei Yao, Amir Gholami, Kurt Keutzer, Joseph Gonzalez, George Biros, Michael Mahoney

    Abstract: It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, which is an extension of this approach that also allows evolution of the neural network…

    Submitted 9 June, 2019; originally announced June 2019.

    Journal ref: NeurIPS 2019

  45. arXiv:1905.03696  [pdf, other]

    cs.CV

    HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

    Authors: Zhen Dong, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer

    Abstract: Model size and inference speed/power have become a major challenge in the deployment of Neural Networks for many applications. A promising approach to address these problems is quantization. However, uniformly quantizing a model to ultra low precision leads to significant accuracy degradation. A novel solution for this is to use mixed-precision quantization, as some parts of the network may allow…

    Submitted 29 April, 2019; originally announced May 2019.

    Comments: ICCV 2019

    Journal ref: ICCV 2019 paper

  46. arXiv:1904.00800  [pdf, ps, other]

    cs.IT

    Private Shotgun DNA Sequencing: A Structured Approach

    Authors: Ali Gholami, Mohammad Ali Maddah-Ali, Seyed Abolfazl Motahari

    Abstract: DNA sequencing has faced a huge demand since it was first introduced as a service to the public. This service is often offloaded to the sequencing companies who will have access to full knowledge of individuals' sequences, a major violation of privacy. To address this challenge, we propose a solution, which is based on separating the process of reading the fragments of sequences, which is done at…

    Submitted 2 April, 2019; v1 submitted 28 March, 2019; originally announced April 2019.

    Comments: 10 pages, 3 figures. arXiv admin note: text overlap with arXiv:1811.10693

    ACM Class: E.4; H.1.1

  47. arXiv:1903.06237  [pdf, other]

    cs.LG stat.ML

    Inefficiency of K-FAC for Large Batch Size Training

    Authors: Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney

    Abstract: In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch. However, for both training loss and testing error, recent results analyzing large batch Stochastic Gradient Descent (SGD) have found sharp diminishing returns, beyond a certain critical batch size. In the hopes of addressing this, it ha…

    Submitted 31 July, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

    Journal ref: AAAI 2020

  48. arXiv:1902.10298  [pdf, other]

    cs.LG

    ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

    Authors: Amir Gholami, Kurt Keutzer, George Biros

    Abstract: Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently a method proposed in [8]…

    Submitted 1 July, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  49. arXiv:1812.06371  [pdf, other]

    cs.LG cs.CR stat.ML

    Trust Region Based Adversarial Attack on Neural Networks

    Authors: Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael Mahoney

    Abstract: Deep Neural Networks are quite vulnerable to adversarial perturbations. Current state-of-the-art adversarial attack methods typically require very time-consuming hyper-parameter tuning, or require many iterations to solve an optimization based adversarial attack. To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial pertu…

    Submitted 15 December, 2018; originally announced December 2018.

    Journal ref: CVPR 2019

  50. arXiv:1812.01216  [pdf, other]

    cs.LG

    Parameter Re-Initialization through Cyclical Batch Size Schedules

    Authors: Norman Mu, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

    Abstract: Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may take longer to train and/or converge to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: Presented in Systems for Machine Learning Workshop at NeurIPS'18 conference

    Journal ref: NeurIPS 2018 Workshop