Showing 1–13 of 13 results for author: Umuroglu, Y

  1. arXiv:2401.10432

    cs.LG cs.AR cs.PF

    A2Q+: Improving Accumulator-Aware Weight Quantization

    Authors: Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu

    Abstract: Quantization techniques commonly reduce the inference costs of neural networks by restricting the precision of weights and activations. Recent studies show that also reducing the precision of the accumulator can further improve hardware efficiency at the risk of numerical overflow, which introduces arithmetic errors that can degrade model accuracy. To avoid numerical overflow while maintaining acc…

    Submitted 18 January, 2024; originally announced January 2024.

  2. arXiv:2206.11791

    cs.LG cs.AR

    Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

    Authors: Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza, Jules Muhizi, Tai Nguyen, Rushil Roy, Nhan Tran, Yaman Umuroglu, Olivia Weng, Aidan Yokuda, Michaela Blott

    Abstract: We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classificatio…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 15 pages, 7 figures, Contribution to 3rd Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench) at 5th Conference on Machine Learning and Systems (MLSys)

    Report number: FERMILAB-CONF-22-479-SCD

  3. arXiv:2206.07527

    cs.LG cs.AR cs.PL stat.ML

    QONNX: Representing Arbitrary-Precision Quantized Neural Networks

    Authors: Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi, Matthew Trahms, Shih-Chieh Hsu, Scott Hauck, Javier Duarte

    Abstract: We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantiz…

    Submitted 24 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: 9 pages, 5 figures, Contribution to 4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference

    Report number: FERMILAB-CONF-22-471-SCD

  4. arXiv:2202.02310

    cs.LG cs.AR

    EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

    Authors: Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu

    Abstract: Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference of applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and…

    Submitted 4 February, 2022; originally announced February 2022.

  5. arXiv:2102.11289

    cs.LG hep-ex physics.data-an physics.ins-det

    Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

    Authors: Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

    Abstract: Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. I…

    Submitted 19 July, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: 22 pages, 7 Figures, 1 Table

    Report number: FERMILAB-PUB-21-056-SCD

    Journal ref: Front. AI 4, 94 (2021)

  6. arXiv:2004.03021

    eess.SP cs.AR cs.LG

    LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications

    Authors: Yaman Umuroglu, Yash Akhauri, Nicholas J. Fraser, Michaela Blott

    Abstract: Deployment of deep neural networks for applications that require very high throughput or extremely low latency is a severe computational challenge, further exacerbated by inefficiencies in mapping the computation to hardware. We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation. By exploiting the equivalence of artificial neu…

    Submitted 6 April, 2020; originally announced April 2020.

  7. Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

    Authors: Yaman Umuroglu, Davide Conficconi, Lahiru Rasnayake, Thomas B. Preusser, Magnus Sjalander

    Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offerin…

    Submitted 11 June, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

    Comments: Invited paper at ACM TRETS as extension of FPL'18 paper arXiv:1806.08862

  8. arXiv:1809.04570

    cs.AR

    FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

    Authors: Michaela Blott, Thomas Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, Yaman Umuroglu

    Abstract: Convolutional Neural Networks have rapidly become the most successful machine learning algorithm, enabling ubiquitous machine vision and intelligent decisions on even embedded computing-systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activ…

    Submitted 12 September, 2018; originally announced September 2018.

    Comments: to be published in ACM TRETS Special Edition on Deep Learning

  9. arXiv:1807.03123

    cs.CV

    Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic

    Authors: Michaela Blott, Thomas B. Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, Yaman Umuroglu, Miriam Leeser

    Abstract: Convolutional Neural Networks have dramatically improved in recent years, surpassing human accuracy on certain problems and performance exceeding that of traditional computer vision algorithms. While the compute pattern in itself is relatively simple, significant compute and memory challenges remain as CNNs may contain millions of floating-point parameters and require billions of floating-point op…

    Submitted 26 June, 2018; originally announced July 2018.

  10. arXiv:1806.08862

    cs.AR

    BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing

    Authors: Yaman Umuroglu, Lahiru Rasnayake, Magnus Sjalander

    Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offerin…

    Submitted 22 June, 2018; originally announced June 2018.

    Comments: To appear at FPL'18

  11. arXiv:1709.04060

    cs.CV

    Streamlined Deployment for Quantized Neural Networks

    Authors: Yaman Umuroglu, Magnus Jahre

    Abstract: Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits with much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge…

    Submitted 30 May, 2018; v1 submitted 12 September, 2017; originally announced September 2017.

    Comments: Presented at the International Workshop on Highly Efficient Neural Networks Design (HENND) co-located with CASES'17

  12. arXiv:1701.03400

    cs.CV cs.LG

    Scaling Binarized Neural Networks on Reconfigurable Logic

    Authors: Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. They are particularly well suited to reconfigurable logic devices, which contain an abundance of fine-grained compute resources and can result in smaller, lower power implementations, or conversely in higher classification rates. Towards this end, the…

    Submitted 27 January, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

    Comments: To appear in the PARMA-DITAM workshop at HiPEAC 2017, January 2017

  13. arXiv:1612.07119

    cs.CV cs.AR cs.LG

    FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

    Authors: Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

    Abstract: Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optim…

    Submitted 1 December, 2016; originally announced December 2016.

    Comments: To appear in the 25th International Symposium on Field-Programmable Gate Arrays, February 2017