
Showing 1–4 of 4 results for author: Stosic, D

Searching in archive cs.
  1. arXiv:2209.05433  [pdf, other]

    cs.LG

    FP8 Formats for Deep Learning

    Authors: Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwaite, Sangwon Ha, Alexander Heinecke, Patrick Judd, John Kamalu, Naveen Mellempudi, Stuart Oberman, Mohammad Shoeybi, Michael Siu, Hao Wu

    Abstract: FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for representation of special…

    Submitted 29 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

  2. arXiv:2106.05495  [pdf, other]

    cs.LG

    Distance Metric Learning through Minimization of the Free Energy

    Authors: Dusan Stosic, Darko Stosic, Teresa B. Ludermir, Borko Stosic

    Abstract: Distance metric learning has attracted a lot of interest for solving machine learning and pattern recognition problems over the last decades. In this work we present a simple approach based on concepts from statistical physics to learn the optimal distance metric for a given problem. We formulate the task as a typical statistical physics problem: distances between patterns represent constituents of a…

    Submitted 10 June, 2021; originally announced June 2021.

  3. arXiv:2105.12920  [pdf, other]

    cs.LG

    Search Spaces for Neural Model Training

    Authors: Darko Stosic, Dusan Stosic

    Abstract: While larger neural models are pushing the boundaries of what deep learning can do, often more weights are needed to train models rather than to run inference for tasks. This paper seeks to understand this behavior using search spaces -- adding weights creates extra degrees of freedom that form new paths for optimization (or wider search spaces) rendering neural model training more effective. We t…

    Submitted 26 May, 2021; originally announced May 2021.

  4. arXiv:2104.08378  [pdf, other]

    cs.LG cs.AI cs.AR

    Accelerating Sparse Deep Neural Networks

    Authors: Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

    Abstract: As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero values in parameters that can then be discarded from storage or computations. While most research focuses on high levels of sparsity, there are challenges in univ…

    Submitted 16 April, 2021; originally announced April 2021.
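The first result (arXiv:2209.05433) defines two 8-bit encodings, E4M3 and E5M2. The sketch below decodes a single FP8 bit pattern to illustrate how the sign/exponent/mantissa split works. The abstract above is truncated, so a few details here are assumptions drawn from the full paper rather than from this listing: exponent biases of 7 (E4M3) and 15 (E5M2); E5M2 using IEEE 754-style infinities and NaNs; and E4M3 having no infinities, with only the all-ones exponent-and-mantissa pattern reserved for NaN. The function names `decode_fp8`, `decode_e4m3` and `decode_e5m2` are hypothetical and used here only for illustration.

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, ieee_specials: bool) -> float:
    """Decode one 8-bit pattern into a Python float (illustrative sketch)."""
    assert 0 <= byte <= 0xFF and 1 + exp_bits + man_bits == 8
    bias = (1 << (exp_bits - 1)) - 1          # assumed: 7 for E4M3, 15 for E5M2
    sign = -1.0 if byte >> 7 else 1.0         # top bit is the sign
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_specials and exp == exp_max:
        # E5M2 (assumed IEEE-style): mantissa 0 -> infinity, otherwise NaN
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == man_max:
        # E4M3 (assumed): only S.1111.111 is NaN; there are no infinities
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, ieee_specials=False)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, ieee_specials=True)


if __name__ == "__main__":
    print(decode_e4m3(0x7E))  # largest finite E4M3 value under these assumptions: 448.0
    print(decode_e5m2(0x7B))  # largest finite E5M2 value under these assumptions: 57344.0
```

The contrast between the two branches shows the trade-off the abstract hints at. Because E4M3 reuses most of its top exponent range for finite numbers, it reaches 448 with 3 mantissa bits of precision. E5M2 keeps IEEE-style special values and covers a much wider range (up to 57344), but with only 2 mantissa bits.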