
Showing 1–3 of 3 results for author: Perez, S P

Searching in archive cs.
  1. arXiv:2309.17224  [pdf, other]

    cs.LG cs.AR cs.CL cs.ET cs.PF

    Training and inference of large language models using 8-bit floating point

    Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon

    Abstract: FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect h…

    Submitted 29 September, 2023; originally announced September 2023.

    ACM Class: I.2.7; B.2.4

  2. arXiv:2107.02027  [pdf, other]

    cs.CL cs.CC cs.IT cs.LG

    Efficient Sequence Packing without Cross-contamination: Accelerating Large Language Models without Impacting Performance

    Authors: Mario Michael Krell, Matej Kosec, Sergio P. Perez, Andrew Fitzgibbon

    Abstract: Effective training of today's large language models (LLMs) depends on large batches and long sequences for throughput and accuracy. To handle variable-length sequences on hardware accelerators, it is common practice to introduce padding tokens, so that all sequences in a batch have the same length. We show in this paper that the variation in sequence lengths in common NLP datasets is such that up…

    Submitted 5 October, 2022; v1 submitted 29 June, 2021; originally announced July 2021.

    Comments: Significantly new version with different authors and much more content. Much larger variety in experiments and exhaustive SOTA analysis

    MSC Class: 05-08

    ACM Class: I.2.7; G.2.1

  3. arXiv:2007.10753  [pdf, other]

    cs.CV eess.IV math.NA

    Enhancement of damaged-image prediction through Cahn-Hilliard Image Inpainting

    Authors: José A. Carrillo, Serafim Kalliadasis, Fuyue Liang, Sergio P. Perez

    Abstract: We assess the benefit of including an image inpainting filter before passing damaged images into a classification neural network. For this we employ a modified Cahn-Hilliard equation as an image inpainting filter, which is solved via a finite volume scheme with reduced computational cost and adequate properties for energy stability and boundedness. The benchmark dataset employed here is MNIST, whi…

    Submitted 15 March, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: An interactive jupyter notebook with the code of this work is available at https://github.com/sergiopperez/Image_Inpainting. The MNIST dataset employed in this work can be downloaded from http://yann.lecun.com/exdb/mnist/

    MSC Class: 68U10; 94A08; 65M22; 76M25; 76M12