Skip to main content

Showing 1–3 of 3 results for author: Shippole, E

.
  1. arXiv:2401.11605  [pdf, other

    cs.CV cs.AI cs.LG

    Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

    Authors: Katherine Crowson, Stefan Andreas Baumann, Alex Birch, Tanishq Mathew Abraham, Daniel Z. Kaplan, Enrico Shippole

    Abstract: We present the Hourglass Diffusion Transformer (HDiT), an image generative model that exhibits linear scaling with pixel count, supporting training at high-resolution (e.g. $1024 \times 1024$) directly in pixel-space. Building on the Transformer architecture, which is known to scale to billions of parameters, it bridges the gap between the efficiency of convolutional U-Nets and the scalability of… ▽ More

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 20 pages, 13 figures, project page and code available at https://crowsonkb.github.io/hourglass-diffusion-transformers/

  2. arXiv:2310.16787  [pdf, other

    cs.CL cs.AI cs.LG

    The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

    Authors: Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

    Abstract: The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tool… ▽ More

    Submitted 4 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 30 pages (18 main), 6 figures, 5 tables

  3. arXiv:2309.00071  [pdf, other

    cs.CL cs.AI cs.LG

    YaRN: Efficient Context Window Extension of Large Language Models

    Authors: Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole

    Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x less tokens and 2.5x less training steps… ▽ More

    Submitted 1 November, 2023; v1 submitted 31 August, 2023; originally announced September 2023.