Showing 1–7 of 7 results for author: Ankner, Z

Searching in archive cs.
  1. arXiv:2406.11196  [pdf, other]

    cs.CV

    Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

    Authors: Rishab Parthasarathy, Zack Ankner, Aaron Gokaslan

    Abstract: A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by jointly optimizing for consistency across both time and views of the scene. In this paper, we instead investigate whether it is necessary to explicitly enforce…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures, 3 tables

  2. arXiv:2405.20541  [pdf, other]

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2402.05109  [pdf, other]

    cs.LG

    Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

    Authors: Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

    Abstract: To combat the memory bandwidth-bound nature of autoregressive LLM inference, previous research has proposed the speculative decoding framework. To perform speculative decoding, a small draft model proposes candidate continuations of the input sequence, which are then verified in parallel by the base model. One way to specify the draft model, as used in the recent Medusa decoding framework, is as a…

    Submitted 7 February, 2024; originally announced February 2024.

  4. arXiv:2311.09431  [pdf, other]

    cs.LG cs.CL

    Striped Attention: Faster Ring Attention for Causal Transformers

    Authors: William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

    Abstract: To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we study the performance characteristics of Ring Attention in the important special case of causal transformer…

    Submitted 15 November, 2023; originally announced November 2023.

  5. arXiv:2305.15096  [pdf, other]

    cs.CL cs.AI

    Dynamic Masking Rate Schedules for MLM Pretraining

    Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

    Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectivel…

    Submitted 10 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  6. arXiv:2212.00291  [pdf, other]

    cs.LG

    The Effect of Data Dimensionality on Neural Network Prunability

    Authors: Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

    Abstract: Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network (the maximum fraction of weights that pruning can remove without compromising the model's test accuracy). In this work, we study the properties of input data that may contribute to the prunability of a neural network. For high dimens…

    Submitted 1 December, 2022; originally announced December 2022.

  7. arXiv:2211.16677  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Neural Field Generation using Triplane Diffusion

    Authors: J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

    Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D t…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Project page: https://jryanshue.com/nfd