
Showing 1–11 of 11 results for author: Heek, J

  1. arXiv:2406.04103

    cs.LG cs.AI cs.CV cs.NE

    Multistep Distillation of Diffusion Models via Moment Matching

    Authors: Tim Salimans, Thomas Mensink, Jonathan Heek, Emiel Hoogeboom

    Abstract: We present a new method for making diffusion models faster to sample. The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data given noisy data along the sampling trajectory. Our approach extends recently proposed one-step methods to the multi-step case, and provides a new perspective by interpreting these approaches in terms of mom…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2403.06807

    cs.LG cs.CV stat.ML

    Multistep Consistency Models

    Authors: Jonathan Heek, Emiel Hoogeboom, Tim Salimans

    Abstract: Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train, but generate samples in a single step. In this paper we propose Multistep Consistency Models: A unification between Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion m…

    Submitted 3 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2402.09470

    cs.LG stat.ML

    Rolling Diffusion Models

    Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom

    Abstract: Diffusion models have recently been increasingly applied to temporal data such as video, fluid mechanics simulations, or climate data. These methods generally treat subsequent frames equally regarding the amount of noise in the diffusion process. This paper explores Rolling Diffusion: a new approach that uses a sliding window denoising process. It ensures that the diffusion process progressively c…

    Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  4. arXiv:2307.06304

    cs.CV cs.AI cs.LG

    Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

    Authors: Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby

    Abstract: The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence…

    Submitted 12 July, 2023; originally announced July 2023.

  5. arXiv:2304.02847

    cs.CV cs.AI cs.LG

    Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

    Authors: Jonas Ngnawe, Marianne ABEMGNIGNI NJIFON, Jonathan Heek, Yann Dauphin

    Abstract: Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization im…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted at: Workshop on Distribution Shifts, 36th Conference on Neural Information Processing Systems (NeurIPS 2022). https://openreview.net/forum?id=Na64z0YpOx

  6. arXiv:2302.05442

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al…

    Submitted 10 February, 2023; originally announced February 2023.

  7. arXiv:2301.11093

    cs.CV cs.LG stat.ML

    Simple diffusion: End-to-end diffusion for high resolution images

    Authors: Emiel Hoogeboom, Jonathan Heek, Tim Salimans

    Abstract: Currently, applying diffusion models in pixel space of high resolution images is difficult. Instead, existing approaches focus on diffusion in lower dimensional spaces (latent diffusion), or have multiple super-resolution levels of generation referred to as cascades. The downside is that these approaches add additional complexity to the diffusion framework. This paper aims to improve denoising d…

    Submitted 12 December, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  8. arXiv:2211.05102

    cs.LG cs.CL

    Efficiently Scaling Transformer Inference

    Authors: Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean

    Abstract: We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a sim…

    Submitted 9 November, 2022; originally announced November 2022.

  9. arXiv:2111.07470

    cs.LG physics.ao-ph

    Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks

    Authors: Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Nal Kalchbrenner

    Abstract: The problem of forecasting weather has been scientifically studied for centuries due to its high impact on human lives, transportation, food production and energy management, among others. Current operational forecasting models are based on physics and use supercomputers to simulate the atmosphere to make forecasts hours and days in advance. Better physics-based forecasts require improvements in t…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: 34 pages

  10. arXiv:2003.12140

    cs.LG physics.ao-ph stat.ML

    MetNet: A Neural Weather Model for Precipitation Forecasting

    Authors: Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner

    Abstract: Weather forecasting is a long standing scientific challenge with direct social and economic impact. The task is suitable for deep neural networks due to vast amounts of continuously collected data and a rich spatial and temporal structure that presents long range dependencies. We introduce MetNet, a neural network that forecasts precipitation up to 8 hours into the future at the high spatial resol…

    Submitted 30 March, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

  11. arXiv:1908.03491

    cs.LG cs.CV stat.ML

    Bayesian Inference for Large Scale Image Classification

    Authors: Jonathan Heek, Nal Kalchbrenner

    Abstract: Bayesian inference promises to ground and improve the performance of deep neural networks. It promises to be robust to overfitting, to simplify the training procedure and the space of hyperparameters, and to provide a calibrated measure of uncertainty that can enhance decision making, agent exploration and prediction fairness. Markov Chain Monte Carlo (MCMC) methods enable Bayesian inference by ge…

    Submitted 9 August, 2019; originally announced August 2019.