-
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Authors:
Amirmojtaba Sabour,
Sanja Fidler,
Karsten Kreis
Abstract:
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling…
▽ More
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called $\textit{Align Your Steps}$. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different samplers, and observe that our optimized schedules outperform previous hand-crafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
Authors:
Koichi Namekata,
Amirmojtaba Sabour,
Sanja Fidler,
Seung Wook Kim
Abstract:
Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks. However, generating fine-grained segmentation masks with diffusion models often requires additional training on annotated datasets, leaving it unclear to what extent pre-trained diffusion models alone understand the semantic relations of their generated imag…
▽ More
Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks. However, generating fine-grained segmentation masks with diffusion models often requires additional training on annotated datasets, leaving it unclear to what extent pre-trained diffusion models alone understand the semantic relations of their generated images. To address this question, we leverage the semantic knowledge extracted from Stable Diffusion (SD) and aim to develop an image segmentor capable of generating fine-grained segmentation maps without any additional training. The primary difficulty stems from the fact that semantically meaningful feature maps typically exist only in the spatially lower-dimensional layers, which poses a challenge in directly extracting pixel-level semantic relations from these feature maps. To overcome this issue, our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps by exploiting SD's generation process and utilizes them for constructing image-resolution segmentation maps. In extensive experiments, the produced segmentation maps are demonstrated to be well delineated and capture detailed parts of the images, indicating the existence of highly accurate pixel-level semantic knowledge in diffusion models.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
CCGG: A Deep Autoregressive Model for Class-Conditional Graph Generation
Authors:
Yassaman Ommi,
Matin Yousefabadi,
Faezeh Faez,
Amirmojtaba Sabour,
Mahdieh Soleymani Baghshah,
Hamid R. Rabiee
Abstract:
Graph data structures are fundamental for studying connected entities. With an increase in the number of applications where data is represented as graphs, the problem of graph generation has recently become a hot topic. However, despite its significance, conditional graph generation that creates graphs with desired features is relatively less explored in previous studies. This paper addresses the…
▽ More
Graph data structures are fundamental for studying connected entities. With an increase in the number of applications where data is represented as graphs, the problem of graph generation has recently become a hot topic. However, despite its significance, conditional graph generation that creates graphs with desired features is relatively less explored in previous studies. This paper addresses the problem of class-conditional graph generation that uses class labels as generation constraints by introducing the Class Conditioned Graph Generator (CCGG). We built CCGG by injecting the class information as an additional input into a graph generator model and including a classification loss in its total loss along with a gradient passing trick. Our experiments show that CCGG outperforms existing conditional graph generation methods on various datasets. It also manages to maintain the quality of the generated graphs in terms of distribution-based evaluation metrics.
△ Less
Submitted 25 April, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Dynamic Averaging Load Balancing on Cycles
Authors:
Dan Alistarh,
Giorgi Nadiradze,
Amirmojtaba Sabour
Abstract:
We consider the following dynamic load-balancing process: given an underlying graph $G$ with $n$ nodes, in each step $t\geq 0$, one unit of load is created, and placed at a randomly chosen graph node. In the same step, the chosen node picks a random neighbor, and the two nodes balance their loads by averaging them. We are interested in the expected gap between the minimum and maximum loads at node…
▽ More
We consider the following dynamic load-balancing process: given an underlying graph $G$ with $n$ nodes, in each step $t\geq 0$, one unit of load is created, and placed at a randomly chosen graph node. In the same step, the chosen node picks a random neighbor, and the two nodes balance their loads by averaging them. We are interested in the expected gap between the minimum and maximum loads at nodes as the process progresses, and its dependence on $n$ and on the graph structure.
Similar variants of the above graphical balanced allocation process have been studied by Peres, Talwar, and Wieder, and by Sauerwald and Sun for regular graphs. These authors left as open the question of characterizing the gap in the case of \emph{cycle graphs} in the \emph{dynamic} case, where weights are created during the algorithm's execution. For this case, the only known upper bound is of $\mathcal{O}( n \log n )$, following from a majorization argument due to Peres, Talwar, and Wieder, which analyzes a related graphical allocation process.
In this paper, we provide an upper bound of $\mathcal{O} ( \sqrt n \log n )$ on the expected gap of the above process for cycles of length $n$. We introduce a new potential analysis technique, which enables us to bound the difference in load between $k$-hop neighbors on the cycle, for any $k \leq n / 2$. We complement this with a "gap covering" argument, which bounds the maximum value of the gap by bounding its value across all possible subsets of a certain structure, and recursively bounding the gaps within each subset. We provide analytical and experimental evidence that our upper bound on the gap is tight up to a logarithmic factor.
△ Less
Submitted 20 March, 2020;
originally announced March 2020.
-
Asynchronous Decentralized SGD with Quantized and Local Updates
Authors:
Giorgi Nadiradze,
Amirmojtaba Sabour,
Peter Davies,
Shigang Li,
Dan Alistarh
Abstract:
Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but also introduces new challenges in terms of synchronization costs. To this end, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in…
▽ More
Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but also introduces new challenges in terms of synchronization costs. To this end, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in such a relaxed setting, this line of work often assumes \emph{global} communication rounds, which require additional synchronization. In this paper, we consider decentralized optimization in the simpler, but harder to analyze, \emph{asynchronous gossip} model, in which communication occurs in discrete, randomly chosen pairings among nodes. Perhaps surprisingly, we show that a variant of SGD called \emph{SwarmSGD} still converges in this setting, even if \emph{non-blocking communication}, \emph{quantization}, and \emph{local steps} are all applied \emph{in conjunction}, and even if the node data distributions and underlying graph topology are both \emph{heterogenous}. Our analysis is based on a new connection with multi-dimensional load-balancing processes. We implement this algorithm and deploy it in a super-computing environment, showing that it can outperform previous decentralized methods in terms of end-to-end training time, and that it can even rival carefully-tuned large-batch SGD for certain tasks.
△ Less
Submitted 25 March, 2022; v1 submitted 27 October, 2019;
originally announced October 2019.