Search | arXiv e-print repository

Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

Authors: Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

Abstract: With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor… ▽ More With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor that, given the initial and ending frames' bounding boxes, can predict up to 15 bounding boxes per frame for all the frames in a 25-frame clip. We perform experiments across 3 well-known AV video datasets: KITTI, Virtual-KITTI 2 and BDD100k. △ Less

Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.16287 [pdf, other]

LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu

Abstract: A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes a desired prediction for initial parameters more necessary nowadays. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vis… ▽ More A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes a desired prediction for initial parameters more necessary nowadays. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting parameters of very wide networks relies on copying small chunks of parameters multiple times and requires an extremely large number of parameters to support full prediction, which greatly hinders its adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that expands to significantly wider networks without requiring as excessive increase of parameters as in previous attempts. LoGAH allows us to predict the parameters of 774-million large neural networks in a memory-efficient manner. We show that vision and language models (i.e., ViT and GPT-2) initialized with LoGAH achieve better performance than those initialized randomly or using existing hypernetworks. Furthermore, we show promising transfer learning results w.r.t. training LoGAH on small datasets and using the predicted parameters to initialize for larger tasks. We provide the codes in https://github.com/Blackzxy/LoGAH . △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: 16 pages

arXiv:2309.09968 [pdf, other]

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Tal Kachman

Abstract: Tabular data is hard to acquire and is subject to missing values. This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data utilizing score-based diffusion and conditional flow matching. In contrast to prior methods that rely on neural networks to learn the score function or the vector field, we adopt XGBoost, a widely used Gradient-Boo… ▽ More Tabular data is hard to acquire and is subject to missing values. This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) tabular data utilizing score-based diffusion and conditional flow matching. In contrast to prior methods that rely on neural networks to learn the score function or the vector field, we adopt XGBoost, a widely used Gradient-Boosted Tree (GBT) technique. To test our method, we build one of the most extensive benchmarks for tabular data generation and imputation, containing 27 diverse datasets and 9 metrics. Through empirical evaluation across the benchmark, we demonstrate that our approach outperforms deep-learning generation methods in data generation tasks and remains competitive in data imputation. Notably, it can be trained in parallel using CPUs without requiring a GPU. Our Python and R code is available at https://github.com/SamsungSAILMontreal/ForestDiffusion. △ Less

Submitted 19 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Code: https://github.com/SamsungSAILMontreal/ForestDiffusion

arXiv:2304.05907 [pdf, ps, other]

Diffusion models with location-scale noise

Authors: Alexia Jolicoeur-Martineau, Kilian Fatras, Ke Li, Tal Kachman

Abstract: Diffusion Models (DMs) are powerful generative models that add Gaussian noise to the data and learn to remove it. We wanted to determine which noise distribution (Gaussian or non-Gaussian) led to better generated data in DMs. Since DMs do not work by design with non-Gaussian noise, we built a framework that allows reversing a diffusion process with non-Gaussian location-scale noise. We use that fr… ▽ More Diffusion Models (DMs) are powerful generative models that add Gaussian noise to the data and learn to remove it. We wanted to determine which noise distribution (Gaussian or non-Gaussian) led to better generated data in DMs. Since DMs do not work by design with non-Gaussian noise, we built a framework that allows reversing a diffusion process with non-Gaussian location-scale noise. We use that framework to show that the Gaussian distribution performs the best over a wide range of other distributions (Laplace, Uniform, t, Generalized-Gaussian). △ Less

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.03094 [pdf, other]

PopulAtion Parameter Averaging (PAPA)

Authors: Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien

Abstract: Ensemble methods combine the predictions of multiple models to improve performance, but they require significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights. However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when different enough to benefit from… ▽ More Ensemble methods combine the predictions of multiple models to improve performance, but they require significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights. However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when different enough to benefit from combining them, but similar enough to average well. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while slowly pushing the weights of the networks toward the population average of the weights. We also propose PAPA variants (PAPA-all, and PAPA-2) that average weights rarely rather than continuously; all methods increase generalization, but PAPA tends to perform best. PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 0.8% on CIFAR-10, 1.9% on CIFAR-100, and 1.6% on ImageNet when compared to training independent (non-averaged) models. △ Less

Submitted 6 May, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: Blog post: https://ajolicoeur.wordpress.com/papa/, Code: https://github.com/SamsungSAILMontreal/PAPA, TMLR journal publication: https://openreview.net/forum?id=cPDVjsOytS

arXiv:2210.09505 [pdf, other]

CNT (Conditioning on Noisy Targets): A new Algorithm for Leveraging Top-Down Feedback

Authors: Alexia Jolicoeur-Martineau, Alex Lamb, Vikas Verma, Aniket Didolkar

Abstract: We propose a novel regularizer for supervised learning called Conditioning on Noisy Targets (CNT). This approach consists in conditioning the model on a noisy version of the target(s) (e.g., actions in imitation learning or labels in classification) at a random noise level (from small to large noise). At inference time, since we do not know the target, we run the network with only noise in place o… ▽ More We propose a novel regularizer for supervised learning called Conditioning on Noisy Targets (CNT). This approach consists in conditioning the model on a noisy version of the target(s) (e.g., actions in imitation learning or labels in classification) at a random noise level (from small to large noise). At inference time, since we do not know the target, we run the network with only noise in place of the noisy target. CNT provides hints through the noisy label (with less noise, we can more easily infer the true target). This give two main benefits: 1) the top-down feedback allows the model to focus on simpler and more digestible sub-problems and 2) rather than learning to solve the task from scratch, the model will first learn to master easy examples (with less noise), while slowly progressing toward harder examples (with more noise). △ Less

Submitted 26 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

arXiv:2205.09853 [pdf, other]

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Authors: Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal

Abstract: Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a ge… ▽ More Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using $\le$ 4 GPUs. Project page: https://mask-cond-video-diffusion.github.io ; Code : https://github.com/voletiv/mcvd-pytorch △ Less

Submitted 12 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022 ; 10 pages, 4 figures, 7 tables

arXiv:2105.14080 [pdf, other]

Gotta Go Fast When Generating Data with Score-Based Models

Authors: Alexia Jolicoeur-Martineau, Ke Li, Rémi Piché-Taillefer, Tal Kachman, Ioannis Mitliagkas

Abstract: Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evalua… ▽ More Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers. In this work, we aim to accelerate this process by devising a more efficient SDE solver. Existing approaches rely on the Euler-Maruyama (EM) solver, which uses a fixed step size. We found that naively replacing it with other SDE solvers fares poorly - they either result in low-quality samples or become slower than EM. To get around this issue, we carefully devise an SDE solver with adaptive step sizes tailored to score-based generative models piece by piece. Our solver requires only two score function evaluations, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than EM while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning. △ Less

Submitted 28 May, 2021; originally announced May 2021.

Comments: Code is available on https://github.com/AlexiaJM/score_sde_fast_sampling

arXiv:2009.05475 [pdf, other]

Adversarial score matching and improved sampling for image generation

Authors: Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, Ioannis Mitliagkas

Abstract: Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse… ▽ More Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse than Generative Adversarial Networks (GANs) under the Fréchet Inception Distance, a standard metric for generative models. We show that this apparent gap vanishes when denoising the final Langevin samples using the score network. In addition, we propose two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed Langevin Sampling, and 2) a hybrid training formulation, composed of both Denoising Score Matching and adversarial objectives. By combining these two techniques and exploring different network architectures, we elevate score matching methods and obtain results competitive with state-of-the-art image generation on CIFAR-10. △ Less

Submitted 10 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

Comments: Code at https://github.com/AlexiaJM/AdversarialConsistentScoreMatching

arXiv:2007.04202 [pdf, other]

Stochastic Hamiltonian Gradient Methods for Smooth Games

Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

Abstract: The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using t… ▽ More The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations. △ Less

Submitted 8 July, 2020; originally announced July 2020.

Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning

arXiv:1910.06922 [pdf, other]

Gradient penalty from a maximum margin perspective

Authors: Alexia Jolicoeur-Martineau, Ioannis Mitliagkas

Abstract: A popular heuristic for improved performance in Generative adversarial networks (GANs) is to use some form of gradient penalty on the discriminator. This gradient penalty was originally motivated by a Wasserstein distance formulation. However, the use of gradient penalty in other GAN formulations is not well motivated. We present a unifying framework of expected margin maximization and show that a… ▽ More A popular heuristic for improved performance in Generative adversarial networks (GANs) is to use some form of gradient penalty on the discriminator. This gradient penalty was originally motivated by a Wasserstein distance formulation. However, the use of gradient penalty in other GAN formulations is not well motivated. We present a unifying framework of expected margin maximization and show that a wide range of gradient-penalized GANs (e.g., Wasserstein, Standard, Least-Squares, and Hinge GANs) can be derived from this framework. Our results imply that employing gradient penalties induces a large-margin classifier (thus, a large-margin discriminator in GANs). We describe how expected margin maximization helps reduce vanishing gradients at fake (generated) samples, a known problem in GANs. From this framework, we derive a new $L^\infty$ gradient norm penalty with Hinge loss which generally produces equally good (or better) generated output in GANs than $L^2$-norm penalties (based on the Fréchet Inception Distance). △ Less

Submitted 24 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: Code at https://github.com/AlexiaJM/MaximumMarginGANs

arXiv:1901.02474 [pdf, other]

On Relativistic $f$-Divergences

Authors: Alexia Jolicoeur-Martineau

Abstract: This paper provides a more rigorous look at Relativistic Generative Adversarial Networks (RGANs). We prove that the objective function of the discriminator is a statistical divergence for any concave function $f$ with minimal properties ($f(0)=0$, $f'(0) \neq 0$, $\sup_x f(x)>0$). We also devise a few variants of relativistic $f$-divergences. Wasserstein GAN was originally justified by the idea th… ▽ More This paper provides a more rigorous look at Relativistic Generative Adversarial Networks (RGANs). We prove that the objective function of the discriminator is a statistical divergence for any concave function $f$ with minimal properties ($f(0)=0$, $f'(0) \neq 0$, $\sup_x f(x)>0$). We also devise a few variants of relativistic $f$-divergences. Wasserstein GAN was originally justified by the idea that the Wasserstein distance (WD) is most sensible because it is weak (i.e., it induces a weak topology). We show that the WD is weaker than $f$-divergences which are weaker than relativistic $f$-divergences. Given the good performance of RGANs, this suggests that WGAN does not performs well primarily because of the weak metric, but rather because of regularization and the use of a relativistic discriminator. We also take a closer look at estimators of relativistic $f$-divergences. We introduce the minimum-variance unbiased estimator (MVUE) for Relativistic paired GANs (RpGANs; originally called RGANs which could bring confusion) and show that it does not perform better. Furthermore, we show that the estimator of Relativistic average GANs (RaGANs) is only asymptotically unbiased, but that the finite-sample bias is small. Removing this bias does not improve performance. △ Less

Submitted 8 January, 2019; originally announced January 2019.

Comments: Code is available on: https://github.com/AlexiaJM/relativistic-f-divergences

arXiv:1809.02145 [pdf, other]

GANs beyond divergence minimization

Authors: Alexia Jolicoeur-Martineau

Abstract: Generative adversarial networks (GANs) can be interpreted as an adversarial game between two players, a discriminator D and a generator G, in which D learns to classify real from fake data and G learns to generate realistic data by "fooling" D into thinking that fake data is actually real data. Currently, a dominating view is that G actually learns by minimizing a divergence given that the general… ▽ More Generative adversarial networks (GANs) can be interpreted as an adversarial game between two players, a discriminator D and a generator G, in which D learns to classify real from fake data and G learns to generate realistic data by "fooling" D into thinking that fake data is actually real data. Currently, a dominating view is that G actually learns by minimizing a divergence given that the general objective function is a divergence when D is optimal. However, this view has been challenged due to inconsistencies between theory and practice. In this paper, we discuss of the properties associated with most loss functions for G (e.g., saturating/non-saturating f-GAN, LSGAN, WGAN, etc.). We show that these loss functions are not divergences and do not have the same equilibrium as expected of divergences. This suggests that G does not need to minimize the same objective function as D maximize, nor maximize the objective of D after swap** real data with fake data (non-saturating GAN) but can instead use a wide range of possible loss functions to learn to generate realistic data. We define GANs through two separate and independent D maximization and G minimization steps. We generalize the generator step to four new classes of loss functions, most of which are actual divergences (while traditional G loss functions are not). We test a wide variety of loss functions from these four classes on a synthetic dataset and on CIFAR-10. We observe that most loss functions converge well and provide comparable data generation quality to non-saturating GAN, LSGAN, and WGAN-GP generator loss functions, whether we use divergences or non-divergences. These results suggest that GANs do not conform well to the divergence minimization theory and form a much broader range of models than previously assumed. △ Less

Submitted 6 September, 2018; originally announced September 2018.

Comments: Associated repository: https://github.com/AlexiaJM/GANsBeyondDivergenceMin

arXiv:1807.00734 [pdf, other]

The relativistic discriminator: a key element missing from standard GAN

Authors: Alexia Jolicoeur-Martineau

Abstract: In standard generative adversarial network (SGAN), the discriminator estimates the probability that the input data is real. The generator is trained to increase the probability that fake data is real. We argue that it should also simultaneously decrease the probability that real data is real because 1) this would account for a priori knowledge that half of the data in the mini-batch is fake, 2) th… ▽ More In standard generative adversarial network (SGAN), the discriminator estimates the probability that the input data is real. The generator is trained to increase the probability that fake data is real. We argue that it should also simultaneously decrease the probability that real data is real because 1) this would account for a priori knowledge that half of the data in the mini-batch is fake, 2) this would be observed with divergence minimization, and 3) in optimal settings, SGAN would be equivalent to integral probability metric (IPM) GANs. We show that this property can be induced by using a relativistic discriminator which estimate the probability that the given real data is more realistic than a randomly sampled fake data. We also present a variant in which the discriminator estimate the probability that the given real data is more realistic than fake data, on average. We generalize both approaches to non-standard GAN loss functions and we refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). We show that IPM-based GANs are a subset of RGANs which use the identity function. Empirically, we observe that 1) RGANs and RaGANs are significantly more stable and generate higher quality data samples than their non-relativistic counterparts, 2) Standard RaGAN with gradient penalty generate data of better quality than WGAN-GP while only requiring a single discriminator update per generator update (reducing the time taken for reaching the state-of-the-art by 400%), and 3) RaGANs are able to generate plausible high resolutions images (256x256) from a very small sample (N=2011), while GAN and LSGAN cannot; these images are of significantly better quality than the ones generated by WGAN-GP and SGAN with spectral normalization. △ Less

Submitted 10 September, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: https://github.com/AlexiaJM/RelativisticGAN

Showing 1–14 of 14 results for author: Jolicoeur-Martineau, A