Generalization in diffusion models arises from geometry-adaptive harmonic representations

Kadkhodaie, Zahra; Guth, Florentin; Simoncelli, Eero P.; Mallat, Stéphane

arXiv:2310.02557 (cs)

[Submitted on 4 Oct 2023 (v1), last revised 12 Apr 2024 (this version, v3)]

Abstract: Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. These impressive capabilities seem to imply an escape from the curse of dimensionality, but recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. In this regime of strong generalization, diffusion-generated images are distinct from the training set, and are of high visual quality, suggesting that the inductive biases of the DNNs are well-aligned with the data density. We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
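
As a rough illustration of what "a shrinkage operation in a basis" means, the sketch below builds a toy linear denoiser and rewrites it as per-coefficient shrinkage in an orthonormal basis. It is not the authors' analysis: the abstract concerns trained DNNs whose basis adapts to each input image, whereas this example assumes a hypothetical Gaussian signal model, for which the optimal denoiser is a fixed linear (Wiener) filter with a fixed basis. The dimension `d`, noise level `sigma`, and covariance `C` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8        # signal dimension (assumed, for illustration)
sigma = 0.5  # noise standard deviation (assumed)

# Hypothetical Gaussian prior x ~ N(0, C) with a random positive-definite C.
B = rng.standard_normal((d, d))
C = B @ B.T + 0.1 * np.eye(d)

# Optimal linear (Wiener) denoiser for y = x + sigma * noise: f(y) = A y.
A = C @ np.linalg.inv(C + sigma**2 * np.eye(d))

# A is symmetric, so it has an orthonormal eigenbasis; symmetrize to
# remove floating-point asymmetry before calling eigh.
shrinkage, basis = np.linalg.eigh((A + A.T) / 2)

def denoise_by_shrinkage(y):
    """Project onto the basis, scale each coefficient, and reconstruct."""
    coeffs = basis.T @ y
    return basis @ (shrinkage * coeffs)

y = rng.standard_normal(d)  # stand-in for a noisy observation
```

In this toy setting every shrinkage factor lies strictly between 0 and 1, and `denoise_by_shrinkage(y)` agrees with `A @ y` up to floating-point error. For a nonlinear denoiser, the matrix and therefore the basis would change with the input, which is the adaptive behavior the abstract refers to.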

Comments:	Accepted for oral presentation at ICLR, Vienna, May 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2310.02557 [cs.CV]
	(or arXiv:2310.02557v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.02557

Submission history

From: Florentin Guth
[v1] Wed, 4 Oct 2023 03:30:32 UTC (2,846 KB)
[v2] Fri, 15 Mar 2024 18:21:48 UTC (6,790 KB)
[v3] Fri, 12 Apr 2024 15:48:47 UTC (6,797 KB)