-
Hierarchical Uncertainty Exploration via Feedforward Posterior Trees
Authors:
Elias Nehme,
Rotem Mulayoff,
Tomer Michaeli
Abstract:
When solving ill-posed inverse problems, one often desires to explore the space of potential solutions rather than be presented with a single plausible reconstruction. Valuable insights into these feasible solutions and their associated probabilities are embedded in the posterior distribution. However, when confronted with data of high dimensionality (such as images), visualizing this distribution becomes a formidable challenge, necessitating the application of effective summarization techniques before user examination. In this work, we introduce a new approach for visualizing posteriors across multiple levels of granularity using tree-valued predictions. Our method predicts a tree-valued hierarchical summarization of the posterior distribution for any input measurement, in a single forward pass of a neural network. We showcase the efficacy of our approach across diverse datasets and image restoration challenges, highlighting its prowess in uncertainty quantification and visualization. Our findings reveal that our method performs comparably to a baseline that hierarchically clusters samples from a diffusion-based posterior sampler, yet achieves this with orders of magnitude greater speed.
Submitted 24 May, 2024;
originally announced May 2024.
-
Spectral Discovery of Jointly Smooth Features for Multimodal Data
Authors:
Felix Dietrich,
Or Yair,
Rotem Mulayoff,
Ronen Talmon,
Ioannis G. Kevrekidis
Abstract:
In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. This allows us to register measurements of the same phenomenon by heterogeneous sensors, and to reject sensor-specific noise. Our method is unsupervised and consists primarily of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each separate manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered by increasing Dirichlet energy from the smoothest to the least smooth. In addition, we show that the extracted functions can be efficiently extended to unseen data using the Nyström method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear variants of the seminal Canonical Correlation Analysis (CCA). In particular, we show superior results for sleep stage identification. We also show how the proposed method can be leveraged for finding minimal realizations of parameter spaces of nonlinear dynamical systems.
Submitted 29 April, 2021; v1 submitted 9 April, 2020;
originally announced April 2020.
-
Unique Properties of Flat Minima in Deep Networks
Authors:
Rotem Mulayoff,
Tomer Michaeli
Abstract:
It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In deep neural network training, this mechanism serves to screen out minima. However, the precise effect that this has on the trained network is not yet fully understood. In this paper, we characterize the flat minima in linear neural networks trained with a quadratic loss. First, we show that linear ResNets with zero initialization necessarily converge to the flattest of all minima. We then prove that these minima correspond to nearly balanced networks, in which the gain from the input to any intermediate representation does not change drastically from one layer to the next. Finally, we show that consecutive layers in flat minima solutions are coupled: one of the left singular vectors of each weight matrix equals one of the right singular vectors of the next matrix. This forms a distinct path from input to output that, as we show, is dedicated to the signal that experiences the largest gain end-to-end. Experiments indicate that these properties are characteristic of both linear and nonlinear models trained in practice.
Submitted 8 August, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.