GLIMPSE: Generalized Local Imaging with MLPs

Authors: AmirEhsan Khorashadizadeh, Valentin Debarnot, Tianlin Liu, Ivan Dokmanić

Abstract: Deep learning is the current de facto state of the art in tomographic imaging. A common approach is to feed the result of a simple inversion, for example the backprojection, to a convolutional neural network (CNN) which then computes the reconstruction. Despite strong results on 'in-distribution' test data similar to the training data, backprojection from sparse-view data delocalizes singularities, so these approaches require a large receptive field to perform well. As a consequence, they overfit to certain global structures which leads to poor generalization on out-of-distribution (OOD) samples. Moreover, their memory complexity and training time scale unfavorably with image resolution, making them impractical for application at realistic clinical resolutions, especially in 3D: a standard U-Net requires a substantial 140GB of memory and 2600 seconds per epoch on a research-grade GPU when training on 1024x1024 images. In this paper, we introduce GLIMPSE, a local processing neural network for computed tomography which reconstructs a pixel value by feeding only the measurements associated with the neighborhood of the pixel to a simple MLP. While achieving comparable or better performance with successful CNNs like the U-Net on in-distribution test data, GLIMPSE significantly outperforms them on OOD samples while maintaining a memory footprint almost independent of image resolution; 5GB memory suffices to train on 1024x1024 images. Further, we built GLIMPSE to be fully differentiable, which enables feats such as recovery of accurate projection angles if they are out of calibration.

Submitted 20 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Comments: 12 pages, 10 figures

arXiv:2310.00987

A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

Authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius

Abstract: Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g., when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for the KRR test error of any finite-rank KRR. Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.

Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2307.07572

High-Rate Phase Association with Travel Time Neural Fields

Authors: Cheng Shi, Maarten V. de Hoop, Ivan Dokmanić

Abstract: Our understanding of regional seismicity from multi-station seismograms relies on the ability to associate arrival phases with their originating earthquakes. Deep-learning-based phase detection now detects small, high-rate arrivals from seismicity clouds, even at negative magnitudes. This new data could give important insight into earthquake dynamics, but it presents a challenging association task. Existing techniques relying on coarsely approximated, fixed wave speed models fail in this unexplored dense regime where the complexity of unknown wave speed cannot be ignored. We introduce Harpa, a high-rate association framework built on deep generative modeling and neural fields. Harpa incorporates wave physics by using optimal transport to compare arrival sequences. It is thus robust to unknown wave speeds and estimates the wave speed model as a by-product of association. Experiments with realistic, complex synthetic models show that Harpa is the first seismic phase association framework which is accurate in the high-rate regime, paving the way for new avenues in exploratory Earth science and improved understanding of seismicity.

Submitted 26 March, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.06041

A Graph Dynamics Prior for Relational Inference

Authors: Liming Pan, Cheng Shi, Ivan Dokmanić

Abstract: Relational inference aims to identify interactions between parts of a dynamical system from the observed dynamics. Current state-of-the-art methods fit the dynamics with a graph neural network (GNN) on a learnable graph. They use one-step message-passing GNNs -- intuitively the right choice since non-locality of multi-step or spectral GNNs may confuse direct and indirect interactions. But the effective interaction graph depends on the sampling rate and it is rarely localized to direct neighbors, leading to poor local optima for the one-step model. In this work, we propose a graph dynamics prior (GDP) for relational inference. GDP constructively uses error amplification in non-local polynomial filters to steer the solution to the ground-truth graph. To deal with non-uniqueness, GDP simultaneously fits a "shallow" one-step model and a polynomial multi-step model with shared graph topology. Experiments show that GDP reconstructs graphs far more accurately than earlier methods, with remarkable robustness to under-sampling. Since appropriate sampling rates for unknown dynamical systems are not known a priori, this robustness makes GDP suitable for real applications in scientific machine learning. Reproducible code is available at https://github.com/DaDaCheng/GDP.

Submitted 20 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2304.12231

An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning

Authors: Anastasis Kratsios, Chong Liu, Matti Lassas, Maarten V. de Hoop, Ivan Dokmanić

Abstract: Motivated by the developing mathematics of deep learning, we build universal function approximators of continuous maps between arbitrary Polish metric spaces $\mathcal{X}$ and $\mathcal{Y}$ using elementary functions between Euclidean spaces as building blocks. Earlier results assume that the target space $\mathcal{Y}$ is a topological vector space. We overcome this limitation by "randomization": our approximators output discrete probability measures over $\mathcal{Y}$. When $\mathcal{X}$ and $\mathcal{Y}$ are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of $\mathcal{X}$ and $\mathcal{Y}$. For barycentric $\mathcal{Y}$, including Banach spaces, $\mathbb{R}$-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to $\mathcal{Y}$-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning.

Submitted 24 July, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 14 Figures, 3 Tables, 78 Pages (Main 40, Proofs 26, Acknowledgments and References 12)

MSC Class: 41A65; 68T07; 60L50; 65N21; 46T99

arXiv:2302.14112

Injectivity of ReLU networks: perspectives from statistical physics

Authors: Antoine Maillard, Afonso S. Bandeira, David Belius, Ivan Dokmanić, Shuta Nakajima

Abstract: When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min-max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 60 pages

arXiv:2301.03092

Deep Injective Prior for Inverse Scattering

Authors: AmirEhsan Khorashadizadeh, Vahid Khorashadizadeh, Sepehr Eskandari, Guy A. E. Vandenbosch, Ivan Dokmanić

Abstract: In electromagnetic inverse scattering, the goal is to reconstruct object permittivity using scattered waves. While deep learning has shown promise as an alternative to iterative solvers, it is primarily used in supervised frameworks which are sensitive to distribution drift of the scattered fields, common in practice. Moreover, these methods typically provide a single estimate of the permittivity pattern, which may be inadequate or misleading due to noise and the ill-posedness of the problem. In this paper, we propose a data-driven framework for inverse scattering based on deep generative models. Our approach learns a low-dimensional manifold as a regularizer for recovering target permittivities. Unlike supervised methods that necessitate both scattered fields and target permittivities, our method only requires the target permittivities for training; it can then be used with any experimental setup. We also introduce a Bayesian framework for approximating the posterior distribution of the target permittivity, enabling multiple estimates and uncertainty quantification. Extensive experiments with synthetic and experimental data demonstrate that our framework outperforms traditional iterative solvers, particularly for strong scatterers, while achieving comparable reconstruction quality to state-of-the-art supervised learning methods like the U-Net.

Submitted 21 July, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

Comments: 13 pages, 11 figures

arXiv:2212.14042

FunkNN: Neural Interpolation for Functional Generation

Authors: AmirEhsan Khorashadizadeh, Anadi Chaman, Valentin Debarnot, Ivan Dokmanić

Abstract: Can we build continuous generative models which generalize across scales, can be evaluated at any coordinate, admit calculation of exact derivatives, and are conceptually simple? Existing MLP-based architectures generate worse samples than the grid-based generators with favorable convolutional inductive biases. Models that focus on generating images at different scales do better, but employ complex architectures not designed for continuous evaluation of images and derivatives. We take a signal-processing perspective and treat continuous image generation as interpolation from samples. Indeed, correctly sampled discrete images contain all information about the low spatial frequencies. The question is then how to extrapolate the spectrum in a data-driven way while meeting the above design criteria. Our answer is FunkNN -- a new convolutional network which learns how to reconstruct continuous images at arbitrary coordinates and can be applied to any image dataset. Combined with a discrete generative model it becomes a functional generator which can act as a prior in continuous ill-posed inverse problems. We show that FunkNN generates high-quality continuous images and exhibits strong out-of-distribution performance thanks to its patch-based design. We further showcase its performance in several stylized inverse problems with exact spatial derivatives.

Submitted 3 April, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 17 pages, 13 figures

Journal ref: The 11th International Conference on Learning Representations (ICLR 2023)

arXiv:2212.13069

Homophily modulates double descent generalization in graph convolution networks

Authors: Cheng Shi, Liming Pan, Hong Hu, Ivan Dokmanić

Abstract: Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.

Submitted 23 January, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.04309

Deep Variational Inverse Scattering

Authors: AmirEhsan Khorashadizadeh, Ali Aghababaei, Tin Vlašić, Hieu Nguyen, Ivan Dokmanić

Abstract: Inverse medium scattering solvers generally reconstruct a single solution without an associated measure of uncertainty. This is true both for the classical iterative solvers and for the emerging deep learning methods. But ill-posedness and noise can make this single estimate inaccurate or misleading. While deep networks such as conditional normalizing flows can be used to sample posteriors in inverse problems, they often yield low-quality samples and uncertainty estimates. In this paper, we propose U-Flow, a Bayesian U-Net based on conditional normalizing flows, which generates high-quality posterior samples and estimates physically-meaningful uncertainty. We show that the proposed model significantly outperforms the recent normalizing flows in terms of posterior sample quality while having comparable performance with the U-Net in point estimation.

Submitted 9 December, 2022; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: 5 pages, 5 figures

arXiv:2211.10525

Differentiable Uncalibrated Imaging

Authors: Sidharth Gupta, Konik Kothari, Valentin Debarnot, Ivan Dokmanić

Abstract: We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordinates. We also develop differentiable spline interpolators which perform as well as neural networks, require less time to optimize and have well-understood properties. Differentiability is key as it allows us to jointly fit a measurement representation, optimize over the uncertain measurement coordinates, and perform image reconstruction which in turn ensures consistent calibration. We apply our approach to 2D and 3D computed tomography, and show that it produces improved reconstructions compared to baselines that do not account for the lack of calibration. The flexibility of the proposed framework makes it easy to extend to almost arbitrary imaging problems.

Submitted 20 December, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

arXiv:2210.00577

Deep Invertible Approximation of Topologically Rich Maps between Manifolds

Authors: Michael Puthawala, Matti Lassas, Ivan Dokmanic, Pekka Pankka, Maarten de Hoop

Abstract: How can we design neural networks that allow for stable universal approximation of maps between topologically interesting manifolds? The answer is with a coordinate projection. Neural networks based on topological data analysis (TDA) use tools such as persistent homology to learn topological signatures of data and stabilize training but may not be universal approximators or have stable inverses. Other architectures universally approximate data distributions on submanifolds but only when the latter are given by a single chart, making them unable to learn maps that change topology. By exploiting the topological parallels between locally bilipschitz maps, covering spaces, and local homeomorphisms, and by using universal approximation arguments from machine learning, we find that a novel network of the form $\mathcal{T} \circ p \circ \mathcal{E}$, where $\mathcal{E}$ is an injective network, $p$ a fixed coordinate projection, and $\mathcal{T}$ a bijective network, is a universal approximator of local diffeomorphisms between compact smooth submanifolds embedded in $\mathbb{R}^n$. We emphasize the case when the target map changes topology. Further, we find that by constraining the projection $p$, multivalued inversions of our networks can be computed without sacrificing universality. As an application, we show that learning a group invariant function with unknown group action naturally reduces to the question of learning local diffeomorphisms for finite groups. Our theory permits us to recover orbits of the group action. We also outline possible extensions of our architecture to address molecular imaging of molecules with symmetries. Finally, our analysis informs the choice of topologically expressive starting spaces in generative problems.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.06788

Small Transformers Compute Universal Metric Embeddings

Authors: Anastasis Kratsios, Valentin Debarnot, Ivan Dokmanić

Abstract: We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-Hölder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When instead embedding into multivariate Gaussian mixtures, we show that probabilistic transformers can compute bi-Hölder embeddings with arbitrarily small distortion.

Submitted 18 October, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: 42 pages, 10 Figures, 3 Tables

MSC Class: 68T07; 30L05; 68R12; 68T30; 05C12

Journal ref: Journal of Machine Learning Research 24 (2023): 1-48

arXiv:2207.02985

Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown-View Tomography

Authors: Shuai Huang, Mona Zehni, Ivan Dokmanić, Zhizhen Zhao

Abstract: Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming that the orientations are uniformly distributed. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available, or fail to be sufficiently robust. For OMR to break free from those restrictions, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem by a denoised reference projection and a nonnegativity constraint. This is enabled by the new closed-form expressions for spatial autocorrelation features. Further, we design an easy-to-compute initial density map which effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.

Submitted 10 June, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: Keywords: unknown view tomography, single-particle cryo-electron microscopy, method of moments, autocorrelation, spherical harmonics

MSC Class: 92C55; 68U10; 33C55; 78M05

arXiv:2206.02027

Implicit Neural Representation for Mesh-Free Inverse Obstacle Scattering

Authors: Tin Vlašić, Hieu Nguyen, AmirEhsan Khorashadizadeh, Ivan Dokmanić

Abstract: Implicit representation of shapes as level sets of multilayer perceptrons has recently flourished in different shape analysis, compression, and reconstruction tasks. In this paper, we introduce an implicit neural representation-based framework for solving the inverse obstacle scattering problem in a mesh-free fashion. We express the obstacle shape as the zero-level set of a signed distance function which is implicitly determined by network parameters. To solve the direct scattering problem, we implement the implicit boundary integral method. It uses projections of the grid points in the tubular neighborhood onto the boundary to compute the PDE solution directly in the level-set framework. The proposed implicit representation conveniently handles the shape perturbation in the optimization process. To update the shape, we use PyTorch's automatic differentiation to backpropagate the loss function w.r.t. the network parameters, allowing us to avoid complex and error-prone manual derivation of the shape derivative. Additionally, we propose a deep generative model of implicit neural shape representations that can fit into the framework. The deep generative model effectively regularizes the inverse obstacle scattering problem, making it more tractable and robust, while yielding high-quality reconstruction results even in noise-corrupted setups.

Submitted 4 December, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: 6 pages, 8 figures, to be published in 2022 Asilomar Conference on Signals, Systems, and Computers

Journal ref: 2022 Asilomar Conference on Signals, Systems, and Computers

arXiv:2204.07664

doi 10.1109/TCI.2023.3248949

Conditional Injective Flows for Bayesian Imaging

Authors: AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmanić

Abstract: Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns. Recent variational inference methods based on conditional normalizing flows are a promising alternative to traditional MCMC methods, but they come with drawbacks: excessive memory and compute demands for moderate to high resolution images and underwhelming performance on hard nonlinear problems. In this work, we propose C-Trumpets -- conditional injective flows specifically designed for imaging problems, which greatly diminish these challenges. Injectivity reduces memory footprint and training time. With a low-dimensional latent space together with architectural innovations like fixed-volume-change layers and skip-connection revnet layers, C-Trumpets outperform regular conditional flow models on a variety of imaging and image restoration tasks, including limited-view CT and nonlinear inverse scattering, with a lower compute and memory budget. C-Trumpets enable fast approximation of point estimates like MMSE or MAP as well as physically-meaningful uncertainty quantification.

Submitted 3 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 23 pages, 23 figures

Journal ref: IEEE Transactions on Computational Imaging, vol. 9, pp. 224-237, 2023

arXiv:2110.04375 [pdf, other]

Neural Link Prediction with Walk Pooling

Authors: Liming Pan, Cheng Shi, Ivan Dokmanić

Abstract: Graph neural networks achieve high accuracy in link prediction by jointly leveraging graph topology and node attributes. Topology, however, is represented indirectly; state-of-the-art methods based on subgraph classification label nodes with distance to the target link, so that, although topological information is present, it is tempered by pooling. This makes it challenging to leverage features like loops and motifs associated with network formation mechanisms. We propose a link prediction algorithm based on a new pooling scheme called WalkPool. WalkPool combines the expressivity of topological heuristics with the feature-learning ability of neural networks. It summarizes a putative link by random walk probabilities of adjacent paths. Instead of extracting transition probabilities from the original graph, it computes the transition matrix of a "predictive" latent graph by applying attention to learned features; this may be interpreted as feature-sensitive topology fingerprinting. WalkPool can leverage unsupervised node features or be combined with GNNs and trained end-to-end. It outperforms state-of-the-art methods on all common link prediction benchmarks, both homophilic and heterophilic, with and without node attributes. Applying WalkPool to a set of unsupervised GNNs significantly improves prediction accuracy, suggesting that it may be used as a general-purpose graph pooling scheme.

Submitted 16 March, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

arXiv:2110.04227 [pdf, other]

Universal Joint Approximation of Manifolds and Densities by Simple Injective Flows

Authors: Michael Puthawala, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

Abstract: We study approximation of probability measures supported on $n$-dimensional manifolds embedded in $\mathbb{R}^m$ by injective flows -- neural networks composed of invertible flows and injective layers. We show that in general, injective flows between $\mathbb{R}^n$ and $\mathbb{R}^m$ universally approximate measures supported on images of extendable embeddings, which are a subset of standard embeddings: when the embedding dimension $m$ is small, topological obstructions may preclude certain manifolds as admissible targets. When the embedding dimension is sufficiently large, $m \ge 3n+1$, we use an argument from algebraic topology known as the clean trick to prove that the topological obstructions vanish and injective flows universally approximate any differentiable embedding. Along the way we show that the studied injective flows admit efficient projections on the range, and that their optimality can be established "in reverse," resolving a conjecture made in Brehmer and Cranmer 2020.

Submitted 27 June, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 26 pages, 5 figures

arXiv:2110.03303 [pdf, other]

Universal Approximation Under Constraints is Possible with Transformers

Authors: Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić

Abstract: Many practical problems need the output of a machine learning model to satisfy a set of constraints, $K$. Nevertheless, there is no known guarantee that classical neural network architectures can exactly encode constraints while simultaneously achieving universality. We provide a quantitative constrained universal approximation theorem which guarantees that for any non-convex compact set $K$ and any continuous function $f:\mathbb{R}^n\rightarrow K$, there is a probabilistic transformer $\hat{F}$ whose randomized outputs all lie in $K$ and whose expected output uniformly approximates $f$. Our second main result is a "deep neural version" of Berge's Maximum Theorem (1963). The result guarantees that given an objective function $L$, a constraint set $K$, and a family of soft constraint sets, there is a probabilistic transformer $\hat{F}$ that approximately minimizes $L$ and whose outputs belong to $K$; moreover, $\hat{F}$ approximately satisfies the soft constraints. Our results imply the first universal approximation theorem for classical transformers with exact convex constraint satisfaction. They also yield a chart-free universal approximation theorem for Riemannian manifold-valued functions subject to suitable geodesically convex constraints.

Submitted 8 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: 9.5 Pages + 14 Page Append + References, 3 Tables, 5 Figures

MSC Class: 68T07; 41A65; 41A29; 51F99

Journal ref: ICLR 2022 (Spotlight)

arXiv:2105.04040 [pdf, other]

Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling

Authors: Anadi Chaman, Ivan Dokmanić

Abstract: Convolutional neural networks lack shift equivariance due to the presence of downsampling layers. In image classification, adaptive polyphase downsampling (APS-D) was recently proposed to make CNNs perfectly shift invariant. However, in networks used for image reconstruction tasks, it cannot by itself restore shift equivariance. We address this problem by proposing adaptive polyphase upsampling (APS-U), a non-linear extension of conventional upsampling, which allows CNNs with symmetric encoder-decoder architecture (for example U-Net) to exhibit perfect shift equivariance. With MRI and CT reconstruction experiments, we show that networks containing APS-D/U layers exhibit state-of-the-art equivariance performance without sacrificing image reconstruction quality. In addition, unlike prior methods like data augmentation and anti-aliasing, the gains in equivariance obtained from APS-D/U also extend to images outside the training distribution.

Submitted 6 December, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

arXiv:2102.10461 [pdf, other]

Trumpets: Injective Flows for Inference and Inverse Problems

Authors: Konik Kothari, AmirEhsan Khorashadizadeh, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose injective generative models called Trumpets that generalize invertible normalizing flows. The proposed generators progressively increase dimension from a low-dimensional latent space. We demonstrate that Trumpets can be trained orders of magnitude faster than standard flows while yielding samples of comparable or better quality. They retain many of the advantages of the standard flows such as training based on maximum likelihood and a fast, exact inverse of the generator. Since Trumpets are injective and have fast inverses, they can be effectively used for downstream Bayesian inference. To wit, we use Trumpet priors for maximum a posteriori estimation in the context of image reconstruction from compressive measurements, outperforming competitive baselines in terms of reconstruction quality and speed. We then propose an efficient method for posterior characterization and uncertainty quantification with Trumpets by taking advantage of the low-dimensional latent space.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 16 pages

Journal ref: Uncertainty in Artificial Intelligence (UAI 2021)

arXiv:2011.14214 [pdf, other]

Truly shift-invariant convolutional neural networks

Authors: Anadi Chaman, Ivan Dokmanić

Abstract: Thanks to the use of convolution and pooling layers, convolutional neural networks were for a long time thought to be shift-invariant. However, recent works have shown that the output of a CNN can change significantly with small shifts in input: a problem caused by the presence of downsampling (stride) layers. The existing solutions rely either on data augmentation or on anti-aliasing, both of which have limitations and neither of which enables perfect shift invariance. Additionally, the gains obtained from these methods do not extend to image patterns not seen during training. To address these challenges, we propose adaptive polyphase sampling (APS), a simple sub-sampling scheme that allows convolutional neural networks to achieve 100% consistency in classification performance under shifts, without any loss in accuracy. With APS, the networks exhibit perfect consistency to shifts even before training, making it the first approach that makes convolutional neural networks truly shift-invariant.

Submitted 30 March, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

arXiv:2011.12815 [pdf, other]

Learning Multiscale Convolutional Dictionaries for Image Reconstruction

Authors: Tianlin Liu, Anadi Chaman, David Belius, Ivan Dokmanić

Abstract: Convolutional neural networks (CNNs) have been tremendously successful in solving imaging inverse problems. To understand their success, an effective strategy is to construct simpler and mathematically more tractable convolutional sparse coding (CSC) models that share essential ingredients with CNNs. Existing CSC methods, however, underperform leading CNNs in challenging inverse problems. We hypothesize that the performance gap may be attributed in part to how they process images at different spatial scales: While many CNNs use multiscale feature representations, existing CSC models mostly rely on single-scale dictionaries. To close the performance gap, we thus propose a multiscale convolutional dictionary structure. The proposed dictionary structure is derived from the U-Net, arguably the most versatile and widely used CNN for image-to-image learning problems. We show that incorporating the proposed multiscale dictionary in an otherwise standard CSC framework yields performance competitive with state-of-the-art CNNs across a range of challenging inverse problems including CT and MRI reconstruction. Our work thus demonstrates the effectiveness and scalability of the multiscale CSC approach in solving challenging inverse problems.

Submitted 19 May, 2022; v1 submitted 25 November, 2020; originally announced November 2020.

arXiv:2006.09858 [pdf, other]

Geometry of Similarity Comparisons

Authors: Puoya Tabaghi, Jianhao Peng, Olgica Milenkovic, Ivan Dokmanić

Abstract: Many data analysis problems can be cast as distance geometry problems in \emph{space forms} -- Euclidean, spherical, or hyperbolic spaces. Often, absolute distance measurements are unreliable or simply unavailable and only proxies to absolute distances in the form of similarities are available. Hence we ask the following: Given only \emph{comparisons} of similarities amongst a set of entities, what can be said about the geometry of the underlying space form? To study this question, we introduce the notions of the \emph{ordinal capacity} of a target space form and \emph{ordinal spread} of the similarity measurements. The latter is an indicator of complex patterns in the measurements, while the former quantifies the capacity of a space form to accommodate a set of measurements with a specific ordinal spread profile. We prove that the ordinal capacity of a space form is related to its dimension and the sign of its curvature. This leads to a lower bound on the Euclidean and spherical embedding dimension of what we term similarity graphs. More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form. We support our theoretical claims with experiments on weighted trees, single-cell RNA expression data and spherical cartographic measurements.

Submitted 28 July, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2006.08464 [pdf, other]

Globally Injective ReLU Networks

Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

Abstract: Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well-posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and sufficient for injectivity by constructing appropriate weight matrices. We show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires a larger expansivity, between 3.4 and 10.5. We also characterize the stability of inverting an injective network via worst-case Lipschitz constants of the inverse. We then use arguments from differential topology to study injectivity of deep networks and prove that any Lipschitz map can be approximated by an injective ReLU network. Finally, using an argument based on random projections, we show that an end-to-end -- rather than layerwise -- doubling of the dimension suffices for injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks.

Submitted 8 October, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 48 pages, 18 figures, submitted to JMLR

arXiv:2006.05854 [pdf, other]

Learning the geometry of wave-based imaging

Authors: Konik Kothari, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an interpretable neural architecture inspired by Fourier integral operators (FIOs) which approximate the wave physics. FIOs model a wide range of imaging modalities, from seismology and radar to Doppler and ultrasound. We focus on learning the geometry of wave propagation captured by FIOs, which is implicit in the data, via a loss based on optimal transport. The proposed FIONet performs significantly better than the usual baselines on a number of imaging inverse problems, especially in out-of-distribution tests.

Submitted 10 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted as spotlight presentation to NeurIPS '20

arXiv:2005.08672 [pdf, other]

Hyperbolic Distance Matrices

Authors: Puoya Tabaghi, Ivan Dokmanić

Abstract: Hyperbolic space is a natural setting for mining and visualizing data with hierarchical structure. In order to compute a hyperbolic embedding from comparison or similarity information, one has to solve a hyperbolic distance geometry problem. In this paper, we propose a unified framework to compute hyperbolic embeddings from an arbitrary mix of noisy metric and non-metric data. Our algorithms are based on semidefinite programming and the notion of a hyperbolic distance matrix, in many ways parallel to its famous Euclidean counterpart. A central ingredient we put forward is a semidefinite characterization of the hyperbolic Gramian -- a matrix of Lorentzian inner products. This characterization allows us to formulate a semidefinite relaxation to efficiently compute hyperbolic embeddings in two stages: first, we complete and denoise the observed hyperbolic distance matrix; second, we propose a spectral factorization method to estimate the embedded points from the hyperbolic distance matrix. We show through numerical experiments how the flexibility to mix metric and non-metric constraints allows us to efficiently compute embeddings from arbitrary data.

Submitted 11 September, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

arXiv:1910.03749 [pdf, other]

The fastest $\ell_{1,\infty}$ prox in the west

Authors: Benjamín Béjar, Ivan Dokmanić, René Vidal

Abstract: Proximal operators are of particular interest in optimization problems dealing with non-smooth objectives because in many practical cases they lead to optimization algorithms whose updates can be computed in closed form or very efficiently. A well-known example is the proximal operator of the vector $\ell_1$ norm, which is given by the soft-thresholding operator. In this paper we study the proximal operator of the mixed $\ell_{1,\infty}$ matrix norm and show that it can be computed in closed form by applying the well-known soft-thresholding operator to each column of the matrix. However, unlike the vector $\ell_1$ norm case where the threshold is constant, in the mixed $\ell_{1,\infty}$ norm case each column of the matrix might require a different threshold and all thresholds depend on the given matrix. We propose a general iterative algorithm for computing these thresholds, as well as two efficient implementations that further exploit easy-to-compute lower bounds for the mixed norm of the optimal solution. Experiments on large-scale synthetic and real data indicate that the proposed methods can be orders of magnitude faster than state-of-the-art methods.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 9 pages, 2 figures, journal

arXiv:1907.01703 [pdf, other]

Don't take it lightly: Phasing optical random projections with unknown operators

Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić

Abstract: In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\mathbf{\xi} \in \mathbb{R}^N$ is mixed by a random scattering medium to compute the projection $\mathbf{y} = \mathbf{A} \mathbf{\xi}$, with $\mathbf{A} \in \mathbb{C}^{M \times N}$ being a realization of a standard complex Gaussian iid random matrix. Such optics-based matrix multiplications can be much faster and more energy-efficient than their CPU or GPU counterparts, yet two difficulties must be resolved: only the intensity ${|\mathbf{y}|}^2$ can be recorded by the camera, and the transmission matrix $\mathbf{A}$ is unknown. We show that even without knowing $\mathbf{A}$, we can recover the unknown phase of $\mathbf{y}$ for some equivalent transmission matrix with the same distribution as $\mathbf{A}$. Our method is based on two observations: first, conjugating or changing the phase of any row of $\mathbf{A}$ does not change its distribution; and second, since we control the input we can interfere $\mathbf{\xi}$ with arbitrary reference signals. We show how to leverage these observations to cast the measurement phase retrieval problem as a Euclidean distance geometry problem. We demonstrate appealing properties of the proposed algorithm in both numerical simulations and real hardware experiments. Not only does our algorithm accurately recover the missing phase, but it mitigates the effects of quantization and the sensitivity threshold, thus improving the measured magnitudes.

Submitted 13 February, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

arXiv:1902.09959 [pdf, other]

doi 10.1109/TSP.2020.2982780

Shapes from Echoes: Uniqueness from Point-to-Plane Distance Matrices

Authors: Miranda Krekovic, Ivan Dokmanic, Martin Vetterli

Abstract: We study the problem of localizing a configuration of points and planes from the collection of point-to-plane distances. This problem models simultaneous localization and mapping from acoustic echoes as well as the notable "structure from sound" approach to microphone localization with unknown sources. In our earlier work we proposed computational methods for localization from point-to-plane distances and noted that such localization suffers from various ambiguities beyond the usual rigid body motions; in this paper we provide a complete characterization of uniqueness. We enumerate equivalence classes of configurations which lead to the same distance measurements as a function of the number of planes and points, and algebraically characterize the related transformations in both 2D and 3D. Here we only discuss uniqueness; computational tools and heuristics for practical localization from point-to-plane distances using sound will be addressed in a companion paper.

Submitted 19 February, 2019; originally announced February 2019.

Comments: 13 pages, 13 figures

arXiv:1902.05612 [pdf, other]

doi 10.1109/TSP.2020.3011016

Solving Complex Quadratic Systems with Full-Rank Random Matrices

Authors: Shuai Huang, Sidharth Gupta, Ivan Dokmanić

Abstract: We tackle the problem of recovering a complex signal $\boldsymbol x\in\mathbb{C}^n$ from quadratic measurements of the form $y_i=\boldsymbol x^*\boldsymbol A_i\boldsymbol x$, where $\boldsymbol A_i$ is a full-rank, complex random measurement matrix whose entries are generated from a rotation-invariant sub-Gaussian distribution. We formulate it as the minimization of a nonconvex loss. This problem is related to the well-understood phase retrieval problem where the measurement matrix is a rank-1 positive semidefinite matrix. Here we study the general full-rank case which models a number of key applications such as molecular geometry recovery from distance distributions and compound measurements in phaseless diffractive imaging. Most prior works either address the rank-1 case or focus on real measurements. The several papers that address the full-rank complex case adopt the computationally demanding semidefinite relaxation approach. In this paper we prove that the general class of problems with rotation-invariant sub-Gaussian measurement models can be efficiently solved with high probability via the standard framework comprising a spectral initialization followed by iterative Wirtinger flow updates on a nonconvex loss. Numerical experiments on simulated data corroborate our theoretical analysis.

Submitted 25 April, 2021; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: This updated version of the manuscript addresses several important issues in the initial arXiv submission

Journal ref: IEEE Transactions on Signal Processing, Vol. 68, 4782-4796, 2020

arXiv:1901.10076 [pdf, other]

Learning Schatten--von Neumann Operators

Authors: Puoya Tabaghi, Maarten de Hoop, Ivan Dokmanić

Abstract: We study the learnability of a class of compact operators known as Schatten--von Neumann operators. These operators between infinite-dimensional function spaces play a central role in a variety of applications in learning theory and inverse problems. We address the question of sample complexity of learning Schatten--von Neumann operators and provide an upper bound on the number of measurements required for the empirical risk minimizer to generalize with arbitrary precision and probability, as a function of class parameter $p$. Our results give generalization guarantees for regression of infinite-dimensional signals from infinite-dimensional data. Next, we adapt the representer theorem of Abernethy \emph{et al.} to show that empirical risk minimization over an a priori infinite-dimensional, non-compact set can be converted to a convex finite-dimensional optimization problem over a compact set. In summary, the class of $p$-Schatten--von Neumann operators is probably approximately correct (PAC)-learnable via a practical convex program for any $p < \infty$.

Submitted 22 February, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1812.00498 [pdf, other]

doi 10.1109/LSP.2019.2908505

Permutations Unlabeled beyond Sampling Unknown

Authors: Ivan Dokmanić

Abstract: A recent unlabeled sampling result by Unnikrishnan, Haghighatshoar and Vetterli states that with probability one over iid Gaussian matrices $A$, any $x$ can be uniquely recovered from an unknown permutation of $y = A x$ as soon as $A$ has at least twice as many rows as columns. We show that this condition on $A$ implies something much stronger: that an unknown vector $x$ can be recovered from measurements $y = T A x$, when the unknown $T$ belongs to an arbitrary set of invertible, diagonalizable linear transformations $\mathcal{T}$. The set $\mathcal{T}$ can be finite or countably infinite. When it is the set of $m \times m$ permutation matrices, we have the classical unlabeled sampling problem. We show that for almost all $A$ with at least twice as many rows as columns, all $x$ can be recovered either uniquely, or up to a scale depending on $\mathcal{T}$, and that the condition on the size of $A$ is necessary. Our proof is based on vector space geometry. Specializing to permutations we obtain a simplified proof of the uniqueness result of Unnikrishnan, Haghighatshoar and Vetterli. In this letter we are only concerned with uniqueness; stability and algorithms are left for future work.

Submitted 13 February, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

arXiv:1811.07065 [pdf, other]

Multipath-enabled private audio with noise

Authors: Anadi Chaman, Yu-Jeh Liu, Jonah Casebeer, Ivan Dokmanić

Abstract: We address the problem of privately communicating audio messages to multiple listeners in a reverberant room using a set of loudspeakers. We propose two methods based on emitting noise. In the first method, the loudspeakers emit noise signals that are appropriately filtered so that after echoing along multiple paths in the room, they sum up and descramble to yield distinct meaningful audio messages only at specific focusing spots, while being incoherent everywhere else. In the second method, adapted from wireless communications, we project noise signals onto the nullspace of the MIMO channel matrix between the loudspeakers and listeners. Loudspeakers reproduce a sum of the projected noise signals and intended messages. Again because of echoes, the MIMO nullspace changes across different locations in the room. Thus, the listeners at focusing spots hear intended messages, while the acoustic channel of an eavesdropper at any other location is jammed. We show, using both numerical and real experiments, that with a small number of speakers and a few impulse response measurements, audio messages can indeed be communicated to a set of listeners while ensuring negligible intelligibility elsewhere.

Submitted 13 March, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

arXiv:1810.07921 [pdf, other]

Concentration of the Frobenius norm of generalized matrix inverses

Authors: Ivan Dokmanić, Rémi Gribonval

Abstract: In many applications it is useful to replace the Moore-Penrose pseudoinverse (MPP) by a different generalized inverse with more favorable properties. We may want, for example, to have many zero entries, but without giving up too much of the stability of the MPP. One way to quantify stability is by how much the Frobenius norm of a generalized inverse exceeds that of the MPP. In this paper we derive finite-size concentration bounds for the Frobenius norm of $\ell^p$-minimal general inverses of iid Gaussian matrices, with $1 \leq p \leq 2$. For $p = 1$ we prove exponential concentration of the Frobenius norm of the sparse pseudoinverse; for $p = 2$, we get a similar concentration bound for the MPP. Our proof is based on the convex Gaussian min-max theorem, but unlike previous applications which give asymptotic results, we derive finite-size bounds.

Submitted 23 November, 2018; v1 submitted 18 October, 2018; originally announced October 2018.

Comments: Revised/condensed/renamed version of arXiv:1706.08701

arXiv:1809.05862 [pdf, other]

Cocktails, but no party: multipath-enabled private audio

Authors: Yu-Jeh Liu, Jonah Casebeer, Ivan Dokmanić

Abstract: We describe a private audio messaging system that uses echoes to unscramble messages at a few predetermined locations in a room. The system works by splitting the audio into short chunks and emitting them from different loudspeakers. The chunks are filtered so that as they echo around the room, they sum to noise everywhere except at a few chosen focusing spots where they exactly reproduce the intended messages. Unlike in the case of standard personal audio zones, the proposed method renders sound outside the focusing spots unintelligible. Our method essentially depends on echoes: the room acts as a mixing system such that at given points we get the desired output. Finally, we only require a modest number of loudspeakers and only a few impulse response measurements at points where the messages should be delivered. We demonstrate the effectiveness of the proposed method via objective quantitative metrics as well as informal listening experiments in a real room.

Submitted 16 September, 2018; originally announced September 2018.

arXiv:1805.11718 [pdf, other]

Random mesh projectors for inverse problems

Authors: Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić

Abstract: We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to directly learn the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and corruptions not seen during training than a directly learned inverse.

Submitted 5 December, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: S. Gupta and K. Kothari contributed equally

arXiv:1804.02465 [pdf, other]

doi 10.1109/TSP.2021.3063458

Reconstructing Point Sets from Distance Distributions

Authors: Shuai Huang, Ivan Dokmanić

Abstract: We address the problem of reconstructing a set of points on a line or a loop from their unassigned noisy pairwise distances. When the points lie on a line, the problem is known as the turnpike; when they are on a loop, it is known as the beltway. We approximate the problem by discretizing the domain and representing the $N$ points via an $N$-hot encoding, which is a density supported on the discretized domain. We show how the distance distribution is then simply a collection of quadratic functionals of this density and propose to recover the point locations so that the estimated distance distribution matches the measured distance distribution. This can be cast as a constrained nonconvex optimization problem which we solve using projected gradient descent with a suitable spectral initializer. We derive conditions under which the proposed distance distribution matching approach locally converges to a global optimizer at a linear rate. Compared to the conventional backtracking approach, our method jointly reconstructs all the point locations and is robust to noise in the measurements. We substantiate these claims with state-of-the-art performance across a number of numerical experiments. Our method is the first practical approach to solve the large-scale noisy beltway problem where the points lie on a loop.

Submitted 25 April, 2021; v1 submitted 6 April, 2018; originally announced April 2018.

Journal ref: IEEE Transactions on Signal Processing, Vol. 69, 1181-1127, Mar. 2021

arXiv:1801.03740 [pdf, other]

doi 10.1109/TASLP.2018.2867081

Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

Authors: Dalia El Badawy, Ivan Dokmanić

Abstract: Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.

Submitted 28 August, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

Comments: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language processing (TASLP)

arXiv:1711.06805 [pdf, other]

doi 10.1109/ICASSP.2018.8461345

Separake: Source Separation with a Little Help From Echoes

Authors: Robin Scheibler, Diego Di Carlo, Antoine Deleforge, Ivan Dokmanić

Abstract: It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses, nor do we attempt to estimate them fully. We rather assume that we know the positions of a few virtual microphones generated by echoes and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case. We show improvements for two standard algorithms---one that uses only magnitudes of the transfer functions, and one that also uses the phases. Concretely, we show that multichannel non-negative matrix factorization aided with a small number of echoes beats the vanilla variant of the same algorithm, and that with magnitude information only, echoes enable separation where it was previously impossible.

Submitted 17 November, 2017; originally announced November 2017.

arXiv:1710.04196 [pdf, other]

doi 10.1109/ICASSP.2018.8461310

Pyroomacoustics: A Python package for audio room simulations and array processing algorithms

Authors: Robin Scheibler, Eric Bezzam, Ivan Dokmanić

Abstract: We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.

Submitted 11 October, 2017; originally announced October 2017.

Comments: 5 pages, 5 figures, describes a software package

arXiv:1706.08701 [pdf, other]

Beyond Moore-Penrose Part II: The Sparse Pseudoinverse

Authors: Ivan Dokmanić, Rémi Gribonval

Abstract: This is the second part of a two-paper series on generalized inverses that minimize matrix norms. In Part II we focus on generalized inverses that are minimizers of entrywise $p$ norms whose main representative is the sparse pseudoinverse for $p = 1$. We are motivated by the idea to replace the Moore-Penrose pseudoinverse by a sparser generalized inverse which is in some sense well-behaved. Sparsity implies that it is faster to apply the resulting matrix; well-behavedness would imply that we do not lose much in stability with respect to the least-squares performance of the MPP. We first address questions of uniqueness and non-zero count of (putative) sparse pseudoinverses. We show that a sparse pseudoinverse is generically unique, and that it indeed reaches optimal sparsity for almost all matrices. We then turn to proving our main stability result: finite-size concentration bounds for the Frobenius norm of $p$-minimal inverses for $1 \le p \le 2$. Our proof is based on tools from convex analysis and random matrix theory, in particular the recently developed convex Gaussian min-max theorem. Along the way we prove several results about sparse representations and convex programming that were known folklore, but of which we could find no proof.

Submitted 13 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.08349 [pdf, other]

Beyond Moore-Penrose Part I: Generalized Inverses that Minimize Matrix Norms

Authors: Ivan Dokmanić, Rémi Gribonval

Abstract: This is the first paper of a two-part series in which we study linear generalized inverses that minimize matrix norms. Such generalized inverses are famously represented by the Moore-Penrose pseudoinverse (MPP) which happens to minimize the Frobenius norm. Freeing up the degrees of freedom associated with Frobenius optimality enables us to promote other interesting properties. In this Part I, we look at the basic properties of norm-minimizing generalized inverses, especially in terms of uniqueness and relation to the MPP. We first show that the MPP minimizes many norms beyond those unitarily invariant, thus further bolstering its role as a robust choice in many situations. We then concentrate on some norms which are generally not minimized by the MPP, but whose minimization is relevant for linear inverse problems and sparse representations. In particular, we look at mixed norms and the induced $\ell^p \rightarrow \ell^q$ norms. An interesting representative is the sparse pseudoinverse which we study in much more detail in Part II. Next, we shift attention from norms to matrices with interesting behaviors. We exhibit a class whose generalized inverse is always the MPP, even for norms that normally result in different inverses, and a class for which many generalized inverses coincide, but not with the MPP. Finally, we discuss efficient computation of norm-minimizing generalized inverses.

Submitted 13 July, 2017; v1 submitted 26 June, 2017; originally announced June 2017.
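
Note: the abstract's statement that the MPP minimizes the Frobenius norm among generalized inverses can be checked numerically. The following is only a minimal sketch, not code from the paper; the matrix sizes, the random seed, and the names `A`, `W`, and `G` are illustrative assumptions. It builds a second generalized inverse by adding a nullspace component to the pseudoinverse and compares Frobenius norms.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # hypothetical fat matrix (full row rank almost surely)
A_pinv = np.linalg.pinv(A)        # Moore-Penrose pseudoinverse, shape (5, 3)

# Adding any matrix whose columns lie in the nullspace of A yields another
# generalized inverse: G still satisfies A @ G @ A == A.
W = rng.standard_normal((5, 3))   # arbitrary perturbation (assumption for illustration)
G = A_pinv + (np.eye(5) - A_pinv @ A) @ W

fro_pinv = np.linalg.norm(A_pinv, "fro")
fro_G = np.linalg.norm(G, "fro")
print(f"||A^+||_F = {fro_pinv:.4f}, ||G||_F = {fro_G:.4f}")
```

Because the columns of the pseudoinverse lie in the row space of `A` and the added term lies in its nullspace, the two parts are orthogonal, so the norm of `G` can only be larger than or equal to that of the pseudoinverse.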

arXiv:1612.00876 [pdf, other]

doi 10.1109/ICASSP.2017.7952744

FRIDA: FRI-Based DOA Estimation for Arbitrary Array Layouts

Authors: Hanjie Pan, Robin Scheibler, Eric Bezzam, Ivan Dokmanic, Martin Vetterli

Abstract: In this paper we present FRIDA---an algorithm for estimating directions of arrival of multiple wideband sound sources. FRIDA combines multi-band information coherently and achieves state-of-the-art resolution at extremely low signal-to-noise ratios. It works for arbitrary array layouts, but unlike the various steered response power and subspace methods, it does not require a grid search. FRIDA leverages recent advances in sampling signals with a finite rate of innovation. It is based on the insight that for any array layout, the entries of the spatial covariance matrix can be linearly transformed into a uniformly sampled sum of sinusoids.

Submitted 2 December, 2016; originally announced December 2016.

Comments: Submitted to ICASSP2017

arXiv:1609.05512 [pdf, other]

Omnidirectional Bats, Point-to-Plane Distances, and the Price of Uniqueness

Authors: Miranda Kreković, Ivan Dokmanić, Martin Vetterli

Abstract: We study simultaneous localization and mapping with a device that uses reflections to measure its distance from walls. Such a device can be realized acoustically with a synchronized collocated source and receiver; it behaves like a bat with no capacity for directional hearing or vocalizing. In this paper we generalize our previous work in 2D, and show that the 3D case is not just a simple extension, but rather a fundamentally different inverse problem. While generically the 2D problem has a unique solution, in 3D uniqueness is always absent in rooms with fewer than nine walls. In addition to the complete characterization of ambiguities which arise due to this non-uniqueness, we propose a robust solution for inexact measurements similar to analogous results for Euclidean Distance Matrices. Our theoretical results have important consequences for the design of collocated range-only SLAM systems, and we support them with an array of computer experiments.

Submitted 18 September, 2016; originally announced September 2016.

Comments: 5 pages, 8 figures, submitted to ICASSP 2017

arXiv:1609.05502 [pdf, other]

Inverse Problems with Invariant Multiscale Statistics

Authors: Ivan Dokmanić, Joan Bruna, Stéphane Mallat, Maarten de Hoop

Abstract: We propose a new approach to linear ill-posed inverse problems. Our algorithm alternates between enforcing two constraints: the measurements and the statistical correlation structure in some transformed space. We use a non-linear multiscale scattering transform which discards the phase and thus exposes strong spectral correlations otherwise hidden beneath the phase fluctuations. As a result, both constraints may be put into effect by linear projections in their respective spaces. We apply the algorithm to super-resolution and tomography and show that it outperforms ad hoc convex regularizers and stably recovers the missing spectrum.

Submitted 2 December, 2018; v1 submitted 18 September, 2016; originally announced September 2016.

arXiv:1608.08753 [pdf, other]

Look, no Beacons! Optimal All-in-One EchoSLAM

Authors: Miranda Krekovic, Ivan Dokmanic, Martin Vetterli

Abstract: We study the problem of simultaneously reconstructing a polygonal room and a trajectory of a device equipped with a (nearly) collocated omnidirectional source and receiver. The device measures arrival times of echoes of pulses emitted by the source and picked up by the receiver. No prior knowledge about the device's trajectory is required. Most existing approaches addressing this problem assume multiple sources or receivers, or they assume that some of these are static, serving as beacons. Unlike earlier approaches, we take into account the measurement noise and various constraints on the geometry by formulating the solution as a minimizer of a cost function similar to \emph{stress} in multidimensional scaling. We study uniqueness of the reconstruction from first-order echoes, and we show that in addition to the usual invariance to rigid motions, new ambiguities arise for important classes of rooms and trajectories. We support our theoretical developments with a number of numerical experiments.

Submitted 31 August, 2016; originally announced August 2016.

Comments: 5 pages, 6 figures, submitted to Asilomar Conference on Signals, Systems, and Computers Website

arXiv:1502.07577 [pdf, other]

doi 10.1109/TSP.2015.2478751

Sampling Sparse Signals on the Sphere: Algorithms and Applications

Authors: Ivan Dokmanic, Yue M. Lu

Abstract: We propose a sampling scheme that can perfectly reconstruct a collection of spikes on the sphere from samples of their lowpass-filtered observations. Central to our algorithm is a generalization of the annihilating filter method, a tool widely used in array signal processing and finite-rate-of-innovation (FRI) sampling. The proposed algorithm can reconstruct $K$ spikes from $(K+\sqrt{K})^2$ spatial samples. This sampling requirement improves over previously known FRI sampling schemes on the sphere by a factor of four for large $K$. We showcase the versatility of the proposed algorithm by applying it to three different problems: 1) sampling diffusion processes induced by localized sources on the sphere, 2) shot noise removal, and 3) sound source localization (SSL) by a spherical microphone array. In particular, we show how SSL can be reformulated as a spherical sparse sampling problem.

Submitted 3 March, 2015; v1 submitted 26 February, 2015; originally announced February 2015.

Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Signal Processing

arXiv:1502.07541 [pdf, other]

doi 10.1109/MSP.2015.2398954

Euclidean Distance Matrices: Essential Theory, Algorithms and Applications

Authors: Ivan Dokmanic, Reza Parhizkar, Juri Ranieri, Martin Vetterli

Abstract: Euclidean distance matrices (EDM) are matrices of squared distances between points. The definition is deceivingly simple: thanks to their many useful properties they have found applications in psychometrics, crystallography, machine learning, wireless sensor networks, acoustics, and more. Despite the usefulness of EDMs, they seem to be insufficiently known in the signal processing community. Our goal is to rectify this mishap in a concise tutorial. We review the fundamental properties of EDMs, such as rank or (non)definiteness. We show how various EDM properties can be used to design algorithms for completing and denoising distance data. Along the way, we demonstrate applications to microphone position calibration, ultrasound tomography, room reconstruction from echoes and phase retrieval. By spelling out the essential algorithms, we hope to fast-track the readers in applying EDMs to their own problems. Matlab code for all the described algorithms, and to generate the figures in the paper, is available online. Finally, we suggest directions for further research.

Submitted 15 August, 2015; v1 submitted 26 February, 2015; originally announced February 2015.

Comments: 17 pages, 12 figures, to appear in IEEE Signal Processing Magazine; change of title in the last revision
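
Note: the definition in the abstract above (an EDM as the matrix of squared distances between points) and the rank property it mentions can be illustrated with a short NumPy sketch. This is not the authors' Matlab code; the function name `edm`, the point count, and the tolerance are assumptions chosen for illustration. For points in $d$ dimensions the rank of the EDM is at most $d + 2$, regardless of how many points there are.

```python
import numpy as np

def edm(points):
    """Return the matrix of squared Euclidean distances between the rows of `points`."""
    gram = points @ points.T              # Gram matrix of inner products
    sq_norms = np.diag(gram)              # squared norm of each point
    return sq_norms[:, None] - 2.0 * gram + sq_norms[None, :]

rng = np.random.default_rng(0)
points = rng.standard_normal((6, 2))      # 6 hypothetical points in the plane (d = 2)
D = edm(points)

# Explicit tolerance so that rounding noise is not counted as rank.
rank = np.linalg.matrix_rank(D, tol=1e-9)
print("EDM shape:", D.shape, "rank:", rank, "(bounded by d + 2 = 4)")
```

The low rank is what makes completion and denoising of distance data tractable: a 6 x 6 matrix built from planar points carries far fewer degrees of freedom than its size suggests.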

arXiv:1407.5514 [pdf, other]

doi 10.1109/JSTSP.2015.2415761

Raking the Cocktail Party

Authors: Ivan Dokmanić, Robin Scheibler, Martin Vetterli

Abstract: We present the concept of an acoustic rake receiver---a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is well-known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than temporal. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signal offer additional spatial diversity that we can exploit in the beamformer design. We present several "intuitive" and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance boosts in terms of noise and interference suppression. Beyond signal-to-noise ratio, we observe gains in terms of the \emph{perceptual evaluation of speech quality} (PESQ) metric for the speech quality. We accompany the paper by the complete simulation and processing chain written in Python. The code and the sound samples are available online at \url{http://lcav.github.io/AcousticRakeReceiver/}.

Submitted 15 January, 2015; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: 12 pages, 11 figures, Accepted for publication in IEEE Journal on Selected Topics in Signal Processing (Special Issue on Spatial Audio)

Showing 1–50 of 50 results for author: Dokmanić, I