Search | arXiv e-print repository

doi 10.1109/TSP.2024.3418971

Pivotal Auto-Encoder via Self-Normalizing ReLU

Authors: Nelson Goldenstein, Jeremias Sulam, Yaniv Romano

Abstract: Sparse auto-encoders are useful for extracting low-dimensional representations from high-dimensional data. However, their performance degrades sharply when the input noise at test time differs from the noise employed during training. This limitation hinders the applicability of auto-encoders in real-world scenarios where the level of noise in the input is unpredictable. In this paper, we formalize single hidden layer sparse auto-encoders as a transform learning problem. Leveraging the transform modeling interpretation, we propose an optimization problem that leads to a predictive model invariant to the noise level at test time. In other words, the same pre-trained model is able to generalize to different noise levels. The proposed optimization algorithm, derived from the square root lasso, is translated into a new, computationally efficient auto-encoding architecture. After proving that our new method is invariant to the noise level, we evaluate our approach by training networks using the proposed architecture for denoising tasks. Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise compared to commonly used architectures.

Submitted 23 June, 2024; originally announced June 2024.
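
For readers unfamiliar with the square root lasso mentioned in the abstract above, the following is a sketch of the standard textbook objectives, not the paper's exact formulation. The symbols $y$ (observation), $D$ (dictionary or design matrix), $x$ (sparse code) and $\lambda$ (regularization weight) are illustrative assumptions, and normalization constants are omitted.

```latex
% Lasso: squared residual; a good \lambda scales with the noise level \sigma.
\hat{x}_{\text{lasso}} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert y - D x \rVert_2^2 + \lambda \lVert x \rVert_1

% Square root lasso: unsquared residual; a good \lambda can be chosen
% independently of \sigma, which is the "pivotal" property.
\hat{x}_{\sqrt{\text{lasso}}} = \arg\min_{x} \; \lVert y - D x \rVert_2 + \lambda \lVert x \rVert_1
```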

arXiv:2405.19146

I Bet You Did Not Mean That: Testing Semantic Importance via Betting

Authors: Jacopo Teneggi, Jeremias Sulam

Abstract: Recent works have extended notions of feature importance to \emph{semantic concepts} that are inherently interpretable to the users interacting with a black-box predictive model. Yet, precise statistical guarantees, such as false positive rate control, are needed to communicate findings transparently and to avoid unintended consequences in real-world scenarios. In this paper, we formalize the global (i.e., over a population) and local (i.e., for a sample) statistical importance of semantic concepts for the predictions of opaque models, by means of conditional independence, which allows for rigorous testing. We use recent ideas of sequential kernelized testing (SKIT) to induce a rank of importance across concepts, and showcase the effectiveness and flexibility of our framework on synthetic datasets as well as on image classification tasks using vision-language models such as CLIP.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.14176

Certified Robustness against Sparse Adversarial Perturbations via Data Localization

Authors: Ambar Pal, René Vidal, Jeremias Sulam

Abstract: Recent work in adversarial robustness suggests that natural data distributions are localized, i.e., they place high probability in small volume regions of the input space, and that this property can be utilized for designing classifiers with improved robustness guarantees for $\ell_2$-bounded perturbations. Yet, it is still unclear if this observation holds true for more general metrics. In this work, we extend this theory to $\ell_0$-bounded adversarial perturbations, where the attacker can modify a few pixels of the image but is unrestricted in the magnitude of perturbation, and we show necessary and sufficient conditions for the existence of $\ell_0$-robust classifiers. Theoretical certification approaches in this regime essentially employ voting over a large ensemble of classifiers. Such procedures are combinatorial and expensive or require complicated certification techniques. In contrast, a simple classifier emerges from our theory, dubbed Box-NN, which naturally incorporates the geometry of the problem and improves upon the current state-of-the-art in certified robustness against sparse attacks for the MNIST and Fashion-MNIST datasets.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2310.14344

What's in a Prior? Learned Proximal Networks for Inverse Problems

Authors: Zhenghan Fang, Sam Buchanan, Jeremias Sulam

Abstract: Proximal operators are ubiquitous in inverse problems, commonly appearing as part of algorithmic strategies to regularize problems that are otherwise ill-posed. Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling, where they loosely resemble proximal operators. Yet, something essential is lost in employing these purely data-driven approaches: there is no guarantee that a general deep network represents the proximal operator of any function, nor is there any characterization of the function for which the network might provide some approximate proximal. This not only makes guaranteeing convergence of iterative schemes challenging but, more fundamentally, complicates the analysis of what has been learned by these networks about their training data. Herein we provide a framework to develop learned proximal networks (LPN), prove that they provide exact proximal operators for a data-driven nonconvex regularizer, and show how a new training strategy, dubbed proximal matching, provably promotes the recovery of the log-prior of the true data distribution. Such LPN provide general, unsupervised, expressive proximal operators that can be used for general inverse problems with convergence guarantees. We illustrate our results in a series of cases of increasing complexity, demonstrating that these models not only result in state-of-the-art performance, but provide a window into the resulting priors learned from data.

Submitted 27 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.
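
As background for the term used in the abstract above: the proximal operator of a function f maps a point v to the minimizer of ½‖x − v‖² + f(x). The sketch below shows the classical hand-crafted case f = λ‖·‖₁, whose proximal operator is soft-thresholding. It is not the learned proximal network (LPN) from the paper; the function name `prox_l1` and the sample values are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||x||_1, i.e. elementwise soft-thresholding."""
    v = np.asarray(v, dtype=float)
    # Shrink each entry toward zero by lam; entries with |v_i| <= lam become zero.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Illustrative input (assumed values, not taken from the paper).
v = np.array([3.0, -0.5, 1.2, -2.0])
x = prox_l1(v, lam=1.0)
print(x)
```

A learned proximal network replaces this closed-form map with a trained network, which is why the abstract stresses guarantees that the network is an exact proximal operator of some regularizer.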

arXiv:2309.16096

Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness

Authors: Ambar Pal, Jeremias Sulam, René Vidal

Abstract: The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, utilizing structure in data naturally leads to classifiers that enjoy data-dependent polyhedral robustness guarantees, improving upon methods for provable certification in certain regimes.

Submitted 25 May, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted to Neural Information Processing Systems (NeurIPS) 2023

arXiv:2307.00426

Sparsity-aware generalization theory for deep neural networks

Authors: Ramchandran Muthukumar, Jeremias Sulam

Abstract: Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and they improve over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.

Submitted 4 July, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

arXiv:2305.04746

Understanding Noise-Augmented Training for Randomized Smoothing

Authors: Ambar Pal, Jeremias Sulam

Abstract: Randomized smoothing is a technique for providing provable robustness guarantees against adversarial attacks while making minimal assumptions about a classifier. This method relies on taking a majority vote of any base classifier over multiple noise-perturbed inputs to obtain a smoothed classifier, and it remains the tool of choice to certify deep and complex neural network models. Nonetheless, non-trivial performance of such a smoothed classifier crucially depends on the base model being trained on noise-augmented data, i.e., on a smoothed input distribution. While widely adopted in practice, it is still unclear how this noisy training of the base classifier precisely affects the risk of the robust smoothed classifier, leading to heuristics and tricks that are poorly understood. In this work we analyze these trade-offs theoretically in a binary classification setting, proving that these common observations are not universal. We show that, without making stronger distributional assumptions, no benefit can be expected from predictors trained with noise-augmentation, and we further characterize distributions where such benefit is obtained. Our analysis has direct implications for the practical deployment of randomized smoothing, and we illustrate some of these via experiments on CIFAR-10 and MNIST, as well as on synthetic datasets.

Submitted 8 May, 2023; originally announced May 2023.

Comments: Transactions on Machine Learning Research, 2023
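
The majority-vote step described in the abstract above can be sketched as follows. This is a minimal illustration of prediction with a smoothed classifier only; it omits the statistical certification step, and the toy `base_classifier`, the noise level `sigma`, and the sample count are assumptions chosen for the example rather than settings from the paper.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x."""
    # Draw n_samples independent Gaussian perturbations with the shape of x.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = np.array([base_classifier(x + eps) for eps in noise])
    # Return the label that received the most votes.
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical base classifier: binary label from the sign of the first coordinate.
def base_classifier(x):
    return int(x[0] > 0.0)

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
label = smoothed_predict(base_classifier, x, sigma=0.25, n_samples=200, rng=rng)
print(label)
```

In practice the base classifier would be a trained network, and the abstract's question is how training that network on noise-augmented data affects the risk of the resulting smoothed classifier.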

arXiv:2302.03791

How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control

Authors: Jacopo Teneggi, Matthew Tivnan, J. Webster Stayman, Jeremias Sulam

Abstract: Score-based generative models, informally referred to as diffusion models, continue to grow in popularity across several important domains and tasks. While they provide high-quality and diverse samples from empirical distributions, important questions remain on the reliability and trustworthiness of these sampling procedures for their responsible use in critical scenarios. Conformal prediction is a modern tool to construct finite-sample, distribution-free uncertainty guarantees for any black-box predictor. In this work, we focus on image-to-image regression tasks and we present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, that we term $K$-RCPS, which allows one to $(i)$ provide entrywise calibrated intervals for future samples of any diffusion model, and $(ii)$ control a certain notion of risk with respect to a ground truth image with minimal mean interval length. Unlike existing conformal risk control procedures, ours relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length. We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen, demonstrating state-of-the-art performance.

Submitted 27 December, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Journal ref: International Conference on Machine Learning (2023)

arXiv:2211.15924

Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT

Authors: Jacopo Teneggi, Paul H. Yi, Jeremias Sulam

Abstract: Modern machine learning pipelines, in particular those based on deep learning (DL) models, require large amounts of labeled data. For classification problems, the most common learning paradigm consists of presenting labeled examples during training, thus providing strong supervision on what constitutes positive and negative samples. This constitutes a major obstacle for the development of DL models in radiology--in particular for cross-sectional imaging (e.g., computed tomography [CT] scans)--where labels must come from manual annotations by expert radiologists at the image or slice-level. These differ from examination-level annotations, which are coarser but cheaper, and could be extracted from radiology reports using natural language processing techniques. This work studies the question of what kind of labels should be collected for the problem of intracranial hemorrhage detection in brain CT. We investigate whether image-level annotations should be preferred to examination-level ones. By framing this task as a multiple instance learning problem, and employing modern attention-based DL architectures, we analyze the degree to which different levels of supervision improve detection performance. We find that strong supervision (i.e., learning with local image-level annotations) and weak supervision (i.e., learning with only global examination-level labels) achieve comparable performance in examination-level hemorrhage detection (the task of selecting the images in an examination that show signs of hemorrhage) as well as in image-level hemorrhage detection (highlighting those signs within the selected images). Furthermore, we study this behavior as a function of the number of labels available during training. Our results suggest that local labels may not be necessary at all for these tasks, drastically reducing the time and cost involved in collecting and curating datasets.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.04504

DeepSTI: Towards Tensor Reconstruction using Fewer Orientations in Susceptibility Tensor Imaging

Authors: Zhenghan Fang, Kuo-Wei Lai, Peter van Zijl, Xu Li, Jeremias Sulam

Abstract: Susceptibility tensor imaging (STI) is an emerging magnetic resonance imaging technique that characterizes the anisotropic tissue magnetic susceptibility with a second-order tensor model. STI has the potential to provide information for both the reconstruction of white matter fiber pathways and detection of myelin changes in the brain at mm resolution or less, which would be of great value for understanding brain structure and function in healthy and diseased brains. However, the application of STI in vivo has been hindered by its cumbersome and time-consuming acquisition requirement of measuring susceptibility induced MR phase changes at multiple (usually more than six) head orientations. This complexity is compounded by the limitation in head rotation angles due to physical constraints of the head coil. As a result, STI has not yet been widely applied in human studies in vivo. In this work, we tackle these issues by proposing an image reconstruction algorithm for STI that leverages data-driven priors. Our method, called DeepSTI, learns the data prior implicitly via a deep neural network that approximates the proximal operator of a regularizer function for STI. The dipole inversion problem is then solved iteratively using the learned proximal network. Experimental results using both simulation and in vivo human data demonstrate great improvement over state-of-the-art algorithms in terms of the reconstructed tensor image, principal eigenvector maps and tractography results, while allowing for tensor reconstruction with MR phase measured at far fewer than six different orientations. Notably, promising reconstruction results are achieved by our method from only one orientation in vivo in humans, and we demonstrate a potential application of this technique for estimating lesion susceptibility anisotropy in patients with multiple sclerosis.

Submitted 9 September, 2022; originally announced September 2022.

arXiv:2207.12497

Estimating and Controlling for Equalized Odds via Sensitive Attribute Predictors

Authors: Beepul Bharti, Paul Yi, Jeremias Sulam

Abstract: As the use of machine learning models in real world high-stakes decision settings continues to grow, it is highly important that we are able to audit and control for any potential fairness violations these models may exhibit towards certain groups. To do so, one naturally requires access to sensitive attributes, such as demographics, gender, or other potentially sensitive features that determine group membership. Unfortunately, in many settings, this information is often unavailable. In this work we study the well known \emph{equalized odds} (EOD) definition of fairness. In a setting without sensitive attributes, we first provide tight and computable upper bounds for the EOD violation of a predictor. These bounds precisely reflect the worst possible EOD violation. Second, we demonstrate how one can provably control the worst-case EOD by a new post-processing correction method. Our results characterize when directly controlling for EOD with respect to the predicted sensitive attributes is -- and when it is not -- optimal when it comes to controlling worst-case EOD. Our results hold under assumptions that are milder than previous works, and we illustrate these results with experiments on synthetic and real datasets.

Submitted 8 June, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.07038

SHAP-XRT: The Shapley Value Meets Conditional Independence Testing

Authors: Jacopo Teneggi, Beepul Bharti, Yaniv Romano, Jeremias Sulam

Abstract: The complex nature of artificial neural networks raises concerns on their reliability, trustworthiness, and fairness in real-world scenarios. The Shapley value -- a solution concept from game theory -- is one of the most popular explanation methods for machine learning models. More traditionally, from a statistical perspective, feature importance is defined in terms of conditional independence. So far, these two approaches to interpretability and feature importance have been considered separate and distinct. In this work, we show that Shapley-based explanation methods and conditional independence testing are closely related. We introduce the SHAPley EXplanation Randomization Test (SHAP-XRT), a testing procedure inspired by the Conditional Randomization Test (CRT) for a specific notion of local (i.e., on a sample) conditional independence. With it, we prove that for binary classification problems, the marginal contributions in the Shapley value provide lower and upper bounds to the expected $p$-values of their respective tests. Furthermore, we show that the Shapley value itself provides an upper bound to the expected $p$-value of a global (i.e., overall) null hypothesis. As a result, we further our understanding of Shapley-based explanation methods from a novel perspective and characterize the conditions under which one can make statistically valid claims about feature importance via the Shapley value.

Submitted 27 December, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Journal ref: Transactions on Machine Learning Research (2023)
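
To make the "marginal contributions in the Shapley value" from the abstract above concrete, here is a minimal sketch of the exact Shapley value for a small cooperative game, computed by enumerating all subsets. It illustrates the game-theoretic definition only and is not the SHAP-XRT testing procedure; the additive toy game and the names `shapley_values`, `value_fn` and `weights` are assumptions made for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n):
    """Exact Shapley values for n players; value_fn maps a frozenset of players to a number."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical additive game: each player contributes a fixed amount.
weights = [1.0, 2.0, 3.0]

def value_fn(s):
    return sum(weights[j] for j in s)

phi = shapley_values(value_fn, 3)
print(phi)
```

In an additive game each player's Shapley value equals its own contribution, which makes this a convenient sanity check. The enumeration is exponential in the number of players, which is why explanation methods typically rely on approximations or structural assumptions.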

arXiv:2202.13216

Adversarial robustness of sparse local Lipschitz predictors

Authors: Ramchandran Muthukumar, Jeremias Sulam

Abstract: This work studies the adversarial robustness of parametric functions composed of a linear predictor and a non-linear representation map. Our analysis relies on \emph{sparse local Lipschitzness} (SLL), an extension of local Lipschitz continuity that better captures the stability and reduced effective dimensionality of predictors upon local perturbations. SLL functions preserve a certain degree of structure, given by the sparsity pattern in the representation map, and include several popular hypothesis classes, such as piece-wise linear models, Lasso and its variants, and deep feed-forward ReLU networks. We provide a tighter robustness certificate on the minimal energy of an adversarial example, as well as tighter data-dependent non-uniform bounds on the robust generalization error of these predictors. We instantiate these results for the case of deep neural networks and provide numerical evidence that supports our results, shedding new insights into natural regularization strategies to increase the robustness of these models.

Submitted 3 March, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: Updated experiments

arXiv:2112.07782

Deciphering antibody affinity maturation with language models and weakly supervised learning

Authors: Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam

Abstract: In response to pathogens, the adaptive immune system generates specific antibodies that bind and neutralize foreign antigens. Understanding the composition of an individual's immune repertoire can provide insights into this process and reveal potential therapeutic antibodies. In this work, we explore the application of antibody-specific language models to aid understanding of immune repertoires. We introduce AntiBERTy, a language model trained on 558M natural antibody sequences. We find that within repertoires, our model clusters antibodies into trajectories resembling affinity maturation. Importantly, we show that models trained to predict highly redundant sequences under a multiple instance learning framework identify key binding residues in the process. With further development, the methods presented here will provide new insights into antigen binding from repertoire sequences alone.

Submitted 14 December, 2021; originally announced December 2021.

Comments: Presented at Machine Learning for Structural Biology Workshop, NeurIPS 2021

arXiv:2109.10778

Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images

Authors: Zhenzhen Wang, Carla Saoud, Sintawat Wangsiricharoen, Aaron W. James, Aleksander S. Popel, Jeremias Sulam

Abstract: Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and machine learning algorithm development. However, generating exhaustive and accurate annotations is labor-intensive, challenging, and costly. Drawing only coarse and approximate annotations is a much easier task, less costly, and it alleviates pathologists' workload. In this paper, we study the problem of refining these approximate annotations in digital pathology to obtain more accurate ones. Some previous works have explored obtaining machine learning models from these inaccurate annotations, but few of them tackle the refinement problem where the mislabeled regions should be explicitly identified and corrected, and all of them require a -- often very large -- number of training samples. We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need for external training data. Patches cropped from a WSI with inaccurate labels are processed jointly within a multiple instance learning framework, mitigating their impact on the predictive model and refining the segmentation. Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming state-of-the-art alternatives, even while learning from a single slide. Moreover, we demonstrate how real annotations drawn by pathologists can be efficiently refined and improved by the proposed approach. All these results demonstrate that LC-MIL is a promising, light-weight tool to provide fine-grained annotations from coarsely annotated pathology sets.

Submitted 7 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

arXiv:2105.02375

A Geometric Analysis of Neural Collapse with Unconstrained Features

Authors: Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

Abstract: We provide the first global optimization landscape analysis of $Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that ($i$) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and ($ii$) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified $unconstrained\;feature\;model$, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized in the simplified settings, matching the empirical observations in practical deep network architectures. These findings could have profound implications for optimization, generalization, and robustness of broad interests. For example, our experiments demonstrate that one may set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF for network training, which reduces memory cost by over $20\%$ on ResNet18 without sacrificing the generalization performance.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 42 pages, 8 figures, 1 table; the first two authors contributed to this work equally
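
Illustrative note (not part of the listing): the abstract above refers to the vertices of a Simplex Equiangular Tight Frame. A minimal sketch of one standard construction follows, assuming the feature dimension equals the number of classes K and taking the frame to be sqrt(K/(K-1)) * (I - (1/K) 1 1^T) with no extra rotation. The function names `simplex_etf` and `column_inner` are hypothetical and are not taken from the paper's code.

```python
import math


def simplex_etf(num_classes):
    """Return a K x K matrix (list of rows) whose columns are Simplex ETF vertices.

    Assumed form: sqrt(K / (K - 1)) * (I - (1 / K) * ones * ones^T).
    """
    k = num_classes
    if k < 2:
        raise ValueError("a Simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def column_inner(matrix, a, b):
    """Inner product between columns a and b of a matrix stored as a list of rows."""
    return sum(row[a] * row[b] for row in matrix)


etf = simplex_etf(4)
# Each column has unit norm, and every pair of distinct columns has the
# same inner product, -1 / (K - 1), which is the "equiangular" property.
print(column_inner(etf, 0, 0), column_inner(etf, 0, 1))
```

With this form, fixing the last-layer classifier amounts to storing a constant matrix that is never updated during training; how the paper implements that step is not stated in the abstract.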

arXiv:2104.06164 [pdf, other]

doi 10.1109/TPAMI.2022.3189849

Fast Hierarchical Games for Image Explanations

Authors: Jacopo Teneggi, Alexandre Luster, Jeremias Sulam

Abstract: As modern complex neural networks keep breaking records and solving harder problems, their predictions also become less and less intelligible. The current lack of interpretability often undermines the deployment of accurate machine learning tools in sensitive settings. In this work, we present a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients--Hierarchical Shap (h-Shap)--that resolves some of the limitations of current approaches. Unlike other Shapley-based explanation methods, h-Shap is scalable and can be computed without the need of approximation. Under certain distributional assumptions, such as those common in multiple instance learning, h-Shap retrieves the exact Shapley coefficients with an exponential improvement in computational complexity. We compare our hierarchical approach with popular Shapley-based and non-Shapley-based methods on a synthetic dataset, a medical imaging scenario, and a general computer vision problem, showing that h-Shap outperforms the state of the art in both accuracy and runtime. Code and experiments are made publicly available.

Submitted 9 June, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: 20 pages, 8 figures

arXiv:2010.12088 [pdf, other]

Adversarial Robustness of Supervised Sparse Coding

Authors: Jeremias Sulam, Ramchandran Muthukumar, Raman Arora

Abstract: Several recent results provide theoretical insights into the phenomena of adversarial examples. Existing results, however, are often limited due to a gap between the simplicity of the models studied and the complexity of those deployed in practice. In this work, we strike a better balance by considering a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate. We focus on the hypothesis class obtained by combining a sparsity-promoting encoder coupled with a linear classifier, and show an interesting interplay between the expressivity and stability of the (supervised) representation map and a notion of margin in the feature space. We bound the robust risk (to $\ell_2$-bounded perturbations) of hypotheses parameterized by dictionaries that achieve a mild encoder gap on training data. Furthermore, we provide a robustness certificate for end-to-end classification. We demonstrate the applicability of our analysis by computing certified accuracy on real data, and compare with other alternatives for certified robustness.

Submitted 4 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Journal ref: Advances in Neural Information Processing Systems, 2020

arXiv:2008.05024 [pdf, other]

Learned Proximal Networks for Quantitative Susceptibility Mapping

Authors: Kuo-Wei Lai, Manisha Aggarwal, Peter van Zijl, Xu Li, Jeremias Sulam

Abstract: Quantitative Susceptibility Mapping (QSM) estimates tissue magnetic susceptibility distributions from Magnetic Resonance (MR) phase measurements by solving an ill-posed dipole inversion problem. Conventional single orientation QSM methods usually employ regularization strategies to stabilize such inversion, but may suffer from streaking artifacts or over-smoothing. Multiple orientation QSM such as calculation of susceptibility through multiple orientation sampling (COSMOS) can give well-conditioned inversion and an artifact free solution but has expensive acquisition costs. On the other hand, Convolutional Neural Networks (CNN) show great potential for medical image reconstruction, albeit often with limited interpretability. Here, we present a Learned Proximal Convolutional Neural Network (LP-CNN) for solving the ill-posed QSM dipole inversion problem in an iterative proximal gradient descent fashion. This approach combines the strengths of data-driven restoration priors and the clear interpretability of iterative solvers that can take into account the physical model of dipole convolution. During training, our LP-CNN learns an implicit regularizer via its proximal, enabling the decoupling between the forward operator and the data-driven parameters in the reconstruction algorithm. More importantly, this framework is believed to be the first deep learning QSM approach that can naturally handle an arbitrary number of phase input measurements without the need for any ad-hoc rotation or re-training. We demonstrate that the LP-CNN provides state-of-the-art reconstruction results compared to both traditional and deep learning methods while allowing for more flexibility in the reconstruction process.

Submitted 11 August, 2020; originally announced August 2020.

Comments: 11 pages

arXiv:2007.08383 [pdf, other]

Deep Learning in Protein Structural Modeling and Design

Authors: Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, Jeffrey J. Gray

Abstract: Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling, and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence -> structure -> function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2006.06179 [pdf, other]

Recovery and Generalization in Over-Realized Dictionary Learning

Authors: Jeremias Sulam, Chong You, Zhihui Zhu

Abstract: In over two decades of research, the field of dictionary learning has gathered a large collection of successful applications, and theoretical guarantees for model recovery are known only whenever optimization is carried out in the same model class as that of the underlying dictionary. This work characterizes the surprising phenomenon that dictionary recovery can be facilitated by searching over the space of larger over-realized models. This observation is general and independent of the specific dictionary learning algorithm used. We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds. In particular, we show that model recovery can be upper-bounded by the empirical risk, a model-dependent quantity and the generalization gap, reflecting our empirical findings. We further show that an efficient and provably correct distillation approach can be employed to recover the correct atoms from the over-realized model. As a result, our meta-algorithm provides dictionary estimates with consistently better recovery of the ground-truth model.

Submitted 1 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:1811.00312 [pdf, other]

A Local Block Coordinate Descent Algorithm for the Convolutional Sparse Coding Model

Authors: Ev Zisselman, Jeremias Sulam, Michael Elad

Abstract: The Convolutional Sparse Coding (CSC) model has recently gained considerable traction in the signal and image processing communities. By providing a global, yet tractable, model that operates on the whole image, the CSC was shown to overcome several limitations of the patch-based sparse model while achieving superior performance in various applications. Contemporary methods for pursuit and learning the CSC dictionary often rely on the Alternating Direction Method of Multipliers (ADMM) in the Fourier domain for the computational convenience of convolutions, while ignoring the local characterizations of the image. A recent work by Papyan et al. suggested the SBDL algorithm for the CSC, while operating locally on image patches. SBDL demonstrates better performance compared to the Fourier-based methods, albeit still relying on the ADMM. In this work we maintain the localized strategy of the SBDL, while proposing a new and much simpler approach based on the Block Coordinate Descent algorithm - this method is termed Local Block Coordinate Descent (LoBCoD). Furthermore, we introduce a novel stochastic gradient descent version of LoBCoD for training the convolutional filters. The Stochastic-LoBCoD leverages the benefits of online learning, while being applicable to a single training image. We demonstrate the advantages of the proposed algorithms for image inpainting and multi-focus image fusion, achieving state-of-the-art results.

Submitted 1 November, 2018; originally announced November 2018.

Comments: 13 pages, 10 figures

MSC Class: 08

arXiv:1806.10171 [pdf, other]

doi 10.1109/TSP.2019.2929464

MMSE Approximation For Sparse Coding Algorithms Using Stochastic Resonance

Authors: Dror Simon, Jeremias Sulam, Yaniv Romano, Yue M. Lu, Michael Elad

Abstract: Sparse coding refers to the pursuit of the sparsest representation of a signal in a typically overcomplete dictionary. From a Bayesian perspective, sparse coding provides a Maximum a Posteriori (MAP) estimate of the unknown vector under a sparse prior. In this work, we suggest enhancing the performance of sparse coding algorithms by a deliberate and controlled contamination of the input with random noise, a phenomenon known as stochastic resonance. The proposed method adds controlled noise to the input and estimates a sparse representation from the perturbed signal. A set of such solutions is then obtained by projecting the original input signal onto the recovered set of supports. We present two variants of the described method, which differ in their final step. The first is a provably convergent approximation to the Minimum Mean Square Error (MMSE) estimator, relying on the generative model and applying a weighted average over the recovered solutions. The second is a relaxed variant of the former that simply applies an empirical mean. We show that both methods provide a computationally efficient approximation to the MMSE estimator, which is typically intractable to compute. We demonstrate our findings empirically and provide a theoretical analysis of our method under several different cases.

Submitted 11 April, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

arXiv:1806.00701 [pdf, other]

On Multi-Layer Basis Pursuit, Efficient Algorithms and Convolutional Neural Networks

Authors: Jeremias Sulam, Aviad Aberdam, Amir Beck, Michael Elad

Abstract: Parsimonious representations are ubiquitous in modeling and processing information. Motivated by the recent Multi-Layer Convolutional Sparse Coding (ML-CSC) model, we herein generalize the traditional Basis Pursuit problem to a multi-layer setting, introducing similar sparse enforcing penalties at different representation layers in a symbiotic relation between synthesis and analysis sparse priors. We explore different iterative methods to solve this new problem in practice, and we propose a new Multi-Layer Iterative Soft Thresholding Algorithm (ML-ISTA), as well as a fast version (ML-FISTA). We show that these nested first order algorithms converge, in the sense that the function value of near-fixed points can get arbitrarily close to the solution of the original problem. We further show how these algorithms effectively implement particular recurrent convolutional neural networks (CNNs) that generalize feed-forward ones without introducing any parameters. We present and analyze different architectures resulting from unfolding the iterations of the proposed pursuit algorithms, including a new Learned ML-ISTA, providing a principled way to construct deep recurrent CNNs. Unlike other similar constructions, these architectures unfold a global pursuit holistically for the entire network. We demonstrate the emerging constructions in a supervised learning setting, consistently improving the performance of classical CNNs while maintaining the number of parameters constant.

Submitted 21 November, 2018; v1 submitted 2 June, 2018; originally announced June 2018.
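
Illustrative note (not part of the listing): the abstract above builds on the Iterative Soft Thresholding Algorithm. As background, here is a minimal sketch of the classical single-layer ISTA for the problem min_x 0.5 * ||y - D x||^2 + lam * ||x||_1. It is not the paper's multi-layer ML-ISTA. The names `soft_threshold` and `ista` are hypothetical, and the step size 1 / ||D||_F^2 is a conservative assumption chosen so the sketch needs no eigenvalue computation.

```python
import math


def soft_threshold(vector, threshold):
    """Shrink each entry toward zero by `threshold` (the proximal map of the l1 norm)."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in vector]


def ista(dictionary, signal, lam, n_iter=200):
    """Classical ISTA for 0.5 * ||signal - D x||^2 + lam * ||x||_1.

    `dictionary` is a list of rows. The step 1 / ||D||_F^2 never exceeds
    1 / L, where L is the largest eigenvalue of D^T D, so the iteration is stable.
    """
    n_rows, n_cols = len(dictionary), len(dictionary[0])
    step = 1.0 / sum(d * d for row in dictionary for d in row)
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual D x - y, then the gradient D^T (D x - y) of the quadratic term.
        residual = [
            sum(dictionary[i][j] * x[j] for j in range(n_cols)) - signal[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(dictionary[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by soft thresholding.
        x = soft_threshold(
            [x[j] - step * grad[j] for j in range(n_cols)], step * lam
        )
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
# With an identity dictionary the minimizer is soft_threshold(signal, lam).
print(ista(identity, [3.0, 0.5], lam=1.0))
```

Each iteration is a linear map followed by an elementwise shrinkage, which is the structural similarity to a network layer that the abstract alludes to.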

arXiv:1805.11596 [pdf, other]

Adversarial Noise Attacks of Deep Learning Architectures -- Stability Analysis via Sparse Modeled Signals

Authors: Yaniv Romano, Aviad Aberdam, Jeremias Sulam, Michael Elad

Abstract: Despite their impressive performance, deep convolutional neural networks (CNNs) have been shown to be sensitive to small adversarial perturbations. These nuisances, which one can barely notice, are powerful enough to fool sophisticated and well performing classifiers, leading to ridiculous misclassification results. In this paper we analyze the stability of state-of-the-art deep-learning classification machines to adversarial perturbations, where we assume that the signals belong to the (possibly multi-layer) sparse representation model. We start with convolutional sparsity and then proceed to its multi-layered version, which is tightly connected to CNNs. Our analysis links the stability of the classification to noise with the underlying structure of the signal, quantified by the sparsity of its representation under a fixed dictionary. In addition, we offer similar stability theorems for two practical pursuit algorithms, which are posed as two different deep-learning architectures - the layered Thresholding and the layered Basis Pursuit. Our analysis establishes the better robustness of the latter to adversarial attacks. We corroborate these theoretical results by numerical experiments on three datasets: MNIST, CIFAR-10 and CIFAR-100.

Submitted 5 August, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

arXiv:1804.09788 [pdf, other]

Multi-Layer Sparse Coding: The Holistic Way

Authors: Aviad Aberdam, Jeremias Sulam, Michael Elad

Abstract: The recently proposed multi-layer sparse model has raised insightful connections between sparse representations and convolutional neural networks (CNN). In its original conception, this model was restricted to a cascade of convolutional synthesis representations. In this paper, we start by addressing a more general model, revealing interesting ties to fully connected networks. We then show that this multi-layer construction admits a brand new interpretation in a unique symbiosis between synthesis and analysis models: while the deepest layer indeed provides a synthesis representation, the mid-layers decompositions provide an analysis counterpart. This new perspective exposes the suboptimality of previously proposed pursuit approaches, as they do not fully leverage all the information comprised in the model constraints. Armed with this understanding, we address fundamental theoretical issues, revisiting previous analysis and expanding it. Motivated by the limitations of previous algorithms, we then propose an integrated - holistic - alternative that estimates all representations in the model simultaneously, and analyze all these different schemes under stochastic noise assumptions. Inspired by the synthesis-analysis duality, we further present a Holistic Pursuit algorithm, which alternates between synthesis and analysis sparse coding steps, eventually solving for the entire model as a whole, with provable improved performance. Finally, we present numerical results that demonstrate the practical advantages of our approach.

Submitted 25 July, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1708.08705 [pdf, other]

doi 10.1109/TSP.2018.2846226

Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning

Authors: Jeremias Sulam, Vardan Papyan, Yaniv Romano, Michael Elad

Abstract: The recently proposed Multi-Layer Convolutional Sparse Coding (ML-CSC) model, consisting of a cascade of convolutional sparse layers, provides a new interpretation of Convolutional Neural Networks (CNNs). Under this framework, the computation of the forward pass in a CNN is equivalent to a pursuit algorithm aiming to estimate the nested sparse representation vectors -- or feature maps -- from a given input signal. Despite having served as a pivotal connection between CNNs and sparse modeling, a deeper understanding of the ML-CSC is still lacking: there are no pursuit algorithms that can serve this model exactly, nor are there conditions to guarantee a non-empty model. While one can easily obtain signals that approximately satisfy the ML-CSC constraints, it remains unclear how to simply sample from the model and, more importantly, how one can train the convolutional filters from real data. In this work, we propose a sound pursuit algorithm for the ML-CSC model by adopting a projection approach. We provide new and improved bounds on the stability of the solution of such pursuit and we analyze different practical alternatives to implement this in practice. We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers. Last, but not least, we demonstrate the applicability of the ML-CSC model for several applications in an unsupervised setting, providing competitive results. Our work represents a bridge between matrix factorization, sparse dictionary learning and sparse auto-encoders, and we analyze these connections in detail.

Submitted 30 June, 2018; v1 submitted 29 August, 2017; originally announced August 2017.

Journal ref: IEEE Transactions on Signal Processing, vol. 66, no. 15, pp. 4090-4104, Aug. 1, 2018

arXiv:1707.06066 [pdf, other]

doi 10.1109/TSP.2017.2733447

Working Locally Thinking Globally: Theoretical Guarantees for Convolutional Sparse Coding

Authors: Vardan Papyan, Jeremias Sulam, Michael Elad

Abstract: The celebrated sparse representation model has led to remarkable results in various signal processing tasks in the last decade. However, despite its initial purpose of serving as a global prior for entire signals, it has been commonly used for modeling low dimensional patches due to the computational constraints it entails when deployed with learned dictionaries. A way around this problem has been recently proposed, adopting a convolutional sparse representation model. This approach assumes that the global dictionary is a concatenation of banded Circulant matrices. While several works have presented algorithmic solutions to the global pursuit problem under this new model, very few truly-effective guarantees are known for the success of such methods. In this work, we address the theoretical aspects of the convolutional sparse model providing the first meaningful answers to questions of uniqueness of solutions and success of pursuit algorithms, both greedy and convex relaxations, in ideal and noisy regimes. To this end, we generalize mathematical quantities, such as the $\ell_0$ norm, mutual coherence, Spark and RIP to their counterparts in the convolutional setting, intrinsically capturing local measures of the global model. On the algorithmic side, we demonstrate how to solve the global pursuit problem by using simple local processing, thus offering a first of its kind bridge between global modeling of signals and their patch-based local treatment.

Submitted 12 July, 2017; originally announced July 2017.

Comments: This is the journal version of arXiv:1607.02005 and arXiv:1607.02009, accepted to IEEE Transactions on Signal Processing

arXiv:1705.03239 [pdf, other]

Convolutional Dictionary Learning via Local Processing

Authors: Vardan Papyan, Yaniv Romano, Jeremias Sulam, Michael Elad

Abstract: Convolutional Sparse Coding (CSC) is an increasingly popular model in the signal and image processing communities, tackling some of the limitations of traditional patch-based sparse representations. Although several works have addressed the dictionary learning problem under this model, these relied on an ADMM formulation in the Fourier domain, losing the sense of locality and the relation to the traditional patch-based sparse pursuit. A recent work suggested a novel theoretical analysis of this global model, providing guarantees that rely on a localized sparsity measure. Herein, we extend this local-global relation by showing how one can efficiently solve the convolutional sparse pursuit problem and train the filters involved, while operating locally on image patches. Our approach provides an intuitive algorithm that can leverage standard techniques from the sparse representations field. The proposed method is fast to train, simple to implement, and flexible enough that it can be easily deployed in a variety of applications. We demonstrate the proposed training scheme for image inpainting and image separation, while achieving state-of-the-art results.

Submitted 9 May, 2017; originally announced May 2017.

arXiv:1607.02009 [pdf, other]

Working Locally Thinking Globally - Part II: Stability and Algorithms for Convolutional Sparse Coding

Authors: Vardan Papyan, Jeremias Sulam, Michael Elad

Abstract: The convolutional sparse model has recently gained increasing attention in the signal and image processing communities, and several methods have been proposed for solving the pursuit problem emerging from it -- in particular its convex relaxation, Basis Pursuit. In the first of this two-part work, we have provided a theoretical back-bone for this model, providing guarantees for the uniqueness of the sparsest solution and for the success of pursuit algorithms by introducing the notion of stripe sparsity and other related measures. Herein, we extend the analysis to a noisy regime, thereby considering signal perturbations and model deviations. We address questions of stability of the sparsest solutions and the success of pursuit algorithms, both greedy and convex. Classical definitions such as the RIP are generalized to the convolutional model, and existing notions such as the ERC are connected to our setting. On the algorithmic side, we demonstrate how to solve the global pursuit problem by using simple local processing, thus offering a first of its kind bridge between global modeling of signals and their patch-based local treatment.

Submitted 22 February, 2017; v1 submitted 7 July, 2016; originally announced July 2016.

arXiv:1607.02005 [pdf, other]

Working Locally Thinking Globally - Part I: Theoretical Guarantees for Convolutional Sparse Coding

Authors: Vardan Papyan, Jeremias Sulam, Michael Elad

Abstract: The celebrated sparse representation model has led to remarkable results in various signal processing tasks in the last decade. However, despite its initial purpose of serving as a global prior for entire signals, it has been commonly used for modeling low dimensional patches due to the computational constraints it entails when deployed with learned dictionaries. A way around this problem has been proposed recently, adopting a convolutional sparse representation model. This approach assumes that the global dictionary is a concatenation of banded Circulant matrices. Although several works have presented algorithmic solutions to the global pursuit problem under this new model, very few truly-effective guarantees are known for the success of such methods. In the first of this two-part work, we address the theoretical aspects of the sparse convolutional model, providing the first meaningful answers to corresponding questions of uniqueness of solutions and success of pursuit algorithms. To this end, we generalize mathematical quantities, such as the $\ell_0$ norm, the mutual coherence and the Spark, to their counterparts in the convolutional setting, which intrinsically capture local measures of the global model. In a companion paper, we extend the analysis to a noisy regime, addressing the stability of the sparsest solutions and pursuit algorithms, and demonstrate practical approaches for solving the global pursuit problem via simple local processing.

Submitted 22 February, 2017; v1 submitted 7 July, 2016; originally announced July 2016.

arXiv:1602.00212 [pdf, other]

doi 10.1109/TSP.2016.2540599

Trainlets: Dictionary Learning in High Dimensions

Authors: Jeremias Sulam, Boaz Ophir, Michael Zibulevsky, Michael Elad

Abstract: Sparse representations have been shown to be a very powerful model for real world signals, and have enabled the development of applications with notable performance. Combined with the ability to learn a dictionary from signal examples, sparsity-inspired algorithms are often achieving state-of-the-art results in a wide variety of tasks. Yet, these methods have traditionally been restricted to small dimensions mainly due to the computational constraints that the dictionary learning problem entails. In the context of image processing, this implies handling small image patches. In this work we show how to efficiently handle bigger dimensions and go beyond the small patches in sparsity-based signal and image processing methods. We build our approach based on a new cropped wavelet decomposition, which enables a multi-scale analysis with virtually no border effects. We then employ this as the base dictionary within a double sparsity model to enable the training of adaptive dictionaries. To cope with the increase of training data, while at the same time improving the training performance, we present an Online Sparse Dictionary Learning (OSDL) algorithm to train this model effectively, enabling it to handle millions of examples. This work shows that dictionary learning can be up-scaled to tackle a new level of signal dimensions, obtaining large adaptable atoms that we call trainlets.

Submitted 12 May, 2016; v1 submitted 31 January, 2016; originally announced February 2016.

Showing 1–32 of 32 results for author: Sulam, J