-
ReLUs Are Sufficient for Learning Implicit Neural Representations
Authors:
Joseph Shenouda,
Yamin Zhou,
Robert D. Nowak
Abstract:
Motivated by the growing theoretical understanding of neural networks that employ the Rectified Linear Unit (ReLU) as their activation function, we revisit the use of ReLU activation functions for learning implicit neural representations (INRs). Inspired by second order B-spline wavelets, we incorporate a set of simple constraints to the ReLU neurons in each layer of a deep neural network (DNN) to remedy the spectral bias. This in turn enables its use for various INR tasks. Empirically, we demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons. Next, by leveraging recent theoretical works which characterize the kinds of functions ReLU neural networks learn, we provide a way to quantify the regularity of the learned function. This offers a principled approach to selecting the hyperparameters in INR architectures. We substantiate our claims through experiments in signal representation, super resolution, and computed tomography, demonstrating the versatility and effectiveness of our method. The code for all experiments can be found at https://github.com/joeshenouda/relu-inrs.
Submitted 4 June, 2024;
originally announced June 2024.
-
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Authors:
Gantavya Bhatt,
Yifang Chen,
Arnav M. Das,
Jifan Zhang,
Sang T. Truong,
Stephen Mussmann,
Yinglun Zhu,
Jeffrey Bilmes,
Simon S. Du,
Kevin Jamieson,
Jordan T. Ash,
Robert D. Nowak
Abstract:
Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of annotation cost required by random sampling.
Submitted 7 July, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
While Loops in Coq
Authors:
David Nowak,
Vlad Rusu
Abstract:
While loops are present in virtually all imperative programming languages. They are important both for practical reasons (performing a number of iterations not known in advance) and theoretical reasons (achieving Turing completeness). In this paper we propose an approach for incorporating while loops in an imperative language shallowly embedded in the Coq proof assistant. The main difficulty is that proving the termination of while loops is nontrivial, or impossible in the case of non-termination, whereas Coq only accepts programs endowed with termination proofs. Our solution is based on a new, general method for defining possibly non-terminating recursive functions in Coq. We illustrate the approach by proving termination and partial correctness of a program on linked lists.
Submitted 24 September, 2023;
originally announced September 2023.
-
Weighted variation spaces and approximation by shallow ReLU networks
Authors:
Ronald DeVore,
Robert D. Nowak,
Rahul Parhi,
Jonathan W. Siegel
Abstract:
We investigate the approximation of functions $f$ on a bounded domain $Ω\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $Ω$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes.
The present paper is concerned with the definition of these novel model classes on domains $Ω$. The current definition of these model classes does not depend on the domain $Ω$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
Submitted 28 July, 2023;
originally announced July 2023.
-
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Authors:
Jifan Zhang,
Yifang Chen,
Gregory Canal,
Stephen Mussmann,
Arnav M. Das,
Gantavya Bhatt,
Yinglun Zhu,
Jeffrey Bilmes,
Simon Shaolei Du,
Kevin Jamieson,
Robert D Nowak
Abstract:
Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.
Submitted 1 March, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression
Authors:
Joseph Shenouda,
Rahul Parhi,
Kangwook Lee,
Robert D. Nowak
Abstract:
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay in training networks with activations like the rectified linear unit (ReLU). This framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are the solutions to data-fitting problems over these infinite-dimensional spaces, where the network widths are bounded by the square of the number of training data. This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks, shedding new light on multi-task learning with neural networks. Finally, this paper develops a connection between weight-decay regularization and the multi-task lasso problem. This connection leads to novel bounds for layer widths in deep networks that depend on the intrinsic dimensions of the training data representations. This insight not only deepens the understanding of the deep network architectural requirements, but also yields a simple convex optimization method for deep neural network compression. The performance of this compression procedure is evaluated on various architectures.
Submitted 9 March, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Filtered Iterative Denoising for Linear Inverse Problems
Authors:
Danica Fliss,
Willem Marais,
Robert D. Nowak
Abstract:
Iterative denoising algorithms (IDAs) have been tremendously successful in a range of linear inverse problems arising in signal and image processing. The classic instance of this is the famous Iterative Soft-Thresholding Algorithm (ISTA), based on soft-thresholding of wavelet coefficients. More modern approaches to IDAs replace soft-thresholding with a black-box denoiser, such as BM3D or a learned deep neural network denoiser. These are often referred to as "plug-and-play" (PnP) methods because, in principle, an off-the-shelf denoiser can be used for a variety of different inverse problems. The problem with PnP methods is that they may not provide the best solutions to a specific linear inverse problem; better solutions can often be obtained by a denoiser that is customized to the problem domain. A problem-specific denoiser, however, requires expensive re-engineering or re-learning, which eliminates the simplicity and ease that make PnP methods attractive in the first place. This paper proposes a new IDA that allows one to use a general, black-box denoiser more effectively via a simple linear filtering modification to the usual gradient update steps that accounts for the specific linear inverse problem. The proposed Filtered IDA (FIDA) is mathematically derived from the classical ISTA and wavelet denoising viewpoint. We show experimentally that FIDA can produce superior results compared to existing IDA methods with BM3D.
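For orientation, the classical ISTA iteration that the abstract takes as its starting point can be sketched as below. This is a minimal illustration of textbook ISTA only, not the paper's FIDA method: it assumes the objective 0.5*||Ax - y||^2 + lam*||x||_1 with sparsity imposed directly on x (no wavelet transform), and the names `soft_threshold`, `ista`, `lam`, and `n_iter` are illustrative choices, not taken from the paper.

```python
import numpy as np


def soft_threshold(v, tau):
    """Shrink each entry of v toward zero by tau (the proximal map of tau*||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(A, y, lam, n_iter=200):
    """Textbook ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1 (assumed objective)."""
    # Step size 1/L, where L is the squared spectral norm of A
    # (the Lipschitz constant of the gradient of the data-fit term).
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)          # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * lam)  # denoising step
    return x
```

In a plug-and-play variant, the call to `soft_threshold` is the piece that would be swapped for a black-box denoiser.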
Submitted 15 February, 2023;
originally announced February 2023.
-
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
Deep learning has been wildly successful in practice and most state-of-the-art machine learning methods are based on neural networks. Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep neural networks. In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of deep learning. This framework precisely characterizes the functional properties of neural networks that are trained to fit to data. The key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, which are all techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and explains why neural networks can perform well in high-dimensional problems.
Submitted 8 June, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks
Authors:
Maciej A. Czyzewski,
Daniel Nowak,
Kamil Piechowiak
Abstract:
Transfer learning is a popular technique for improving the performance of neural networks. However, existing methods are limited to transferring parameters between networks with same architectures. We present a method for transferring parameters between neural networks with different architectures. Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently. Compared to existing parameter prediction and random initialization methods, it significantly improves training efficiency and validation accuracy. In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training. DPIAT allows both researchers and neural architecture search systems to modify trained networks and reuse knowledge, avoiding the need for retraining from scratch. We also introduce a network architecture similarity measure, enabling users to choose the best source network without any training.
Submitted 28 December, 2022;
originally announced December 2022.
-
PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks
Authors:
Liu Yang,
Jifan Zhang,
Joseph Shenouda,
Dimitris Papailiopoulos,
Kangwook Lee,
Robert D. Nowak
Abstract:
Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of $\ell_2$ (not squared) norms of the input and output weights associated with each ReLU neuron. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
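The equivalence between the two regularizers mentioned in the abstract can be illustrated with a short standard argument. The sketch below is not taken from the paper; it assumes a single hidden ReLU neuron with nonzero input weights $w$ and output weights $v$, no regularized bias, and the hypothetical rescaling parameter $a > 0$.

```latex
% Assumptions (illustrative): one ReLU neuron, \sigma(t) = \max(t, 0),
% nonzero input weights w, nonzero output weights v, bias not regularized.
% Positive homogeneity of the ReLU: rescaling leaves the network function unchanged.
\[
  v\,\sigma(w^\top x) \;=\; \tfrac{v}{a}\,\sigma\big((a w)^\top x\big)
  \qquad \text{for every } a > 0 .
\]
% Minimizing the weight decay penalty over all equivalent rescalings (AM-GM):
\[
  \min_{a > 0}\; \tfrac{1}{2}\Big( a^2 \|w\|_2^2 + a^{-2}\,\|v\|_2^2 \Big)
  \;=\; \|w\|_2\,\|v\|_2 ,
  \qquad \text{attained at } a^2 = \|v\|_2 / \|w\|_2 .
\]
```

Summing over neurons, this suggests why a minimizer of the squared-norm penalty also minimizes the sum of products of (unsquared) $\ell_2$ norms.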
Submitted 5 July, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
Submitted 12 October, 2022; v1 submitted 18 September, 2021;
originally announced September 2021.
-
What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit activations to data. We propose a new function space, which is reminiscent of classical bounded variation-type spaces, that captures the compositional structure associated with deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems over functions from this space. The function space consists of compositions of functions from the Banach spaces of second-order bounded variation in the Radon domain. These are Banach spaces with sparsity-promoting norms, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.
Submitted 26 September, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Extending Equational Monadic Reasoning with Monad Transformers
Authors:
Reynald Affeldt,
David Nowak
Abstract:
There is a recent interest for the verification of monadic programs using proof assistants. This line of research raises the question of the integration of monad transformers, a standard technique to combine monads. In this paper, we extend Monae, a Coq library for monadic equational reasoning, with monad transformers and we explain the benefits of this extension. Our starting point is the existing theory of modular monad transformers, which provides a uniform treatment of operations. Using this theory, we simplify the formalization of models in Monae and we propose an approach to support monadic equational reasoning in the presence of monad transformers. We also use Monae to revisit the lifting theorems of modular monad transformers by providing equational proofs and explaining how to patch a known bug using a non-standard use of Coq that combines impredicative polymorphism and parametricity.
Submitted 18 July, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Banach Space Representer Theorems for Neural Networks and Ridge Splines
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
We develop a variational framework to understand the properties of the functions learned by neural networks fit to data. We propose and study a family of continuous-domain linear inverse problems with total variation-like regularization in the Radon domain subject to data fitting constraints. We derive a representer theorem showing that finite-width, single-hidden layer neural networks are solutions to these inverse problems. We draw on many techniques from variational spline theory and so we propose the notion of polynomial ridge splines, which correspond to single-hidden layer neural networks with truncated power functions as the activation function. The representer theorem is reminiscent of the classical reproducing kernel Hilbert space representer theorem, but we show that the neural network problem is posed over a non-Hilbertian Banach space. While the learning problems are posed in the continuous-domain, similar to kernel methods, the problems can be recast as finite-dimensional neural network training problems. These neural network training problems have regularizers which are related to the well-known weight decay and path-norm regularizers. Thus, our result gives insight into functional characteristics of trained neural networks and also into the design of neural network regularizers. We also show that these regularizers promote neural network solutions with desirable generalization properties.
Submitted 11 February, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Local Smoothing for the Schrödinger Equation on a Multi-Warped Product Manifold with Inflection-Transmission Trapping
Authors:
Hans Christianson,
Derrick Nowak
Abstract:
Geodesic trapping is an obstruction to dispersive estimates for solutions to the Schrödinger equation. Surprisingly little is known about solutions to the Schrödinger equation on manifolds with degenerate trapping, since the conditions for degenerate trapping are not stable under perturbations. In this paper we extend some of the results of [CM14] on inflection-transmission type trapping on warped product manifolds to the case of multi-warped products. The main result is that the trapping on one cross section does not interact with the trapping on other cross sections provided the manifold has only one infinite end and only inflection-transmission type trapping.
Submitted 2 April, 2021; v1 submitted 27 May, 2020;
originally announced May 2020.
-
A Trustful Monad for Axiomatic Reasoning with Probability and Nondeterminism
Authors:
Reynald Affeldt,
Jacques Garrigue,
David Nowak,
Takafumi Saikawa
Abstract:
The algebraic properties of the combination of probabilistic choice and nondeterministic choice have long been a research topic in program semantics. This paper explains a formalization in the Coq proof assistant of a monad equipped with both choices: the geometrically convex monad. This formalization has an immediate application: it provides a model for a monad that implements a non-trivial interface which allows for proofs by equational reasoning using probabilistic and nondeterministic effects. We explain the technical choices we made to go from the literature to a complete Coq formalization, from which we identify reusable theories about mathematical structures such as convex spaces and concrete categories, and that we integrate in a framework for monadic equational reasoning.
Submitted 25 October, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Optimal Confidence Regions for the Multinomial Parameter
Authors:
Matthew L. Malloy,
Ardhendu Tripathy,
Robert D. Nowak
Abstract:
Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider an empirical distribution $\widehat{\boldsymbol{p}}$ generated from $n$ iid realizations of a random variable that takes one of $k$ possible values according to an unknown distribution $\boldsymbol{p}$. This is analogous to a single draw from a multinomial distribution. A confidence region is a subset of the probability simplex that depends on $\widehat{\boldsymbol{p}}$ and contains the unknown $\boldsymbol{p}$ with a specified confidence. This paper shows how one can construct minimum average volume confidence regions, answering a long standing question. We also show the optimality of the regions directly translates to optimal confidence intervals of linear functionals such as the mean, implying sample complexity and regret improvements for adaptive machine learning algorithms.
Submitted 29 January, 2021; v1 submitted 3 February, 2020;
originally announced February 2020.
-
A Ray Tracing Technique for the Navigation on a Non-convex Pareto Front
Authors:
Dimitri Nowak,
Karl-Heinz Küfer
Abstract:
A new interactive approach to navigate on approximations of in general non-convex but connected Pareto fronts is introduced. Given a finite number of precalculated representative Pareto-efficient solutions, an adapted Delaunay triangulation is generated. Based on interpolation and ray tracing techniques, real time navigation in the vicinity of Pareto-optimal solutions is made possible.
Submitted 10 January, 2020;
originally announced January 2020.
-
The Role of Neural Network Activation Functions
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
A wide variety of activation functions have been proposed for neural networks. The Rectified Linear Unit (ReLU) is especially popular today. There are many practical reasons that motivate the use of the ReLU. This paper provides new theoretical characterizations that support the use of the ReLU, its variants such as the leaky ReLU, as well as other activation functions in the case of univariate, single-hidden layer feedforward neural networks. Our results also explain the importance of commonly used strategies in the design and training of neural networks such as "weight decay" and "path-norm" regularization, and provide a new justification for the use of "skip connections" in network architectures. These new insights are obtained through the lens of spline theory. In particular, we show how neural network training problems are related to infinite-dimensional optimizations posed over Banach spaces of functions whose solutions are well-known to be fractional and polynomial splines, where the particular Banach space (which controls the order of the spline) depends on the choice of activation function.
Submitted 16 October, 2020; v1 submitted 5 October, 2019;
originally announced October 2019.
-
(Co)inductive Proof Systems for Compositional Proofs in Reachability Logic
Authors:
Vlad Rusu,
David Nowak
Abstract:
Reachability Logic is a formalism that can be used, among others, for expressing partial-correctness properties of transition systems. In this paper we present three proof systems for this formalism, all of which are sound and complete and inherit the coinductive nature of the logic. The proof systems differ, however, in several aspects. First, they use induction and coinduction in different proportions. The second aspect regards compositionality, broadly meaning their ability to prove simpler formulas on smaller systems, and to reuse those formulas as lemmas for more complex formulas on larger systems. The third aspect is the difficulty of their soundness proofs. We show that the more induction a proof system uses, and the more specialised is its use of coinduction (with respect to our problem domain), the more compositional the proof system is, but the more difficult its soundness proof becomes. We also briefly present mechanisations of these results in the Isabelle/HOL and Coq proof assistants.
Submitted 4 September, 2019;
originally announced September 2019.
-
MaxiMin Active Learning in Overparameterized Model Classes
Authors:
Mina Karzand,
Robert D. Nowak
Abstract:
Generating labeled training datasets has become a major bottleneck in Machine Learning (ML) pipelines. Active ML aims to address this issue by designing learning algorithms that automatically and adaptively select the most informative examples for labeling so that human time is not wasted labeling irrelevant, redundant, or trivial examples. This paper proposes a new approach to active ML with nonparametric or overparameterized models such as kernel methods and neural networks. In the context of binary classification, the new approach is shown to possess a variety of desirable properties that allow active learning algorithms to automatically and efficiently identify decision boundaries and data clusters.
Submitted 28 April, 2020; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Concentration Inequalities for the Empirical Distribution
Authors:
Jay Mardia,
Jiantao Jiao,
Ervin Tánczos,
Robert D. Nowak,
Tsachy Weissman
Abstract:
We study concentration inequalities for the Kullback--Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviors between small and large sample sizes compared to the alphabet size.
Submitted 18 October, 2019; v1 submitted 18 September, 2018;
originally announced September 2018.
-
Tensor Methods for Nonlinear Matrix Completion
Authors:
Greg Ongie,
Daniel Pimentel-Alarcón,
Laura Balzano,
Rebecca Willett,
Robert D. Nowak
Abstract:
In the low-rank matrix completion (LRMC) problem, the low-rank assumption means that the columns (or rows) of the matrix to be completed are points on a low-dimensional linear algebraic variety. This paper extends this thinking to cases where the columns are points on a low-dimensional nonlinear algebraic variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC). Matrices whose columns belong to a union of subspaces are an important special case. We propose a LADMC algorithm that leverages existing LRMC methods on a tensorized representation of the data. For example, a second-order tensorized representation is formed by taking the Kronecker product of each column with itself, and we consider higher order tensorizations as well. This approach will succeed in many cases where traditional LRMC is guaranteed to fail because the data are low-rank in the tensorized representation but not in the original representation. We provide a formal mathematical justification for the success of our method. In particular, we give bounds of the rank of these data in the tensorized representation, and we prove sampling requirements to guarantee uniqueness of the solution. We also provide experimental results showing that the new approach outperforms existing state-of-the-art methods for matrix completion under a union of subspaces model.
Submitted 4 September, 2020; v1 submitted 26 April, 2018;
originally announced April 2018.
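The second-order tensorization mentioned in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' LADMC algorithm, only a demonstration of the rank effect it relies on; the function name `tensorize` and all sizes are hypothetical choices made for illustration. Columns are drawn from a union of `K` random `r`-dimensional subspaces, so the original matrix is full rank, while the tensorized matrix has rank at most `K * r * (r + 1) / 2`, which is far below its row dimension `d * d`.

```python
import numpy as np


def tensorize(X):
    """Map each column x of X to the Kronecker product of x with itself."""
    d, n = X.shape
    # Entry (i, j, k) holds X[i, k] * X[j, k]; flattening the first two axes
    # in C order gives np.kron(X[:, k], X[:, k]) as column k.
    return np.einsum("ik,jk->ijk", X, X).reshape(d * d, n)


# Hypothetical sizes: ambient dimension, subspace dimension,
# number of subspaces, points sampled per subspace.
d, r, K, n_per = 10, 2, 5, 40
rng = np.random.default_rng(0)

# Columns drawn from a union of K random r-dimensional subspaces.
X = np.hstack(
    [rng.standard_normal((d, r)) @ rng.standard_normal((r, n_per)) for _ in range(K)]
)

rank_original = np.linalg.matrix_rank(X)               # K * r = d here, so full rank
rank_tensorized = np.linalg.matrix_rank(tensorize(X))  # at most K * r * (r + 1) / 2

print(f"original:   rank {rank_original} of {d} rows")
print(f"tensorized: rank {rank_tensorized} of {d * d} rows")
```

Because the tensorized matrix is low-rank relative to its size while the original is not, a standard low-rank completion method has something to exploit only in the tensorized representation.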
-
Nanoscale spectroscopic studies of two different physical origins of the tip-enhanced force: dipole and thermal
Authors:
Junghoon Jahng,
Sung Park,
Will A. Morrison,
Hyuksang Kwon,
Derek Nowak,
Eric O. Potma,
Eun Seong Lee
Abstract:
When light illuminates the junction formed between a sharp metal tip and a sample, different mechanisms can contribute to the measured photo-induced force simultaneously. Of particular interest are the instantaneous force between the induced dipoles in the tip and in the sample and the force related to thermal heating of the junction. A key difference between these two force mechanisms is their spectral behaviors. The magnitude of the thermal response follows a dissipative Lorentzian lineshape, which measures the heat exchange between light and matter, while the induced dipole response exhibits a dispersive spectrum and relates to the real part of the material polarizability. Because the two interactions are sometimes comparable in magnitude, the origin of the nanoscale chemical selectivity in the recently developed photo-induced force microscopy (PiFM) is often unclear. Here, we demonstrate theoretically and experimentally how light absorption followed by nanoscale thermal expansion generates a photo-induced force in PiFM. Furthermore, we explain how this thermal force can be distinguished from the induced dipole force by tuning the relaxation time of samples. Our analysis presented here helps the interpretation of nanoscale chemical measurements of heterogeneous materials and sheds light on the nature of light-matter coupling in van der Waals materials.
Submitted 7 March, 2018; v1 submitted 7 November, 2017;
originally announced November 2017.
-
Nearest-neighbor Kitaev exchange blocked by charge order in electron doped $α$-RuCl$_{3}$
Authors:
A. Koitzsch,
C. Habenicht,
E. Mueller,
M. Knupfer,
B. Buechner,
S. Kretschmer,
M. Richter,
J. van den Brink,
F. Boerrnert,
D. Nowak,
A. Isaeva,
Th. Doert
Abstract:
A quantum spin-liquid might be realized in $α$-RuCl$_{3}$, a honeycomb-lattice magnetic material with substantial spin-orbit coupling. Moreover, $α$-RuCl$_{3}$ is a Mott insulator, which implies the possibility that novel exotic phases occur upon doping. Here, we study the electronic structure of this material when intercalated with potassium by photoemission spectroscopy, electron energy loss spectroscopy, and density functional theory calculations. We obtain a stable stoichiometry at K$_{0.5}$RuCl$_3$. This gives rise to a peculiar charge disproportionation into formally Ru$^{2+}$ (4$d^6$) and Ru$^{3+}$ (4$d^5$). Every Ru 4$d^5$ site with one hole in the $t_{2g}$ shell is surrounded by nearest neighbors of 4$d^6$ character, where the $t_{2g}$ level is full and magnetically inert. Thus, each type of Ru site forms a triangular lattice and nearest-neighbor interactions of the original honeycomb are blocked.
Submitted 13 October, 2017; v1 submitted 15 September, 2017;
originally announced September 2017.
-
Low temperature enhancement of ferromagnetic Kitaev correlations in α-RuCl3
Authors:
Andreas Koitzsch,
Eric Mueller,
Martin Knupfer,
Bernd Buechner,
Domenic Nowak,
Anna Isaeva,
Thomas Doert,
Markus Grueninger,
Satoshi Nishimoto,
Jeroen van den Brink
Abstract:
Kitaev-type interactions between neighbouring magnetic moments emerge in the honeycomb material $α$-RuCl3. It is debated however whether these Kitaev interactions are ferromagnetic or antiferromagnetic. With electron energy loss spectroscopy (EELS) we study the lowest excitation across the Mott-Hubbard gap, which involves a d4 triplet in the final state and therefore is sensitive to nearest-neighbor spin-spin correlations. At low temperature the spectral weight of these triplets is strongly enhanced, in accordance with optical data. We show that the magnetic correlation function that determines this EELS spectral weight is directly related to a Kitaev-type spin-spin correlator and that the temperature dependence agrees very well with the results of a microscopic magnetic Hamiltonian for $α$-RuCl3 with ferromagnetic Kitaev coupling.
Submitted 8 September, 2017;
originally announced September 2017.
-
Algebraic Variety Models for High-Rank Matrix Completion
Authors:
Greg Ongie,
Rebecca Willett,
Robert D. Nowak,
Laura Balzano
Abstract:
We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e. each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their union, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or non-convex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known "kernel trick" to avoid working directly with the high-dimensional monomial matrix. We show the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low rank matrix completion and subspace clustering techniques in experiments with real data.
Submitted 28 March, 2017;
originally announced March 2017.
-
Large field-induced gap of Kitaev-Heisenberg paramagnons in $α$-RuCl$_{3}$
Authors:
Richard Hentrich,
Anja U. B. Wolter,
Xenophon Zotos,
Wolfram Brenig,
Domenic Nowak,
Anna Isaeva,
Thomas Doert,
Arnab Banerjee,
Paula Lampen-Kelley,
David G. Mandrus,
Stephen E. Nagler,
Jennifer Sears,
Young-June Kim,
Bernd Büchner,
Christian Hess
Abstract:
The honeycomb Kitaev-Heisenberg model is a source of a quantum spin liquid with Majorana fermions and gauge flux excitations as fractional quasiparticles. In the quest of finding a pertinent material, $α$-RuCl$_{3}$ recently emerged as a prime candidate. Here we unveil highly unusual low-temperature heat conductivity $κ$ of $α$-RuCl$_{3}$: beyond a magnetic field of $B_c\approx$ 7.5 T, $κ$ increases by about one order of magnitude, resulting in a large magnetic field dependent peak at about 7 K, both for in-plane as well as out-of-plane transport. This clarifies the unusual magnetic field dependence unambiguously to be the result of severe scattering of phonons off putative Kitaev-Heisenberg excitations in combination with a drastic field-induced change of the magnetic excitation spectrum. In particular, an unexpectedly large energy gap arises, which increases approximately linearly with the magnetic field and reaches a remarkably large $\hbarω_0/k_B\approx $ 50 K at 18 T.
Submitted 24 March, 2017;
originally announced March 2017.
-
J$_{eff}$ description of the honeycomb Mott insulator $α$-RuCl$_3$
Authors:
A. Koitzsch,
C. Habenicht,
E. Mueller,
M. Knupfer,
B. Buechner,
H. Kandpal,
J. van den Brink,
D. Nowak,
A. Isaeva,
Th. Doert
Abstract:
Novel ground states might be realized in honeycomb lattices with strong spin-orbit coupling. Here we study the electronic structure of $α$-RuCl$_3$, in which the Ru ions are in a d5 configuration and form a honeycomb lattice, by angle-resolved photoemission, x-ray photoemission and electron energy loss spectroscopy supported by density functional theory and multiplet calculations. We find that $α$-RuCl$_3$ is a Mott insulator with significant spin-orbit coupling, whose low energy electronic structure is naturally mapped onto Jeff states. This makes $α$-RuCl$_3$ a promising candidate for the realization of Kitaev physics. Relevant electronic parameters such as the Hubbard energy U, the crystal field splitting 10Dq and the charge transfer energy are evaluated. Furthermore, we observe significant Cl photodesorption with time, which must be taken into account when interpreting photoemission and other surface sensitive experiments.
Submitted 17 March, 2016;
originally announced March 2016.
-
A Characterization of Deterministic Sampling Patterns for Low-Rank Matrix Completion
Authors:
Daniel L. Pimentel-Alarcón,
Nigel Boston,
Robert D. Nowak
Abstract:
Low-rank matrix completion (LRMC) problems arise in a wide variety of applications. Previous theory mainly provides conditions for completion under missing-at-random samplings. This paper studies deterministic conditions for completion. An incomplete $d \times N$ matrix is finitely rank-$r$ completable if there are at most finitely many rank-$r$ matrices that agree with all its observed entries. Finite completability is the tipping point in LRMC, as a few additional samples of a finitely completable matrix guarantee its unique completability. The main contribution of this paper is a deterministic sampling condition for finite completability. We use this to also derive deterministic sampling conditions for unique completability that can be efficiently verified. We also show that under uniform random sampling schemes, these conditions are satisfied with high probability if $O(\max\{r,\log d\})$ entries per column are observed. These findings have several implications on LRMC regarding lower bounds, sample and computational complexity, the role of coherence, adaptive settings and the validation of any completion algorithm. We complement our theoretical results with experiments that support our findings and motivate future analysis of uncharted sampling regimes.
Submitted 11 October, 2016; v1 submitted 9 March, 2015;
originally announced March 2015.
-
Deterministic Conditions for Subspace Identifiability from Incomplete Sampling
Authors:
Daniel L. Pimentel-Alarcón,
Robert D. Nowak,
Nigel Boston
Abstract:
Consider a generic $r$-dimensional subspace of $\mathbb{R}^d$, $r<d$, and suppose that we are only given projections of this subspace onto small subsets of the canonical coordinates. The paper establishes necessary and sufficient deterministic conditions on the subsets for subspace identifiability.
Submitted 24 May, 2015; v1 submitted 2 October, 2014;
originally announced October 2014.
-
Sparse Estimation with Strongly Correlated Variables using Ordered Weighted L1 Regularization
Authors:
Mario A. T. Figueiredo,
Robert D. Nowak
Abstract:
This paper studies ordered weighted L1 (OWL) norm regularization for sparse estimation problems with strongly correlated variables. We prove sufficient conditions for clustering based on the correlation/colinearity of variables using the OWL norm, of which the so-called OSCAR is a particular case. Our results extend previous ones for OSCAR in several ways: for the squared error loss, our conditions hold for the more general OWL norm and under weaker assumptions; we also establish clustering conditions for the absolute error loss, which is, as far as we know, a novel result. Furthermore, we characterize the statistical performance of OWL norm regularization for generative models in which certain clusters of regression variables are strongly (even perfectly) correlated, but variables in different clusters are uncorrelated. We show that if the true p-dimensional signal generating the data involves only s of the clusters, then O(s log p) samples suffice to accurately estimate the signal, regardless of the number of coefficients within the clusters. The estimation of s-sparse signals with completely independent variables requires just as many measurements. In other words, using the OWL we pay no price (in terms of the number of measurements) for the presence of strongly correlated variables.
Submitted 13 September, 2014;
originally announced September 2014.
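For readers unfamiliar with the OWL norm named in the abstract above, a minimal sketch of its standard definition follows: the absolute values of the vector are sorted in decreasing order and paired with non-negative, non-increasing weights. The function names `owl_norm` and `oscar_weights` are hypothetical, and the OSCAR weights use the usual parameterization `w_i = lam1 + lam2 * (p - i)`; nothing here reproduces the paper's estimator or its analysis.

```python
import numpy as np


def owl_norm(x, w):
    """Ordered weighted L1 norm: sum_i w[i] * |x|_(i), with |x|_(1) >= |x|_(2) >= ..."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.shape != w.shape:
        raise ValueError("x and w must have the same shape")
    if np.any(w < 0) or np.any(np.diff(w) > 0):
        raise ValueError("weights must be non-negative and non-increasing")
    abs_sorted = np.sort(np.abs(x))[::-1]  # magnitudes in decreasing order
    return float(abs_sorted @ w)


def oscar_weights(p, lam1, lam2):
    """OSCAR as a special case of OWL: w_i = lam1 + lam2 * (p - i), i = 1..p."""
    return lam1 + lam2 * (p - np.arange(1, p + 1))


x = [3.0, -1.0, 2.0]
print(owl_norm(x, [1.0, 1.0, 1.0]))            # equal weights recover the L1 norm
print(owl_norm(x, [1.0, 0.0, 0.0]))            # a single leading weight recovers the max norm
print(owl_norm(x, oscar_weights(3, 1.0, 0.5)))  # OSCAR penalty
```

Equal weights and a single leading weight recover the L1 and max norms respectively, which is one way to see why the OWL family interpolates between sparsity-promoting and grouping behavior.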
-
Active Learning for Undirected Graphical Model Selection
Authors:
Divyanshu Vats,
Robert D. Nowak,
Richard G. Baraniuk
Abstract:
This paper studies graphical model selection, i.e., the problem of estimating a graph of statistical relationships among a collection of random variables. Conventional graphical model selection algorithms are passive, i.e., they require all the measurements to have been collected before processing begins. We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. We prove that, under certain conditions, our active learning algorithm requires fewer scalar measurements than any passive algorithm to reliably estimate a graph. A range of numerical results validates our theory and demonstrates the benefits of active learning.
Submitted 13 April, 2014;
originally announced April 2014.
-
Investigating Active Galactic Nuclei Variability Using Combined Multi-Quarter Kepler Data
Authors:
Mitchell Revalski,
Dawid Nowak,
Paul J. Wiita,
Ann E. Wehrle,
Stephen C. Unwin
Abstract:
We have used photometry from the Kepler satellite to characterize the variability of four radio-loud active galactic nuclei (AGN) on timescales from years to minutes. The Kepler satellite produced nearly continuous high precision data sets which provided better temporal coverage than possible with ground based observations. We have now accumulated eleven quarters of data, eight of which were reported in our previous paper. In addition to constructing power spectral densities (PSDs) and characterizing the variability of the last three quarters, we have linked together the individual quarters using a multiplicative scaling process, providing data sets spanning ~2.8 years with >98% coverage at a 30 minute sampling rate. We compute PSDs on these connected data sets that yield power law slopes at low frequencies in the approximate range of -1.5 to -2.0, with white noise seen at higher frequencies. These PSDs are similar both to those of the individual quarters and to those of ground based optical observations of other AGN. We also have explored a PSD binning method intended to reduce a bias toward shallow slope fits by evenly distributing the points within the PSDs. This tends to steepen the computed PSD slopes, especially when the low frequencies are relatively poorly fit. We detected flares lasting several days in which the brightness increased by ~15-20% in one object, as well as a smaller flare in another. Two AGN showed only small, ~1-2%, fluctuations in brightness.
Submitted 18 March, 2014; v1 submitted 17 November, 2013;
originally announced November 2013.
-
Near-Optimal Adaptive Compressed Sensing
Authors:
Matthew L. Malloy,
Robert D. Nowak
Abstract:
This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be near-optimal in that it succeeds at the lowest possible signal-to-noise-ratio (SNR) levels, improving on previous work in adaptive compressed sensing. Like traditional compressed sensing based on random non-adaptive design matrices, the CASS algorithm requires only k log n measurements to recover a k-sparse signal of dimension n. However, CASS succeeds at SNR levels that are a factor log n less than required by standard compressed sensing. From the point of view of constructing and implementing the sensing operation as well as computing the reconstruction, the proposed algorithm is substantially less computationally intensive than standard compressed sensing. CASS is also demonstrated to perform considerably better in practice through simulation. To the best of our knowledge, this is the first demonstration of an adaptive compressed sensing algorithm with near-optimal theoretical guarantees and excellent practical performance. This paper also shows that methods like compressed sensing, group testing, and pooling have an advantage beyond simply reducing the number of measurements or tests -- adaptive versions of such methods can also improve detection and estimation performance when compared to non-adaptive direct (uncompressed) sensing.
Submitted 29 April, 2014; v1 submitted 26 June, 2013;
originally announced June 2013.
-
Query Complexity of Derivative-Free Optimization
Authors:
Kevin G. Jamieson,
Robert D. Nowak,
Benjamin Recht
Abstract:
This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
Submitted 11 September, 2012;
originally announced September 2012.
-
The Sample Complexity of Search over Multiple Populations
Authors:
Matthew L. Malloy,
Gongguo Tang,
Robert D. Nowak
Abstract:
This paper studies the sample complexity of searching over multiple populations. We consider a large number of populations, each corresponding to either distribution P0 or P1. The goal of the search problem studied here is to find one population corresponding to distribution P1 with as few samples as possible. The main contribution is to quantify the number of samples needed to correctly find one such population. We consider two general approaches: non-adaptive sampling methods, which sample each population a predetermined number of times until a population following P1 is found, and adaptive sampling methods, which employ sequential sampling schemes for each population. We first derive a lower bound on the number of samples required by any sampling scheme. We then consider an adaptive procedure consisting of a series of sequential probability ratio tests, and show it comes within a constant factor of the lower bound. We give explicit expressions for this constant when samples of the populations follow Gaussian and Bernoulli distributions. An alternative adaptive scheme is discussed which does not require full knowledge of P1, and comes within a constant factor of the optimal scheme. For comparison, a lower bound on the sampling requirements of any non-adaptive scheme is presented.
Submitted 1 May, 2013; v1 submitted 6 September, 2012;
originally announced September 2012.
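The adaptive procedure described in the abstract above, a series of sequential probability ratio tests run one population at a time, might be sketched as follows. This is a generic illustration under stated assumptions rather than the paper's exact procedure: the function names, the thresholds `lower` and `upper`, and the per-population sample cap are all hypothetical, and the Gaussian example assumes P0 = N(0, 1) and P1 = N(mu, 1).

```python
from typing import Callable, Optional, Sequence, Tuple


def sprt_search(
    samplers: Sequence[Callable[[], float]],
    log_likelihood_ratio: Callable[[float], float],
    lower: float,
    upper: float,
    max_samples_per_population: int = 1000,
) -> Tuple[Optional[int], int]:
    """Test populations in order; return (index declared to follow P1, samples used).

    Returns (None, samples used) if no population crosses the upper threshold.
    """
    total = 0
    for index, draw in enumerate(samplers):
        llr = 0.0  # accumulated log-likelihood ratio log(p1 / p0) for this population
        for _ in range(max_samples_per_population):
            llr += log_likelihood_ratio(draw())
            total += 1
            if llr >= upper:
                return index, total  # enough evidence for P1: stop the search
            if llr <= lower:
                break  # enough evidence for P0: discard and move on
        # Falling through (threshold or sample cap) moves to the next population.
    return None, total


def gaussian_llr(mu: float) -> Callable[[float], float]:
    """Per-sample log(p1(x) / p0(x)) for P0 = N(0, 1) and P1 = N(mu, 1)."""
    return lambda x: mu * x - 0.5 * mu * mu


# Deterministic stand-ins for samplers, chosen so the behaviour is easy to trace:
# the first population always yields 0.0, the second always yields 1.0.
found, used = sprt_search(
    [lambda: 0.0, lambda: 1.0], gaussian_llr(1.0), lower=-2.0, upper=3.0
)
print(found, used)
```

In practice the samplers would draw random observations, and the two thresholds trade off the error probabilities against the expected number of samples.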
-
Experimental investigation of transverse spin asymmetries in muon-p SIDIS processes: Sivers asymmetries
Authors:
C. Adolph,
M. G. Alekseev,
V. Yu. Alexakhin,
Yu. Alexandrov,
G. D. Alexeev,
A. Amoroso,
A. A. Antonov,
A. Austregesilo,
B. Badelek,
F. Balestra,
J. Barth,
G. Baum,
Y. Bedfer,
J. Bernhard,
R. Bertini,
M. Bettinelli,
K. Bicker,
J. Bieling,
R. Birsa,
J. Bisplinghoff,
P. Bordalo,
F. Bradamante,
C. Braun,
A. Bravar,
A. Bressan
, et al. (194 additional authors not shown)
Abstract:
The COMPASS Collaboration at CERN has measured the transverse spin azimuthal asymmetry of charged hadrons produced in semi-inclusive deep inelastic scattering using a 160 GeV positive muon beam and a transversely polarised NH_3 target. The Sivers asymmetry of the proton has been extracted in the Bjorken x range 0.003<x<0.7. The new measurements have small statistical and systematic uncertainties of a few percent and confirm with considerably better accuracy the previous COMPASS measurement. The Sivers asymmetry is found to be compatible with zero for negative hadrons and positive for positive hadrons, a clear indication of a spin-orbit coupling of quarks in a transversely polarised proton. As compared to measurements at lower energy, a smaller Sivers asymmetry for positive hadrons is found in the region x > 0.03. The asymmetry is different from zero and positive also in the low x region, where sea-quarks dominate. The kinematic dependence of the asymmetry has also been investigated and results are given for various intervals of hadron and virtual photon fractional energy. In contrast to the case of the Collins asymmetry, the results on the Sivers asymmetry suggest a strong dependence on the four-momentum transfer to the nucleon, in agreement with the most recent calculations.
Submitted 23 May, 2012;
originally announced May 2012.
-
Experimental investigation of transverse spin asymmetries in muon-p SIDIS processes: Collins asymmetries
Authors:
C. Adolph,
M. G. Alekseev,
V. Yu. Alexakhin,
Yu. Alexandrov,
G. D. Alexeev,
A. Amoroso,
A. A. Antonov,
A. Austregesilo,
B. Badelek,
F. Balestra,
J. Barth,
G. Baum,
Y. Bedfer,
J. Bernhard,
R. Bertini,
M. Bettinelli,
K. Bicker,
J. Bieling,
R. Birsa,
J. Bisplinghoff,
P. Bordalo,
F. Bradamante,
C. Braun,
A. Bravar,
A. Bressan
, et al. (194 additional authors not shown)
Abstract:
The COMPASS Collaboration at CERN has measured the transverse spin azimuthal asymmetry of charged hadrons produced in semi-inclusive deep inelastic scattering using a 160 GeV positive muon beam and a transversely polarised NH_3 target. The Collins asymmetry of the proton was extracted in the Bjorken x range 0.003<x<0.7. These new measurements confirm with higher accuracy previous measurements from the COMPASS and HERMES collaborations, which exhibit a definite effect in the valence quark region. The asymmetries for negative and positive hadrons are similar in magnitude and opposite in sign. They are compatible with model calculations in which the u-quark transversity is opposite in sign and somewhat larger than the d-quark transversity distribution function. The asymmetry is extracted as a function of Bjorken $x$, the relative hadron energy $z$ and the hadron transverse momentum p_T^h. The high statistics and quality of the data also allow for more detailed investigations of the dependence on the kinematic variables. These studies confirm the leading-twist nature of the Collins asymmetry.
Submitted 23 May, 2012;
originally announced May 2012.
-
Near-Optimal Compressive Binary Search
Authors:
Matthew L. Malloy,
Robert D. Nowak
Abstract:
We propose a simple modification to the recently proposed compressive binary search. The modification removes an unnecessary and suboptimal factor of log log n from the SNR requirement, making the procedure optimal (up to a small constant). Simulations show that the new procedure performs significantly better in practice as well. We also contrast this problem with the more well known problem of noisy binary search.
Submitted 8 May, 2012; v1 submitted 8 March, 2012;
originally announced March 2012.
-
Transverse spin effects in hadron-pair production from semi-inclusive deep inelastic scattering
Authors:
C. Adolph,
M. G. Alekseev,
V. Yu. Alexakhin,
Yu. Alexandrov,
G. D. Alexeev,
A. Amoroso,
A. A. Antonov,
A. Austregesilo,
B. Badelek,
F. Balestra,
J. Barth,
G. Baum,
Y. Bedfer,
J. Bernhard,
R. Bertini,
M. Bettinelli,
K. Bicker,
J. Bieling,
R. Birsa,
J. Bisplinghoff,
P. Bordalo,
F. Bradamante,
C. Braun,
A. Bravar,
A. Bressan
, et al. (191 additional authors not shown)
Abstract:
First measurements of azimuthal asymmetries in hadron-pair production in deep-inelastic scattering of muons on transversely polarised ^6LiD (deuteron) and NH_3 (proton) targets are presented. The data were taken in the years 2002-2004 and 2007 with the COMPASS spectrometer using a muon beam of 160 GeV/c at the CERN SPS. The asymmetries provide access to the transversity distribution functions, without involving the Collins effect as in single hadron production. The sizeable asymmetries measured on the NH_3 target indicate non-vanishing u-quark transversity and two-hadron interference fragmentation functions. The small asymmetries measured on the ^6LiD target can be interpreted as an indication of a cancellation of u- and d-quark transversities.
Submitted 1 June, 2012; v1 submitted 28 February, 2012;
originally announced February 2012.
-
Leading order determination of the gluon polarisation from DIS events with high-p_T hadron pairs
Authors:
C. Adolph,
M. G. Alekseev,
V. Yu. Alexakhin,
Yu. Alexandrov,
G. D. Alexeev,
A. Amoroso,
A. A. Antonov,
A. Austregesilo,
B. Badelek,
F. Balestra,
J. Barth,
G. Baum,
Y. Bedfer,
J. Bernhard,
R. Bertini,
M. Bettinelli,
K. Bicker,
J. Bieling,
R. Birsa,
J. Bisplinghoff,
P. Bordalo,
F. Bradamante,
C. Braun,
A. Bravar,
A. Bressan
, et al. (191 additional authors not shown)
Abstract:
We present a determination of the gluon polarisation Delta g/g in the nucleon, based on the longitudinal double-spin asymmetry of DIS events with a pair of large transverse-momentum hadrons in the final state. The data were obtained by the COMPASS experiment at CERN using a 160 GeV/c polarised muon beam scattering off a polarised ^6LiD target. The gluon polarisation is evaluated by a Neural Network approach for three intervals of the gluon momentum fraction x_g covering the range 0.04 < x_g < 0.27. The values obtained at leading order in QCD do not show any significant dependence on x_g. Their average is Delta g/g = 0.125 +/- 0.060 (stat.) +/- 0.063 (syst.) at x_g=0.09 and a scale of mu^2 = 3 (GeV/c)^2.
Submitted 18 February, 2012;
originally announced February 2012.
-
Active Ranking using Pairwise Comparisons
Authors:
Kevin G. Jamieson,
Robert D. Nowak
Abstract:
This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of $n$ objects can be identified by standard sorting methods using $n \log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a $d$-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $R^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than $d \log n$ adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
Submitted 9 December, 2011; v1 submitted 16 September, 2011;
originally announced September 2011.
-
Convex Approaches to Model Wavelet Sparsity Patterns
Authors:
Nikhil S Rao,
Robert D. Nowak,
Stephen J. Wright,
Nick G. Kingsbury
Abstract:
Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-convex optimizations. Past work has dealt with this issue by resorting to greedy or suboptimal iterative reconstruction methods. In this paper, we propose new modeling approaches based on group-sparsity penalties that lead to convex optimizations that can be solved exactly and efficiently. We show that the methods we develop perform significantly better in deconvolution and compressed sensing applications, while being as computationally efficient as standard coefficient-wise approaches such as lasso.
Submitted 22 April, 2011;
originally announced April 2011.
-
A Formalization of Polytime Functions
Authors:
Sylvain Heraud,
David Nowak
Abstract:
We present a deep embedding of Bellantoni and Cook's syntactic characterization of polytime functions. We prove formally that it is correct and complete with respect to the original characterization by Cobham that required a bound to be proved manually. Compared to the paper proof by Bellantoni and Cook, we have been careful in making our proof fully constructive so that we obtain more precise bounding polynomials and more efficient translations between the two characterizations. Another difference is that we consider functions on bitstrings instead of functions on positive integers. This latter change is motivated by the application of our formalization in the context of formal security proofs in cryptography. Based on our core formalization, we have started developing a library of polytime functions that can be reused to build more complex ones.
Submitted 31 May, 2011; v1 submitted 27 February, 2011;
originally announced February 2011.
-
Detecting Weak but Hierarchically-Structured Patterns in Networks
Authors:
Aarti Singh,
Robert D. Nowak,
Robert Calderbank
Abstract:
The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network. This is a challenging problem since weak distributed patterns can be invisible in per node statistics as well as a global network-wide aggregate. Most prior work considers situations in which the activation/non-activation of each node is statistically independent, but this is unrealistic in many problems. In this paper, we consider structured patterns arising from statistical dependencies in the activation process. Our contributions are three-fold. First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity.
Submitted 28 February, 2010;
originally announced March 2010.
-
The Geometry of Generalized Binary Search
Authors:
Robert D. Nowak
Abstract:
This paper investigates the problem of determining a binary-valued function through a sequence of strategically selected queries. The focus is an algorithm called Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued function through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. This paper develops novel incoherence and geometric conditions under which GBS achieves the information-theoretically optimal query complexity; i.e., given a collection of N hypotheses, GBS terminates with the correct function after no more than a constant times log N queries. Furthermore, a noise-tolerant version of GBS is developed that also achieves the optimal query complexity. These results are applied to learning halfspaces, a problem arising routinely in image processing and machine learning.
Submitted 25 June, 2013; v1 submitted 22 October, 2009;
originally announced October 2009.
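The greedy selection rule described in this abstract can be illustrated with a minimal sketch. It covers only the noise-free case with a finite query set, and it is not the paper's code: the function name `generalized_binary_search`, the table layout `hypotheses[h][q]`, and the `oracle` callback are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def generalized_binary_search(
    hypotheses: Sequence[Sequence[int]],
    oracle: Callable[[int], int],
) -> List[int]:
    """Return the indices of hypotheses consistent with every oracle answer.

    hypotheses[h][q] is the +1/-1 label that hypothesis h assigns to query q
    (a hypothetical tabular encoding chosen for this sketch).
    """
    alive = list(range(len(hypotheses)))
    unused = list(range(len(hypotheses[0])))

    def imbalance(q: int) -> int:
        # |sum of predicted labels| is 0 for a perfectly even split and
        # len(alive) when all surviving hypotheses agree on query q.
        return abs(sum(hypotheses[h][q] for h in alive))

    while len(alive) > 1 and unused:
        # Greedy step: pick the query that most evenly splits the survivors.
        best = min(unused, key=imbalance)
        if imbalance(best) == len(alive):
            # No remaining query distinguishes the surviving hypotheses.
            break
        unused.remove(best)
        answer = oracle(best)
        # Keep only the hypotheses that agree with the observed answer.
        alive = [h for h in alive if hypotheses[h][best] == answer]
    return alive
```

For example, with five threshold hypotheses on four queries (hypothesis k labels query q as +1 when q >= k), and an oracle that answers according to hypothesis 2, the sketch returns `[2]` after two queries.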
-
On formal verification of arithmetic-based cryptographic primitives
Authors:
David Nowak
Abstract:
Cryptographic primitives are fundamental for information security: they are used as basic components for cryptographic protocols or public-key cryptosystems. In many cases, their security proofs consist in showing that they are reducible to computationally hard problems. Those reductions can be subtle and tedious, and thus not easily checkable. On top of the proof assistant Coq, we had implemented in previous work a toolbox for writing and checking game-based security proofs of cryptographic primitives. In this paper we describe its extension with number-theoretic capabilities so that it is now possible to write and check arithmetic-based cryptographic primitives in our toolbox. We illustrate our work by machine checking the game-based proofs of unpredictability of the pseudo-random bit generator of Blum, Blum and Shub, and semantic security of the public-key cryptographic scheme of Goldwasser and Micali.
Submitted 7 April, 2009;
originally announced April 2009.
-
On Completeness of Logical Relations for Monadic Types
Authors:
Slawomir Lasota,
David Nowak,
Yu Zhang
Abstract:
Software security can be ensured by specifying and verifying security properties of software using formal methods with strong theoretical bases. In particular, programs can be modeled in the framework of lambda-calculi, and interesting properties can be expressed formally by contextual equivalence (a.k.a. observational equivalence). Furthermore, imperative features, which exist in most real-life software, can be nicely expressed in the so-called computational lambda-calculus. Contextual equivalence is difficult to prove directly, but we can often use logical relations as a tool to establish it in lambda-calculi. We have already defined logical relations for the computational lambda-calculus in previous work. We devote this paper to the study of their completeness w.r.t. contextual equivalence in the computational lambda-calculus.
Submitted 21 December, 2006;
originally announced December 2006.
-
On the freeze quantifier in Constraint LTL: decidability and complexity
Authors:
Stéphane Demri,
Ranko Lazic,
David Nowak
Abstract:
Constraint LTL, a generalisation of LTL over Presburger constraints, is often used as a formal language to specify the behavior of operational models with constraints. The freeze quantifier can be part of the language, as in some real-time logics, but this variable-binding mechanism is quite general and ubiquitous in many logical languages (first-order temporal logics, hybrid logics, logics for sequence diagrams, navigation logics, logics with lambda-abstraction etc.). We show that Constraint LTL over the simple domain (N,=) augmented with the freeze quantifier is undecidable which is a surprising result in view of the poor language for constraints (only equality tests). Many versions of freeze-free Constraint LTL are decidable over domains with qualitative predicates and our undecidability result actually establishes Sigma_1^1-completeness. On the positive side, we provide complexity results when the domain is finite (EXPSPACE-completeness) or when the formulae are flat in a sense introduced in the paper. Our undecidability results are sharp (i.e. with restrictions on the number of variables) and all our complexity characterisations ensure completeness with respect to some complexity class (mainly PSPACE and EXPSPACE).
Submitted 29 September, 2006; v1 submitted 4 September, 2006;
originally announced September 2006.