-
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
Authors:
Akshay Kumar,
Jarvis Haupt
Abstract:
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, starting with small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. This paper demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction along the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of neural networks, a similar directional convergence of gradient flow dynamics is shown near certain saddle points of the loss function.
Submitted 12 March, 2024;
originally announced March 2024.
-
Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks
Authors:
Akshay Kumar,
Jarvis Haupt
Abstract:
This paper examines gradient flow dynamics of two-homogeneous neural networks for small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics spend sufficient time in the neighborhood of the origin to allow the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function that quantifies the correlation between the output of the neural network and corresponding labels in the training data set. For square loss, it has been observed that neural networks undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, this paper also shows a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
Submitted 20 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Leakage in restless quantum gate calibration
Authors:
Conrad J. Haupt,
Daniel J. Egger
Abstract:
Quantum computers require high-fidelity quantum gates. These gates are obtained by routine calibration tasks that eat into the availability of cloud-based devices. Restless circuit execution speeds up characterization and calibration by forgoing qubit reset in between circuits. Post-processing the measured data recovers the desired signal. However, since the qubits are not reset, leakage -- typically present at the beginning of the calibration -- may cause issues. Here, we develop a simulator of restless circuit execution based on a Markov Chain to study the effect of leakage. In the context of error-amplifying single-qubit gate sequences, we show that restless calibration tolerates up to 0.5% of leakage, which is large compared to the $10^{-4}$ gate fidelity of modern single-qubit gates. Furthermore, we show that restless circuit execution with leaky gates reduces by 33% the sensitivity of the ORBIT cost function developed by J. Kelly et al., which is typically used in closed-loop optimal control [Phys. Rev. Lett. 112, 240504 (2014)]. Our results are obtained with standard qubit state discrimination, showing that restless circuit execution is resilient against misclassified non-computational states. In summary, the restless method is sufficiently robust against leakage in both standard and closed-loop optimal control gate calibration to provide accurate results.
Submitted 9 November, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Optimizing Jastrow factors for the transcorrelated method
Authors:
J. Philip Haupt,
Seyed Mohammadreza Hosseini,
Pablo Lopez Rios,
Werner Dobrautz,
Aron Cohen,
Ali Alavi
Abstract:
We investigate the optimization of flexible tailored real-space Jastrow factors for use in the transcorrelated (TC) method in combination with highly accurate quantum chemistry methods such as initiator full configuration interaction quantum Monte Carlo (FCIQMC). Jastrow factors obtained by minimizing the variance of the TC reference energy are found to yield better, more consistent results than those obtained by minimizing the variational energy. We compute all-electron atomization energies for the challenging first-row molecules C2, CN, N2, and O2 and find that the TC method yields chemically accurate results using only the cc-pVTZ basis set, roughly matching the accuracy of non-TC calculations with the much larger cc-pV5Z basis set. We also investigate an approximation in which pure three-body excitations are neglected from the TC-FCIQMC dynamics, saving storage and computational cost, and show that it affects relative energies negligibly. Our results demonstrate that the combination of tailored real-space Jastrow factors with the multi-configurational TC-FCIQMC method provides a route to obtaining chemical accuracy using modest basis sets, obviating the need for basis-set extrapolation and composite techniques.
Submitted 12 May, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Online Stochastic Gradient Descent Learns Linear Dynamical Systems from A Single Trajectory
Authors:
Navid Reyhanian,
Jarvis Haupt
Abstract:
This work investigates the problem of estimating the weight matrices of a stable time-invariant linear dynamical system from a single sequence of noisy measurements. We show that if the unknown weight matrices describing the system are in Brunovsky canonical form, we can efficiently estimate the ground truth unknown matrices of the system from a linear system of equations formulated based on the transfer function of the system, using both online and offline stochastic gradient descent (SGD) methods. Specifically, by deriving concrete complexity bounds, we show that SGD converges linearly in expectation to within an arbitrarily small Frobenius norm distance from the ground truth weights. To the best of our knowledge, ours is the first work to establish linear convergence characteristics for online and offline gradient-based iterative methods for weight matrix estimation in linear dynamical systems from a single trajectory. Extensive numerical tests verify that the performance of the proposed methods is consistent with our theory, and show their superior performance relative to existing state-of-the-art methods.
Submitted 23 February, 2021;
originally announced February 2021.
-
Quantum-Assisted Optical Interferometers: Instrument Requirements
Authors:
Andrei Nomerotski,
Paul Stankus,
Anže Slosar,
Stephen Vintskevich,
Shane Andrewski,
Gabriella Carini,
Denis Dolzhenko,
Duncan England,
Eden Figueroa,
Sonali Gera,
Justine Haupt,
Sven Herrmann,
Dimitrios Katramatos,
Michael Keach,
Alexander Parsells,
Olli Saira,
Jonathan Schiff,
Peter Svihra,
Thomas Tsang,
Yingwen Zhang
Abstract:
It has been recently suggested that optical interferometers may not require a phase-stable optical link between the stations if instead sources of quantum-mechanically entangled pairs could be provided to them, enabling extra-long baselines and benefiting numerous topics in astrophysics and cosmology. We developed a new variation of this idea, proposing that photons from two different sources could be interfered at two decoupled stations, requiring only a slow classical connection between them. We show that this approach could allow high-precision measurements of the relative astrometry of the two sources, with a simple estimate giving angular resolution of $10\ \mu$as in a few hours' observation of two bright stars. We also give requirements on the instrument for these observations, in particular on its temporal and spectral resolution. Finally, we discuss possible technologies for the instrument implementation and first proof-of-principle experiments.
Submitted 11 December, 2020; v1 submitted 4 December, 2020;
originally announced December 2020.
-
The Baryon Mapping Experiment (BMX), a 21cm intensity mapping pathfinder
Authors:
Paul O'Connor,
Anže Slosar,
Maile Harris,
Justine Haupt,
John Kuczewski,
Emily Kuhn,
Laura Newburgh,
Annie Polish,
Benjamin Saliwanchik,
Christopher Sheehy,
Paul Stankus,
Gregory Troiani,
Will Tyndall
Abstract:
The Baryon Mapping eXperiment (BMX) is an interferometric array designed as a pathfinder for a future post-reionization 21 cm intensity mapping survey. It consists of four 4-meter parabolic reflectors each having offset pyramidal horn feed, quad-ridge orthomode transducer, temperature-stabilized RF amplification and filtering, and pulsed noise injection diode. An undersampling readout scheme uses 8-bit digitizers running at 1.1 Gsamples/sec to provide access to signals from 1.1 - 1.55 GHz (third Nyquist zone), corresponding to HI emission from sources at redshift $0 < z < 0.3$. An FX correlator is implemented in GPU and generates 28 GB/day of time-ordered visibility data. About 7,000 hours of data were collected from Jan. 2019 - May 2020, and we will present results on system performance including sensitivity, beam mapping studies, observations of bright celestial targets, and system electronics upgrades. BMX is a pathfinder for the proposed PUMA intensity mapping survey in the 2030s.
Submitted 17 November, 2020;
originally announced November 2020.
-
Convexifying Sparse Interpolation with Infinitely Wide Neural Networks: An Atomic Norm Approach
Authors:
Akshay Kumar,
Jarvis Haupt
Abstract:
This work examines the problem of exact data interpolation via sparse (neuron count), infinitely wide, single hidden layer neural networks with leaky rectified linear unit activations. Using the atomic norm framework of [Chandrasekaran et al., 2012], we derive simple characterizations of the convex hulls of the corresponding atomic sets for this problem under several different constraints on the weights and biases of the network, thus obtaining equivalent convex formulations for these problems. A modest extension of our proposed framework to a binary classification problem is also presented. We explore the efficacy of the resulting formulations experimentally, and compare with networks trained via gradient descent.
Submitted 15 July, 2020;
originally announced July 2020.
-
Provable Online CP/PARAFAC Decomposition of a Structured Tensor via Dictionary Learning
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jarvis Haupt
Abstract:
We consider the problem of factorizing a structured 3-way tensor into its constituent Canonical Polyadic (CP) factors. This decomposition, which can be viewed as a generalization of singular value decomposition (SVD) for tensors, reveals how the tensor dimensions (features) interact with each other. However, since the factors are a priori unknown, the corresponding optimization problems are inherently non-convex. The existing guaranteed algorithms which handle this non-convexity incur an irreducible error (bias), and only apply to cases where all factors have the same structure. To this end, we develop a provable algorithm for online structured tensor factorization, wherein one of the factors obeys some incoherence conditions, and the others are sparse. Specifically, we show that, under some relatively mild conditions on initialization, rank, and sparsity, our algorithm recovers the factors exactly (up to scaling and permutation) at a linear rate. Complementary to our theoretical results, our synthetic and real-world data evaluations showcase superior performance compared to related techniques. Moreover, its scalability and ability to learn on-the-fly make it suitable for real-world tasks.
Submitted 29 June, 2020;
originally announced June 2020.
-
Targeting customers under response-dependent costs
Authors:
Johannes Haupt,
Stefan Lessmann
Abstract:
This study provides a formal analysis of the customer targeting problem when the cost for a marketing action depends on the customer response and proposes a framework to estimate the decision variables for campaign profit optimization. Targeting a customer is profitable if the impact and associated profit of the marketing treatment are higher than its cost. Despite the growing literature on uplift models to identify the strongest treatment-responders, no research has investigated optimal targeting when the costs of the treatment are unknown at the time of the targeting decision. Stochastic costs are ubiquitous in direct marketing and customer retention campaigns because marketing incentives are conditioned on a positive customer response. This study makes two contributions to the literature, which are evaluated on an e-commerce coupon targeting campaign. First, we formally analyze the targeting decision problem under response-dependent costs. Profit-optimal targeting requires an estimate of the treatment effect on the customer and an estimate of the customer response probability under treatment. The empirical results demonstrate that the consideration of treatment cost substantially increases campaign profit when used for customer targeting in combination with an estimate of the average or customer-level treatment effect. Second, we propose a framework to jointly estimate the treatment effect and the response probability by combining methods for causal inference with a hurdle mixture model. The proposed causal hurdle model achieves competitive campaign profit while streamlining model building. Code is available at https://github.com/Humboldt-WI/response-dependent-costs.
Submitted 10 August, 2021; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Affordable Uplift: Supervised Randomization in Controlled Experiments
Authors:
Johannes Haupt,
Daniel Jacob,
Robin M. Gubela,
Stefan Lessmann
Abstract:
Customer scoring models are the core of scalable direct marketing. Uplift models provide an estimate of the incremental benefit from a treatment that is used for operational decision-making. Training and monitoring of uplift models require experimental data. However, the collection of data under randomized treatment assignment is costly, since random targeting deviates from an established targeting policy. To increase the cost-efficiency of experimentation and facilitate frequent data collection and model training, we introduce supervised randomization. It is a novel approach that integrates existing scoring models into randomized trials to target relevant customers, while ensuring consistent estimates of treatment effects through correction for active sample selection. An empirical Monte Carlo study shows that data collection under supervised randomization is cost-efficient, while downstream uplift models perform competitively.
Submitted 1 October, 2019;
originally announced October 2019.
-
A Provably Communication-Efficient Asynchronous Distributed Inference Method for Convex and Nonconvex Problems
Authors:
Jineng Ren,
Jarvis Haupt
Abstract:
This paper proposes and analyzes a communication-efficient distributed optimization framework for general nonconvex nonsmooth signal processing and machine learning problems under an asynchronous protocol. At each iteration, worker machines compute gradients of a known empirical loss function using their own local data, and a master machine solves a related minimization problem to update the current estimate. We prove that for nonconvex nonsmooth problems, the proposed algorithm converges with a sublinear rate over the number of communication rounds, coinciding with the best theoretical rate that can be achieved for this class of problems. Linear convergence is established without any statistical assumptions of the local data for problems characterized by composite loss functions whose smooth parts are strongly convex. Extensive numerical experiments verify that the performance of the proposed approach indeed improves -- sometimes significantly -- over other state-of-the-art algorithms in terms of total communication efficiency.
Submitted 15 March, 2019;
originally announced March 2019.
-
NOODL: Provable Online Dictionary Learning and Sparse Coding
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jarvis Haupt
Abstract:
We consider the dictionary learning problem, where the aim is to model the given data as a linear combination of a few columns of a matrix known as a dictionary, where the sparse weights forming the linear combination are known as coefficients. Since the dictionary and coefficients parameterizing the linear model are unknown, the corresponding optimization is inherently non-convex. This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. Yet, these provide guarantees only on the recovery of the dictionary, without explicit recovery guarantees on the coefficients. Moreover, any estimation error in the dictionary adversely impacts the ability to successfully localize and estimate the coefficients. This potentially limits the utility of existing provable dictionary learning methods in applications where coefficient recovery is of interest. To this end, we develop NOODL: a simple Neurally plausible alternating Optimization-based Online Dictionary Learning algorithm, which recovers both the dictionary and coefficients exactly at a geometric rate, when initialized appropriately. Our algorithm, NOODL, is also scalable and amenable to large-scale distributed implementations in neural architectures, by which we mean that it only involves simple linear and non-linear operations. Finally, we corroborate these theoretical results via experimental evaluation of the proposed algorithm against the current state-of-the-art techniques.
Keywords: dictionary learning, provable dictionary learning, online dictionary learning, non-convex, sparse coding, support recovery, iterative hard thresholding, matrix factorization, neural architectures, neural networks, noodl, sparse representations, sparse signal processing.
Submitted 27 August, 2019; v1 submitted 28 February, 2019;
originally announced February 2019.
-
Target-based Hyperspectral Demixing via Generalized Robust PCA
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jarvis Haupt
Abstract:
Localizing targets of interest in a given hyperspectral (HS) image has applications ranging from remote sensing to surveillance. This task of target detection leverages the fact that each material/object possesses its own characteristic spectral response, depending upon its composition. As $\textit{signatures}$ of different materials are often correlated, matched filtering based approaches may not be appropriate in this case. In this work, we present a technique to localize targets of interest based on their spectral signatures. We also present the corresponding recovery guarantees, leveraging our recent theoretical results. To this end, we model a HS image as a superposition of a low-rank component and a dictionary sparse component, wherein the dictionary consists of the $\textit{a priori}$ known characteristic spectral responses of the target we wish to localize. Finally, we analyze the performance of the proposed approach via experimental validation on real HS data for a classification task, and compare it with related techniques.
Submitted 26 February, 2019;
originally announced February 2019.
-
A Dictionary-Based Generalization of Robust PCA Part II: Applications to Hyperspectral Demixing
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jineng Ren,
Jarvis Haupt
Abstract:
We consider the task of localizing targets of interest in a hyperspectral (HS) image based on their spectral signature(s), by posing the problem as two distinct convex demixing task(s). With applications ranging from remote sensing to surveillance, this task of target detection leverages the fact that each material/object possesses its own characteristic spectral response, depending upon its composition. However, since $\textit{signatures}$ of different materials are often correlated, matched filtering-based approaches may not apply here. To this end, we model a HS image as a superposition of a low-rank component and a dictionary sparse component, wherein the dictionary consists of the $\textit{a priori}$ known characteristic spectral responses of the target we wish to localize, and develop techniques for two different sparsity structures, resulting from different model assumptions. We also present the corresponding recovery guarantees, leveraging our recent theoretical results from a companion paper. Finally, we analyze the performance of the proposed approach via experimental evaluations on real HS datasets for a classification task, and compare its performance with related techniques.
Submitted 26 February, 2019;
originally announced February 2019.
-
TensorMap: Lidar-Based Topological Mapping and Localization via Tensor Decompositions
Authors:
Sirisha Rambhatla,
Nikos D. Sidiropoulos,
Jarvis Haupt
Abstract:
We propose a technique to develop (and localize in) topological maps from light detection and ranging (Lidar) data. Localizing an autonomous vehicle with respect to a reference map in real-time is crucial for its safe operation. Owing to the rich information provided by Lidar sensors, these are emerging as a promising choice for this task. However, since a Lidar outputs a large amount of data every fraction of a second, it is progressively harder to process the information in real-time. Consequently, current systems have migrated towards faster alternatives at the expense of accuracy. To overcome this inherent trade-off between latency and accuracy, we propose a technique to develop topological maps from Lidar data using the orthogonal Tucker3 tensor decomposition. Our experimental evaluations demonstrate that in addition to achieving a high compression ratio as compared to full data, the proposed technique, $\textit{TensorMap}$, also accurately detects the position of the vehicle in a graph-based representation of a map. We also analyze the robustness of the proposed technique to Gaussian and translational noise, thus initiating explorations into potential applications of tensor decompositions in Lidar data analysis.
Submitted 26 February, 2019;
originally announced February 2019.
-
A Dictionary-Based Generalization of Robust PCA with Applications to Target Localization in Hyperspectral Imaging
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jineng Ren,
Jarvis Haupt
Abstract:
We consider the decomposition of a data matrix assumed to be a superposition of a low-rank matrix and a component which is sparse in a known dictionary, using a convex demixing method. We consider two sparsity structures for the sparse factor of the dictionary sparse component, namely entry-wise and column-wise sparsity, and provide a unified analysis, encompassing both the undercomplete and overcomplete dictionary cases, to show that the constituent matrices can be successfully recovered under some relatively mild conditions on incoherence, sparsity, and rank. We leverage these results to localize targets of interest in a hyperspectral (HS) image based on their spectral signature(s) using the a priori known characteristic spectral responses of the target. We corroborate our theoretical results and analyze target localization performance of our approach via experimental evaluations and comparisons to related techniques.
Submitted 29 June, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
A Dictionary Based Generalization of Robust PCA
Authors:
Sirisha Rambhatla,
Xingguo Li,
Jarvis Haupt
Abstract:
We analyze the decomposition of a data matrix, assumed to be a superposition of a low-rank component and a component which is sparse in a known dictionary, using a convex demixing method. We provide a unified analysis, encompassing both undercomplete and overcomplete dictionary cases, and show that the constituent components can be successfully recovered under some relatively mild assumptions up to a certain $\textit{global}$ sparsity level. Further, we corroborate our theoretical results by presenting empirical evaluations in terms of phase transitions in rank and sparsity for various dictionary sizes.
Submitted 21 February, 2019;
originally announced February 2019.
-
On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Authors:
Xingguo Li,
Junwei Lu,
Zhaoran Wang,
Jarvis Haupt,
Tuo Zhao
Abstract:
We establish a margin-based, data-dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks. By introducing a new characterization of the Lipschitz properties of the neural network family, we achieve significantly tighter generalization bounds than existing results. Moreover, we show that the generalization bound can be further improved for bounded losses. Aside from the general feedforward deep neural networks, our results can be applied to derive new bounds for popular architectures, including convolutional neural networks (CNNs) and residual networks (ResNets). When achieving the same generalization errors as prior work, our bounds allow for the choice of larger parameter spaces of weight matrices, inducing potentially stronger expressive ability for neural networks. Numerical evaluation is also provided to support our theory.
Submitted 3 July, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization
Authors:
Zhehui Chen,
Xingguo Li,
Lin F. Yang,
Jarvis Haupt,
Tuo Zhao
Abstract:
We study constrained nonconvex optimization problems in machine learning, signal processing, and stochastic control. It is well-known that these problems can be rewritten as a minimax problem in Lagrangian form. However, due to the lack of convexity, their landscape is not well understood, and how to find the stable equilibria of the Lagrangian function is still unknown. To bridge the gap, we study the landscape of the Lagrangian function. Further, we define a special class of Lagrangian functions. They enjoy two properties: 1. Equilibria are either stable or unstable (formal definition in Section 2); 2. Stable equilibria correspond to the global optima of the original problem. We show that a generalized eigenvalue (GEV) problem, including canonical correlation analysis and other problems, belongs to the class. Specifically, we characterize its stable and unstable equilibria by leveraging an invariant group and symmetric property (more details in Section 3). Motivated by these neat geometric structures, we propose a simple, efficient, and stochastic primal-dual algorithm solving the online GEV problem. Theoretically, we provide sufficient conditions, based on which we establish an asymptotic convergence rate and obtain the first sample complexity result for the online GEV problem by diffusion approximations, which are widely used in applied probability and stochastic control. Numerical results are provided to support our theory.
Submitted 27 October, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Robust identification of email tracking: A machine learning approach
Authors:
Johannes Haupt,
Benedict Bender,
Benjamin Fabian,
Stefan Lessmann
Abstract:
Email tracking allows email senders to collect fine-grained behavior and location data on email recipients, who are uniquely identifiable via their email address. Such tracking invades user privacy in that email tracking techniques gather data without user consent or awareness. Striving to increase privacy in email communication, this paper develops a detection engine to be the core of a selective tracking blocking mechanism in the form of three contributions. First, a large collection of email newsletters is analyzed to show the wide usage of tracking over different countries, industries and time. Second, we propose a set of features geared towards the identification of tracking images under real-world conditions. Novel features are devised to be computationally feasible and efficient, generalizable and resilient towards changes in tracking infrastructure. Third, we test the predictive power of these features in a benchmarking experiment using a selection of state-of-the-art classifiers to clarify the effectiveness of model-based tracking identification. We evaluate the expected accuracy of the approach on out-of-sample data, over increasing periods of time, and when faced with unknown senders.
Submitted 11 June, 2018;
originally announced June 2018.
-
Floating Forests: Quantitative Validation of Citizen Science Data Generated From Consensus Classifications
Authors:
Isaac S. Rosenthal,
Jarrett E. K. Byrnes,
Kyle C. Cavanaugh,
Tom W. Bell,
Briana Harder,
Alison J. Haupt,
Andrew T. W. Rassweiler,
Alejandro Pérez-Matus,
Jorge Assis,
Ali Swanson,
Amy Boyer,
Adam McMaster,
Laura Trouille
Abstract:
Large-scale research endeavors can be hindered by logistical constraints limiting the amount of available data. For example, global ecological questions require a global dataset, and traditional sampling protocols are often too inefficient for a small research team to collect an adequate amount of data. Citizen science offers an alternative by crowdsourcing data collection. Despite growing popularity, the community has been slow to embrace it largely due to concerns about quality of data collected by citizen scientists. Using the citizen science project Floating Forests (http://floatingforests.org), we show that consensus classifications made by citizen scientists produce data that is of comparable quality to expert generated classifications. Floating Forests is a web-based project in which citizen scientists view satellite photographs of coastlines and trace the borders of kelp patches. Since launch in 2014, over 7,000 citizen scientists have classified over 750,000 images of kelp forests largely in California and Tasmania. Images are classified by 15 users. We generated consensus classifications by overlaying all citizen classifications and assessed accuracy by comparing to expert classifications. Matthews correlation coefficient (MCC) was calculated for each threshold (1-15), and the threshold with the highest MCC was considered optimal. We showed that optimal user threshold was 4.2 with an MCC of 0.400 (0.023 SE) for Landsats 5 and 7, and an MCC of 0.639 (0.246 SE) for Landsat 8. These results suggest that citizen science data derived from consensus classifications are of comparable accuracy to expert classifications. Citizen science projects should implement methods such as consensus classification in conjunction with a quantitative comparison to expert generated classifications to avoid concerns about data quality.
Submitted 25 January, 2018;
originally announced January 2018.
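The threshold selection described in the abstract above (compute the MCC for each user threshold from 1 to 15 and keep the best) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names `mcc`, `consensus`, and `best_threshold`, the per-pixel vote counts, and the toy data are all hypothetical, and the convention of returning 0 when the MCC denominator vanishes is an assumption.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def consensus(votes, threshold):
    """Label a pixel as kelp when at least `threshold` users marked it."""
    return [v >= threshold for v in votes]

def best_threshold(votes, expert, max_users=15):
    """Return (threshold, MCC) for the threshold that best matches expert labels."""
    best = (None, -2.0)  # MCC lies in [-1, 1], so -2 is below any real score
    for t in range(1, max_users + 1):
        pred = consensus(votes, t)
        tp = sum(p and e for p, e in zip(pred, expert))
        tn = sum((not p) and (not e) for p, e in zip(pred, expert))
        fp = sum(p and (not e) for p, e in zip(pred, expert))
        fn = sum((not p) and e for p, e in zip(pred, expert))
        score = mcc(tp, tn, fp, fn)
        if score > best[1]:  # strict comparison keeps the smallest best threshold
            best = (t, score)
    return best

# Hypothetical toy data: number of users (out of 15) who marked each pixel,
# and the expert's label for the same pixels.
votes = [0, 1, 3, 5, 9, 12, 15, 2]
expert = [False, False, False, True, True, True, True, False]
print(best_threshold(votes, expert))
```

On real data the optimum would be computed per image and then summarized, which is presumably how a non-integer average threshold can arise; that aggregation step is not shown here.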
-
Near Optimal Sketching of Low-Rank Tensor Regression
Authors:
Jarvis Haupt,
Xingguo Li,
David P. Woodruff
Abstract:
We study the least squares regression problem $\min_{Θ\in \mathcal{S}_{\odot D,R}} \|AΘ-b\|_2$, where $\mathcal{S}_{\odot D,R}$ is the set of $Θ$ for which $Θ= \sum_{r=1}^{R} θ_1^{(r)} \circ \cdots \circ θ_D^{(r)}$ for vectors $θ_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $Θ$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $Θ$ is only $R \cdot \sum_{d=1}^D p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $Θ$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on \textit{sparse} random projections $Φ\in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_Θ \|ΦA Θ- Φb\|_2$, for which if $Θ'$ is a near-optimum to the smaller problem, then it is also a near-optimum to the original problem. We obtain significantly smaller dimension and sparsity in $Φ$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
Submitted 20 September, 2017;
originally announced September 2017.
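To give a sense of scale for the parameter counts quoted in the abstract above, here is a small worked instance. The values $D = 3$, $p_1 = p_2 = p_3 = 100$, and $R = 5$ are hypothetical and chosen only for illustration; they do not come from the paper.

```latex
\begin{align*}
  \text{CP model:} \quad & R \cdot \sum_{d=1}^{D} p_d = 5 \cdot (100 + 100 + 100) = 1500, \\
  \text{ordinary least squares:} \quad & \prod_{d=1}^{D} p_d = 100 \cdot 100 \cdot 100 = 10^{6}.
\end{align*}
```

Under these assumed sizes the low-rank model has roughly three orders of magnitude fewer parameters than the unstructured problem.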
-
Communication-efficient Algorithm for Distributed Sparse Learning via Two-way Truncation
Authors:
**eng Ren,
Jarvis Haupt
Abstract:
We propose a communication- and computation-efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted $l_1$ regularized minimization problem. The communication cost is reduced from a constant multiple of the dimension for the state-of-the-art algorithm to a constant multiple of the sparsity level via a Two-way Truncation procedure. Theoretically, we prove that the estimation error of the proposed algorithm decreases exponentially and matches that of the centralized method under mild assumptions. Extensive experiments on both simulated data and real data verify that the proposed algorithm is efficient and has performance comparable with the centralized method on solving high-dimensional sparse learning problems.
Submitted 9 September, 2017; v1 submitted 2 September, 2017;
originally announced September 2017.
-
Improved Support Recovery Guarantees for the Group Lasso With Applications to Structural Health Monitoring
Authors:
Mojtaba Kadkhodaie Elyaderani,
Swayambhoo Jain,
Jeffrey Druce,
Stefano Gonella,
Jarvis Haupt
Abstract:
This paper considers the problem of estimating an unknown high dimensional signal from noisy linear measurements, when the signal is assumed to possess a \emph{group-sparse} structure in a known, fixed dictionary. We consider signals generated according to a natural probabilistic model, and establish new conditions under which the set of indices of the non-zero groups of the signal (called the group-level support) may be accurately estimated via the group Lasso. Our results strengthen existing coherence-based analyses that exhibit the well-known "square root" bottleneck, allowing for the number of recoverable nonzero groups to be nearly as large as the total number of groups. We also establish a sufficient recovery condition relating the number of nonzero groups and the signal to noise ratio (quantified in terms of the ratio of the squared Euclidean norms of nonzero groups and the variance of the random additive measurement noise), and validate this trend empirically. Finally, we examine the implications of our results in the context of a structural health monitoring application, where the group Lasso approach facilitates demixing of a propagating acoustic wavefield, acquired on the material surface by a scanning laser Doppler vibrometer, into antithetical components, one of which indicates the locations of internal material defects.
Submitted 19 May, 2018; v1 submitted 29 August, 2017;
originally announced August 2017.
-
On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions
Authors:
Xingguo Li,
Lin F. Yang,
Jason Ge,
Jarvis Haupt,
Tong Zhang,
Tuo Zhao
Abstract:
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
Submitted 15 February, 2018; v1 submitted 19 June, 2017;
originally announced June 2017.
-
Noisy Tensor Completion for Tensors with a Sparse Canonical Polyadic Factor
Authors:
Swayambhoo Jain,
Alexander Gutierrez,
Jarvis Haupt
Abstract:
In this paper we study the problem of noisy tensor completion for tensors that admit a canonical polyadic or CANDECOMP/PARAFAC (CP) decomposition with one of the factors being sparse. We present general theoretical error bounds for an estimate obtained by using a complexity-regularized maximum likelihood principle and then instantiate these bounds for the case of additive white Gaussian noise. We also provide an ADMM-type algorithm for solving the complexity-regularized maximum likelihood problem and validate the theoretical findings via experiments on a synthetic data set.
Submitted 8 April, 2017;
originally announced April 2017.
-
An automated system to measure the quantum efficiency of CCDs for astronomy
Authors:
Rebecca Coles,
James Chiang,
David Cinabro,
Justine Haupt,
Ivan Kotov,
Homer Neal,
Andrei Nomerotski,
Peter Takacs
Abstract:
We describe a system to measure the Quantum Efficiency in the wavelength range of 300 nm to 1100 nm of 40x40 mm n-channel CCD sensors for the construction of the 3.2 gigapixel LSST focal plane. The technique uses a series of instruments to create a very uniform flux of photons of controllable intensity in the wavelength range of interest across the face of the sensor. This allows the absolute Quantum Efficiency to be measured with an accuracy in the 7% range. This system will be part of a production facility at Brookhaven National Lab for the basic component of the LSST camera.
Submitted 7 September, 2017; v1 submitted 13 January, 2017;
originally announced January 2017.
-
Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization
Authors:
Xingguo Li,
Junwei Lu,
Raman Arora,
Jarvis Haupt,
Han Liu,
Zhaoran Wang,
Tuo Zhao
Abstract:
We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). Specifically, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in the low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\mathcal{R}_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvatures; ($\mathcal{R}_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\mathcal{R}_3$) the complement of the above regions, where the gradient has sufficiently large magnitudes. We further extend our result to the matrix sensing problem. Such a global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
Submitted 19 January, 2018; v1 submitted 29 December, 2016;
originally announced December 2016.
-
Robust Low-Complexity Randomized Methods for Locating Outliers in Large Matrices
Authors:
Xingguo Li,
Jarvis Haupt
Abstract:
This paper examines the problem of locating outlier columns in a large, otherwise low-rank matrix, in settings where the data are noisy, or where the overall matrix has missing elements. We propose a randomized two-step inference framework, and establish sufficient conditions on the required sample complexities under which these methods succeed (with high probability) in accurately locating the outliers for each task. Comprehensive numerical experimental results are provided to verify the theoretical bounds and demonstrate the computational efficiency of the proposed algorithm.
Submitted 7 December, 2016;
originally announced December 2016.
-
On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function
Authors:
Xingguo Li,
Haoming Jiang,
Jarvis Haupt,
Raman Arora,
Han Liu,
Mingyi Hong,
Tuo Zhao
Abstract:
Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. However, by exploring the modeling structures, we find these "sacrifices" do not always require more computational effort. To shed light on such a "free-lunch" phenomenon, we study the square-root-Lasso (SQRT-Lasso) type regression problem. Specifically, we show that the nonsmooth loss functions of SQRT-Lasso type regression ease tuning effort and gain adaptivity to inhomogeneous noise, but are not necessarily more challenging than Lasso in computation. We can directly apply proximal algorithms (e.g., proximal gradient descent, proximal Newton, and proximal quasi-Newton algorithms) without worrying about the nonsmoothness of the loss function. Theoretically, we prove that the proximal algorithms combined with the pathwise optimization scheme enjoy fast convergence guarantees with high probability. Numerical results are provided to support our theory.
Submitted 13 April, 2019; v1 submitted 25 May, 2016;
originally announced May 2016.
-
Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction
Authors:
Xingguo Li,
Raman Arora,
Han Liu,
Jarvis Haupt,
Tuo Zhao
Abstract:
We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Numerical experiments demonstrate the efficiency of our algorithm in terms of both parameter estimation and computational performance.
Submitted 23 December, 2017; v1 submitted 9 May, 2016;
originally announced May 2016.
-
A Compressed Sensing Based Decomposition of Electrodermal Activity Signals
Authors:
Swayambhoo Jain,
Urvashi Oswal,
Kevin S. Xu,
Brian Eriksson,
Jarvis Haupt
Abstract:
The measurement and analysis of Electrodermal Activity (EDA) offers applications in diverse areas ranging from market research, to seizure detection, to human stress analysis. Unfortunately, the analysis of EDA signals is made difficult by the superposition of numerous components which can obscure the signal information related to a user's response to a stimulus. We show how simple pre-processing followed by a novel compressed sensing based decomposition can mitigate the effects of the undesired noise components and help reveal the underlying physiological signal. The proposed framework allows for decomposition of EDA signals with provable bounds on the recovery of user responses. We test our procedure on both synthetic and real-world EDA signals from wearable sensors and demonstrate that our approach allows for more accurate recovery of user responses as compared to the existing techniques.
Submitted 26 January, 2017; v1 submitted 24 February, 2016;
originally announced February 2016.
-
Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models
Authors:
Abhinav V. Sambasivan,
Jarvis D. Haupt
Abstract:
This paper examines fundamental error characteristics for a general class of matrix completion problems, where the matrix of interest is a product of two a priori unknown matrices, one of which is sparse, and the observations are noisy. Our main contributions come in the form of minimax lower bounds for the expected per-element squared error for this problem under several common noise models. Specifically, we analyze scenarios where the corruptions are characterized by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations, as instances of our general result. Our results establish that the error bounds derived in (Soni et al., 2016) for complexity-regularized maximum likelihood estimators achieve, up to multiplicative constants and logarithmic factors, the minimax error rates in each of these noise scenarios, provided that the nominal number of observations is large enough, and the sparse factor has (on average) at least one non-zero per column.
Submitted 26 October, 2017; v1 submitted 2 October, 2015;
originally announced October 2015.
-
Testing a Novel Self-Assembling Data Paradigm in the Context of IACT Data
Authors:
Amanda Weinstein,
Lucy Fortson,
Thomas Brantseg,
Cameron Rulten,
Robyn Lutz,
Jarvis Haupt,
Mojtaba Kakhodaie Elyaderani,
John Quinn
Abstract:
The process of gathering and associating data from multiple sensors or sub-detectors due to a common physical event (the process of event-building) is used in many fields, including high-energy physics and $γ$-ray astronomy. Fault tolerance in event-building is a challenging problem that increases in difficulty with higher data throughput rates and increasing numbers of sub-detectors. We draw on biological self-assembly models in the development of a novel event-building paradigm that treats each packet of data from an individual sensor or sub-detector as if it were a molecule in solution. Just as molecules are capable of forming chemical bonds, "bonds" can be defined between data packets using metadata-based discriminants. A database -- which plays the role of a beaker of solution -- continually selects pairs of assemblies at random to test for bonds, which allows single packets and small assemblies to aggregate into larger assemblies. During this process higher-quality associations supersede spurious ones. The database thereby becomes fluid, dynamic, and self-correcting rather than static. We will describe tests of the self-assembly paradigm using our first fluid database prototype and data from the VERITAS $γ$-ray telescope.
Submitted 7 September, 2015;
originally announced September 2015.
-
On Convolutional Approximations to Linear Dimensionality Reduction Operators for Large Scale Data Processing
Authors:
Swayambhoo Jain,
Jarvis Haupt
Abstract:
In this paper, we examine the problem of approximating a general linear dimensionality reduction (LDR) operator, represented as a matrix $A \in \mathbb{R}^{m \times n}$ with $m < n$, by a partial circulant matrix with rows related by circular shifts. Partial circulant matrices admit fast implementations via Fourier transform methods and subsampling operations; our investigation here is motivated by a desire to leverage these potential computational improvements in large-scale data processing tasks. We establish a fundamental result, that most large LDR matrices (whose row spaces are uniformly distributed) in fact cannot be well approximated by partial circulant matrices. Then, we propose a natural generalization of the partial circulant approximation framework that entails approximating the range space of a given LDR operator $A$ over a restricted domain of inputs, using a matrix formed as a product of a partial circulant matrix having $m' > m$ rows and an $m \times k$ 'post processing' matrix. We introduce a novel algorithmic technique, based on sparse matrix factorization, for identifying the factors comprising such approximations, and provide preliminary evidence to demonstrate the potential of this approach.
Submitted 24 February, 2015;
originally announced February 2015.
-
Noisy Matrix Completion under Sparse Factor Models
Authors:
Akshay Soni,
Swayambhoo Jain,
Jarvis Haupt,
Stefano Gonella
Abstract:
This paper examines a general class of noisy matrix completion tasks where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random noise or corruption. Our specific focus is on settings where the matrix to be estimated is well-approximated by a product of two (a priori unknown) matrices, one of which is sparse. Such structural models - referred to here as "sparse factor models" - have been widely used, for example, in subspace clustering applications, as well as in contemporary sparse modeling and dictionary learning tasks. Our main theoretical contributions are estimation error bounds for sparsity-regularized maximum likelihood estimators for problems of this form, which are applicable to a number of different observation noise or corruption models. Several specific implications are examined, including scenarios where observations are corrupted by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. We also propose a simple algorithmic approach based on the alternating direction method of multipliers for these tasks, and provide experimental evidence to support our error analyses.
Submitted 2 November, 2014;
originally announced November 2014.
-
Identifying Outliers in Large Matrices via Randomized Adaptive Compressive Sampling
Authors:
Xingguo Li,
Jarvis Haupt
Abstract:
This paper examines the problem of locating outlier columns in a large, otherwise low-rank, matrix. We propose a simple two-step adaptive sensing and inference approach and establish theoretical guarantees for its performance; our results show that accurate outlier identification is achievable using very few linear summaries of the original data matrix -- as few as the squared rank of the low-rank component plus the number of outliers, times constant and logarithmic factors. We demonstrate the performance of our approach experimentally in two stylized applications, one motivated by robust collaborative filtering tasks, and the other by saliency map estimation tasks arising in computer vision and automated surveillance, and also investigate extensions to settings where the data are noisy, or possibly incomplete.
Submitted 18 November, 2014; v1 submitted 1 July, 2014;
originally announced July 2014.
-
Anomaly-Sensitive Dictionary Learning for Unsupervised Diagnostics of Solid Media
Authors:
Jeffrey M. Druce,
Jarvis D. Haupt,
Stefano Gonella
Abstract:
This paper proposes a strategy for the detection and triangulation of structural anomalies in solid media. The method revolves around the construction of sparse representations of the medium's dynamic response, obtained by learning instructive dictionaries which form a suitable basis for the response data. The resulting sparse coding problem is recast as a modified dictionary learning task with additional spatial sparsity constraints enforced on the atoms of the learned dictionaries, which provides them with a prescribed spatial topology that is designed to unveil anomalous regions in the physical domain. The proposed methodology is model agnostic, i.e., it forsakes the need for a physical model and requires virtually no a priori knowledge of the structure's material properties, as all the inferences are exclusively informed by the data through the layers of information that are available in the intrinsic salient structure of the material's dynamic response. This characteristic makes the approach powerful for anomaly identification in systems with unknown or heterogeneous property distribution, for which a model is unsuitable or unreliable. The method is validated using both synthetically
Submitted 11 May, 2014;
originally announced May 2014.
-
Compressive Measurement Designs for Estimating Structured Signals in Structured Clutter: A Bayesian Experimental Design Approach
Authors:
Swayambhoo Jain,
Akshay Soni,
Jarvis Haupt
Abstract:
This work considers an estimation task in compressive sensing, where the goal is to estimate an unknown signal from compressive measurements that are corrupted by additive pre-measurement noise (interference, or clutter) as well as post-measurement noise, in the specific setting where some (perhaps limited) prior knowledge on the signal, interference, and noise is available. The specific aim here is to devise a strategy for incorporating this prior information into the design of an appropriate compressive measurement strategy. Here, the prior information is interpreted as statistics of a prior distribution on the relevant quantities, and an approach based on Bayesian Experimental Design is proposed. Experimental results on synthetic data demonstrate that the proposed approach outperforms traditional random compressive measurement designs, which are agnostic to the prior information, as well as several other knowledge-enhanced sensing matrix designs based on more heuristic notions.
Submitted 21 November, 2013;
originally announced November 2013.
-
Automated Defect Localization via Low Rank Plus Outlier Modeling of Propagating Wavefield Data
Authors:
Stefano Gonella,
Jarvis D. Haupt
Abstract:
This work proposes an agnostic inference strategy for material diagnostics, conceived within the context of laser-based non-destructive evaluation methods, which extract information about structural anomalies from the analysis of acoustic wavefields measured on the structure's surface by means of a scanning laser interferometer. The proposed approach couples spatiotemporal windowing with low rank plus outlier modeling, to identify a priori unknown deviations in the propagating wavefields caused by material inhomogeneities or defects, using virtually no knowledge of the structural and material properties of the medium. This characteristic makes the approach particularly suitable for diagnostics scenarios where the mechanical and material models are complex, unknown, or unreliable. We demonstrate our approach in a simulated environment using benchmark point and line defect localization problems based on propagating flexural waves in a thin plate.
Submitted 18 July, 2013;
originally announced July 2013.
-
On the Fundamental Limits of Recovering Tree Sparse Vectors from Noisy Linear Measurements
Authors:
Akshay Soni,
Jarvis Haupt
Abstract:
Recent breakthrough results in compressive sensing (CS) have established that many high dimensional signals can be accurately recovered from a relatively small number of non-adaptive linear observations, provided that the signals possess a sparse representation in some basis. Subsequent efforts have shown that the performance of CS can be improved by exploiting additional structure in the locations of the nonzero signal coefficients during inference, or by utilizing some form of data-dependent adaptive measurement focusing during the sensing process. To our knowledge, our own previous work was the first to establish the potential benefits that can be achieved when fusing the notions of adaptive sensing and structured sparsity -- that work examined the task of support recovery from noisy linear measurements, and established that an adaptive sensing strategy specifically tailored to signals that are tree-sparse can significantly outperform adaptive and non-adaptive sensing strategies that are agnostic to the underlying structure. In this work we establish fundamental performance limits for the task of support recovery of tree-sparse signals from noisy measurements, in settings where measurements may be obtained either non-adaptively (using a randomized Gaussian measurement strategy motivated by initial CS investigations) or by any adaptive sensing strategy. Our main results here imply that the adaptive tree sensing procedure analyzed in our previous work is nearly optimal, in the sense that no other sensing and estimation strategy can perform fundamentally better for identifying the support of tree-sparse signals.
Submitted 15 October, 2013; v1 submitted 18 June, 2013;
originally announced June 2013.
-
Semi-blind Source Separation via Sparse Representations and Online Dictionary Learning
Authors:
Sirisha Rambhatla,
Jarvis D. Haupt
Abstract:
This work examines a semi-blind single-channel source separation problem. Our specific aim is to separate one source whose local structure is approximately known, from another a priori unspecified background source, given only a single linear combination of the two sources. We propose a separation technique based on local sparse approximations along the lines of recent efforts in sparse representations and dictionary learning. A key feature of our procedure is the online learning of dictionaries (using only the data itself) to sparsely model the background source, which facilitates its separation from the partially-known source. Our approach is applicable to source separation problems in various application domains; here, we demonstrate the performance of our proposed approach via simulation on a stylized audio source separation task.
Submitted 24 January, 2015; v1 submitted 3 December, 2012;
originally announced December 2012.
-
Level Set Estimation from Compressive Measurements using Box Constrained Total Variation Regularization
Authors:
Akshay Soni,
Jarvis Haupt
Abstract:
Estimating the level set of a signal from measurements is a task that arises in a variety of fields, including medical imaging, astronomy, and digital elevation mapping. Motivated by scenarios where accurate and complete measurements of the signal may not be available, we examine here a simple procedure for estimating the level set of a signal from highly incomplete measurements, which may additionally be corrupted by additive noise. The proposed procedure is based on box-constrained Total Variation (TV) regularization. We demonstrate the performance of our approach, relative to existing state-of-the-art techniques for level set estimation from compressive measurements, via several simulation examples.
Submitted 8 October, 2012;
originally announced October 2012.
-
Efficient Adaptive Compressive Sensing Using Sparse Hierarchical Learned Dictionaries
Authors:
Akshay Soni,
Jarvis Haupt
Abstract:
Recent breakthrough results in compressed sensing (CS) have established that many high dimensional objects can be accurately recovered from a relatively small number of non-adaptive linear projection observations, provided that the objects possess a sparse representation in some basis. Subsequent efforts have shown that the performance of CS can be improved by exploiting the structure in the location of the non-zero signal coefficients (structured sparsity) or using some form of online measurement focusing (adaptivity) in the sensing process. In this paper we examine a powerful hybrid of these two techniques. First, we describe a simple adaptive sensing procedure and show that it is a provably effective method for acquiring sparse signals that exhibit structured sparsity characterized by tree-based coefficient dependencies. Next, employing techniques from sparse hierarchical dictionary learning, we show that representations exhibiting the appropriate form of structured sparsity can be learned from collections of training data. The combination of these techniques results in an effective and efficient adaptive compressive acquisition procedure.
Submitted 29 November, 2011;
originally announced November 2011.
-
Distilled Sensing: Adaptive Sampling for Sparse Detection and Estimation
Authors:
Jarvis Haupt,
Rui Castro,
Robert Nowak
Abstract:
Adaptive sampling results in dramatic improvements in the recovery of sparse signals in white Gaussian noise. A sequential adaptive sampling-and-refinement procedure called Distilled Sensing (DS) is proposed and analyzed. DS is a form of multi-stage experimental design and testing. Because of the adaptive nature of the data collection, DS can detect and localize far weaker signals than possible from non-adaptive measurements. In particular, reliable detection and localization (support estimation) using non-adaptive samples is possible only if the signal amplitudes grow logarithmically with the problem dimension. Here it is shown that using adaptive sampling, reliable detection is possible provided the amplitude exceeds a constant, and localization is possible when the amplitude exceeds any arbitrarily slowly growing function of the dimension.
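The multi-stage sampling-and-refinement idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure as analyzed in the paper: it assumes nonnegative signal entries, assumes each stage keeps only the indices whose noisy observation is positive, and ignores how a measurement budget would be divided across stages. The function name `distilled_sensing` and all parameter values are hypothetical.

```python
import random

def distilled_sensing(x, num_stages, noise_std=1.0, seed=0):
    """Return the indices still under consideration after repeated refinement.

    Assumptions (for illustration only): signal entries are nonnegative, and a
    stage discards every index whose fresh noisy observation is not positive.
    """
    rng = random.Random(seed)
    active = list(range(len(x)))
    for _ in range(num_stages):
        # Take a fresh noisy look at each index that is still active.
        observations = {i: x[i] + rng.gauss(0.0, noise_std) for i in active}
        # Refinement: zero entries survive with probability about one half,
        # while sufficiently strong positive entries almost always survive.
        active = [i for i in active if observations[i] > 0.0]
    return active

# Hypothetical sparse signal: 10 nonzero entries out of 2000.
n = 2000
support = list(range(0, n, 200))
x = [0.0] * n
for i in support:
    x[i] = 8.0

kept = distilled_sensing(x, num_stages=4)
print(len(kept), "indices remain out of", n)
```

Because roughly half of the zero entries are discarded at every stage, later stages concentrate their observations on a much smaller set that still contains the signal support, which is the intuition behind the gains over non-adaptive sampling described above.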
Submitted 27 May, 2010; v1 submitted 29 January, 2010;
originally announced January 2010.
-
Radiation hardness qualification of PbWO4 scintillation crystals for the CMS Electromagnetic Calorimeter
Authors:
The CMS Electromagnetic Calorimeter Group,
P. Adzic,
N. Almeida,
D. Andelin,
I. Anicin,
Z. Antunovic,
R. Arcidiacono,
M. W. Arenton,
E. Auffray,
S. Argiro,
A. Askew,
S. Baccaro,
S. Baffioni,
M. Balazs,
D. Bandurin,
D. Barney,
L. M. Barone,
A. Bartoloni,
C. Baty,
S. Beauceron,
K. W. Bell,
C. Bernet,
M. Besancon,
B. Betev,
R. Beuselinck
, et al. (245 additional authors not shown)
Abstract:
Ensuring the radiation hardness of PbWO4 crystals was one of the main priorities during the construction of the electromagnetic calorimeter of the CMS experiment at CERN. The production on an industrial scale of radiation hard crystals and their certification over a period of several years represented a difficult challenge both for CMS and for the crystal suppliers. The present article reviews the related scientific and technological problems encountered.
Submitted 21 December, 2009;
originally announced December 2009.
-
LSST: from Science Drivers to Reference Design and Anticipated Data Products
Authors:
Željko Ivezić,
Steven M. Kahn,
J. Anthony Tyson,
Bob Abel,
Emily Acosta,
Robyn Allsman,
David Alonso,
Yusra AlSayyad,
Scott F. Anderson,
John Andrew,
James Roger P. Angel,
George Z. Angeli,
Reza Ansari,
Pierre Antilogus,
Constanza Araujo,
Robert Armstrong,
Kirk T. Arndt,
Pierre Astier,
Éric Aubourg,
Nicole Auza,
Tim S. Axelrod,
Deborah J. Bard,
Jeff D. Barr,
Aurelian Barrau,
James G. Bartlett
, et al. (288 additional authors not shown)
Abstract:
(Abridged) We describe here the most ambitious survey currently planned in the optical, the Large Synoptic Survey Telescope (LSST). A vast array of science will be enabled by a single wide-deep-fast sky survey, and LSST will have unique survey capability in the faint time domain. The LSST design is driven by four main science themes: probing dark energy and dark matter, taking an inventory of the Solar System, exploring the transient optical sky, and mapping the Milky Way. LSST will be a wide-field ground-based system sited at Cerro Pachón in northern Chile. The telescope will have an 8.4 m (6.5 m effective) primary mirror, a 9.6 deg$^2$ field of view, and a 3.2 Gigapixel camera. The standard observing sequence will consist of pairs of 15-second exposures in a given field, with two such visits in each pointing in a given night. With these repeats, the LSST system is capable of imaging about 10,000 square degrees of sky in a single filter in three nights. The typical $5\sigma$ point-source depth in a single visit in $r$ will be $\sim 24.5$ (AB). The project is in the construction phase and will begin regular survey operations by 2022. The survey area will be contained within 30,000 deg$^2$ with $\delta<+34.5^\circ$, and will be imaged multiple times in six bands, $ugrizy$, covering the wavelength range 320--1050 nm. About 90\% of the observing time will be devoted to a deep-wide-fast survey mode which will uniformly observe an 18,000 deg$^2$ region about 800 times (summed over all six bands) during the anticipated 10 years of operations, and yield a coadded map to $r\sim27.5$. The remaining 10\% of the observing time will be allocated to projects such as a Very Deep and Fast time domain survey. The goal is to make LSST data products, including a relational database of about 32 trillion observations of 40 billion objects, available to the public and scientists around the world.
Submitted 23 May, 2018; v1 submitted 15 May, 2008;
originally announced May 2008.