-
Hierarchical clustering with dot products recovers hidden tree structure
Authors:
Annie Gray,
Alexander Modell,
Patrick Rubin-Delanchy,
Nick Whiteley
Abstract:
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model. The key technical innovations are to understand how hierarchical information in this model translates into tree geometry which can be recovered from data, and to characterise the benefits of simultaneously growing sample size and data dimension. We demonstrate superior tree recovery performance with real data over existing approaches such as UPGMA, Ward's method, and HDBSCAN.
Submitted 1 March, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
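The merge rule described in the abstract above (join the two clusters with the largest average dot product) can be sketched as follows. This is a minimal, unoptimised illustration and not the authors' implementation; the function name `dot_product_linkage` and the returned merge format are assumptions made for the example.

```python
import numpy as np

def dot_product_linkage(X):
    """Agglomerative clustering that repeatedly merges the two clusters
    with the largest average pairwise dot product.

    Returns a list of (cluster_id_a, cluster_id_b, average_dot_product).
    Points are clusters 0..n-1; each merge creates the next unused id.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    gram = X @ X.T                          # all pairwise dot products
    members = {i: [i] for i in range(n)}    # active cluster id -> point indices
    merges = []
    next_id = n
    while len(members) > 1:
        ids = sorted(members)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Average dot product between points of cluster a and cluster b
                score = gram[np.ix_(members[a], members[b])].mean()
                if best is None or score > best[2]:
                    best = (a, b, score)
        a, b, score = best
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, float(score)))
        next_id += 1
    return merges
```

Calling `dot_product_linkage(X)` on an `(n, d)` array returns `n - 1` merges, which together define a binary tree over the points. The only change from standard average linkage is that similarity is a dot product to be maximised rather than a distance to be minimised.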
-
Correlation-Based And-Operations Can Be Copulas: A Proof
Authors:
Enrique Miralles-Dolz,
Ander Gray,
Edoardo Patelli,
Scott Ferson,
Vladik Kreinovich,
Olga Kosheleva
Abstract:
In many practical situations, we know the probabilities $a$ and $b$ of two events $A$ and $B$, and we want to estimate the joint probability ${\rm Prob}(A\,\&\,B)$. The algorithm that estimates the joint probability based on the known values $a$ and $b$ is called an and-operation. An important case when such a reconstruction is possible is when we know the correlation between $A$ and $B$; we call the resulting and-operation correlation-based. On the other hand, in statistics, there is a widely used class of and-operations known as copulas. Empirical evidence seems to indicate that the correlation-based and-operation derived in https://doi.org/10.1007/978-3-031-08971-8_64 is a copula, but until now, no proof of this statement was available. In this paper, we provide such a proof.
Submitted 13 January, 2023;
originally announced January 2023.
-
Statistical exploration of the Manifold Hypothesis
Authors:
Nick Whiteley,
Annie Gray,
Patrick Rubin-Delanchy
Abstract:
The Manifold Hypothesis is a widely accepted tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real world situations, has led to development of a wide range of statistical methods in the last few decades, and has been suggested as a key factor in the success of modern AI technologies. We show that rich and sometimes intricate manifold structure in data can emerge from a generic and remarkably simple statistical model -- the Latent Metric Model -- via elementary concepts such as latent variables, correlation and stationarity. This establishes a general statistical explanation for why the Manifold Hypothesis seems to hold in so many situations. Informed by the Latent Metric Model we derive procedures to discover and interpret the geometry of high-dimensional data, and explore hypotheses about the data generating mechanism. These procedures operate under minimal assumptions and make use of well known, scalable graph-analytic algorithms.
Submitted 9 February, 2024; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Correlated Boolean Operators for Uncertainty Logic
Authors:
Enrique Miralles-Dolz,
Ander Gray,
Edoardo Patelli,
Scott Ferson
Abstract:
We present a correlated \textit{and} gate which may be used to propagate uncertainty and dependence through Boolean functions, since any Boolean function may be expressed as a combination of \textit{and} and \textit{not} operations. We argue that the \textit{and} gate is a bivariate copula family, which has the interpretation of constructing bivariate Bernoulli random variables following a given Pearson correlation coefficient and marginal probabilities. We show how this copula family may be used to propagate uncertainty in the form of probabilities of events, probability intervals, and probability boxes, with only partial or no knowledge of the dependency between events, expressed as an interval for the correlation coefficient. These results generalise previous results by Fréchet on the conjunction of two events with unknown dependencies. We show an application propagating uncertainty through a fault tree for a pressure tank. This paper comes with an open-source Julia library for performing uncertainty logic.
Submitted 21 July, 2022;
originally announced July 2022.
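For two Bernoulli events with marginal probabilities $a$ and $b$ and Pearson correlation $r$, the definition of correlation gives ${\rm Prob}(A\,\&\,B) = ab + r\sqrt{a(1-a)\,b(1-b)}$. The sketch below evaluates that expression in Python. Clipping the result to the Fréchet bounds is an assumption made here to keep the output a valid probability, and the function name `correlated_and` is hypothetical; neither is taken from the paper or from its Julia library.

```python
import math

def correlated_and(a, b, r):
    """Joint probability P(A & B) for events with marginals a, b and
    Pearson correlation r, clipped to the Frechet bounds (an assumption)."""
    joint = a * b + r * math.sqrt(a * (1 - a) * b * (1 - b))
    lower = max(0.0, a + b - 1.0)   # Frechet lower bound
    upper = min(a, b)               # Frechet upper bound
    return min(max(joint, lower), upper)
```

With `r = 0` this reduces to the independent case `a * b`. Not every correlation in `[-1, 1]` is attainable for a given pair of marginals, which is why the clipping step matters whenever `a` and `b` differ.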
-
Matrix factorisation and the interpretation of geodesic distance
Authors:
Nick Whiteley,
Annie Gray,
Patrick Rubin-Delanchy
Abstract:
Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool, approximating geodesic distance, can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is used, followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.
Submitted 22 September, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Spatio-temporal Local Interpolation of Global Ocean Heat Transport using Argo Floats: A Debiased Latent Gaussian Process Approach
Authors:
Beomjo Park,
Mikael Kuusela,
Donata Giglio,
Alison Gray
Abstract:
The world ocean plays a key role in redistributing heat in the climate system and hence in regulating Earth's climate. Yet statistical analysis of ocean heat transport suffers from partially incomplete large-scale data intertwined with complex spatio-temporal dynamics, as well as from potential model misspecification. We present a comprehensive spatio-temporal statistical framework tailored to interpolating the global ocean heat transport using in-situ Argo profiling float measurements. We formalize the statistical challenges using latent local Gaussian process regression accompanied by a two-stage fitting procedure. We introduce an approximate Expectation-Maximization algorithm to jointly estimate both the mean field and the covariance parameters, and refine the potentially under-specified mean field model with a debiasing procedure. This approach provides data-driven global ocean heat transport fields that vary in both space and time and can provide insights into crucial dynamical phenomena, such as El Niño and La Niña, as well as the global climatological mean heat transport field, which by itself is of scientific interest. The proposed framework and the Argo-based estimates are thoroughly validated with state-of-the-art multimission satellite products and shown to yield realistic subsurface ocean heat transport estimates.
Submitted 18 July, 2022; v1 submitted 20 May, 2021;
originally announced May 2021.
-
Solving Constrained CASH Problems with ADMM
Authors:
Parikshit Ram,
Sijia Liu,
Deepak Vijaykeerthi,
Dakuo Wang,
Djallel Bouneffouf,
Greg Bramble,
Horst Samulowitz,
Alexander G. Gray
Abstract:
The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization framework to decompose CASH into multiple small problems and demonstrate how ADMM facilitates incorporation of black-box constraints.
Submitted 10 July, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates
Authors:
Daniel Karl I. Weidele,
Justin D. Weisz,
Eno Oduor,
Michael Muller,
Josh Andres,
Alexander Gray,
Dakuo Wang
Abstract:
Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists from the tedious manual work. However, today's AutoAI systems often present little or no information about how they select and generate model results. Thus, users often do not understand the process, nor do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.
Submitted 17 January, 2020; v1 submitted 13 December, 2019;
originally announced December 2019.
-
An ADMM Based Framework for AutoML Pipeline Configuration
Authors:
Sijia Liu,
Parikshit Ram,
Deepak Vijaykeerthy,
Djallel Bouneffouf,
Gregory Bramble,
Horst Samulowitz,
Dakuo Wang,
Andrew Conn,
Alexander Gray
Abstract:
We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer and continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from the UCI ML and OpenML repositories. We observe that on average our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn and TPOT), highlighting the practical advantages of this framework.
Submitted 6 December, 2019; v1 submitted 1 May, 2019;
originally announced May 2019.
-
Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines
Authors:
Hassan A. Kingravi,
Patricio A. Vela,
Alexandar Gray
Abstract:
This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empirical operator. A simple data point selection procedure is given to generate a substitute density for the data, with accuracy that is governed by a user-tunable parameter . The effect of the approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be shown directly in terms of the density estimate error and as a function of the parameter . We show in experiments that RSKPCA can improve both training and evaluation time of KPCA by up to an order of magnitude, and compares favorably to the widely-used Nystrom and density-weighted Nystrom methods.
Submitted 26 July, 2015;
originally announced July 2015.
-
Modeling an Augmented Lagrangian for Blackbox Constrained Optimization
Authors:
Robert B. Gramacy,
Genetha A. Gray,
Sebastien Le Digabel,
Herbert K. H. Lee,
Pritam Ranjan,
Garth Wells,
Stefan M. Wild
Abstract:
Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth. To narrow that gap, we propose a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework. This hybrid approach allows the statistical model to think globally and the augmented Lagrangian to act locally. We focus on problems where the constraints are the primary bottleneck, requiring expensive simulation to evaluate and substantial modeling effort to map out. In that context, our hybridization presents a simple yet effective solution that allows existing objective-oriented statistical approaches, like those based on Gaussian process surrogates and expected improvement heuristics, to be applied to the constrained setting with minor modification. This work is motivated by a challenging, real-data benchmark problem from hydrology where, even with a simple linear objective function, learning a nontrivial valid region complicates the search for a global minimum.
Submitted 3 March, 2015; v1 submitted 19 March, 2014;
originally announced March 2014.
-
Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens
Authors:
Ravi Ganti,
Alexander G. Gray
Abstract:
In this paper we propose a multi-armed bandit inspired, pool based active learning algorithm for the problem of binary classification. By carefully constructing an analogy between active learning and multi-armed bandits, we utilize ideas such as lower confidence bounds, and self-concordant regularization from the multi-armed bandit literature to design our proposed algorithm. Our algorithm is a sequential algorithm, which in each round assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for the label of this sampled point. The design of this sampling distribution is also inspired by the analogy between active learning and multi-armed bandits. We show how to derive lower confidence bounds required by our algorithm. Experimental comparisons to previously proposed active learning algorithms show superior performance on some standard UCI datasets.
Submitted 26 September, 2013;
originally announced September 2013.
-
Local Support Vector Machines: Formulation and Analysis
Authors:
Ravi Ganti,
Alexander Gray
Abstract:
We provide a formulation for Local Support Vector Machines (LSVMs) that generalizes previous formulations, and brings out the explicit connections to local polynomial learning used in nonparametric estimation literature. We investigate the simplest type of LSVMs called Local Linear Support Vector Machines (LLSVMs). For the first time we establish conditions under which LLSVMs make Bayes consistent predictions at each test point $x_0$. We also establish rates at which the local risk of LLSVMs converges to the minimum value of expected local risk at each point $x_0$. Using stability arguments we establish generalization error bounds for LLSVMs.
Submitted 14 September, 2013;
originally announced September 2013.
-
Stochastic ADMM for Nonsmooth Optimization
Authors:
Hua Ouyang,
Niao He,
Alexander Gray
Abstract:
We present a stochastic setting for optimization problems with nonsmooth convex separable objective functions over linear equality constraints. To solve such problems, we propose a stochastic Alternating Direction Method of Multipliers (ADMM) algorithm. Our algorithm applies to a more general class of nonsmooth convex functions that does not necessarily have a closed-form solution by minimizing the augmented function directly. We also demonstrate the rates of convergence for our algorithm under various structural assumptions of the stochastic functions: $O(1/\sqrt{t})$ for convex functions and $O(\log t/t)$ for strongly convex functions. Compared to previous literature, we establish the convergence rate of ADMM algorithm, for the first time, in terms of both the objective value and the feasibility violation.
Submitted 22 January, 2013; v1 submitted 3 November, 2012;
originally announced November 2012.
-
Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL
Authors:
Nishant A. Mehta,
Dongryeol Lee,
Alexander G. Gray
Abstract:
Since its inception, the modus operandi of multi-task learning (MTL) has been to minimize the task-wise mean of the empirical risks. We introduce a generalized loss-compositional paradigm for MTL that includes a spectrum of formulations as a subfamily. One endpoint of this spectrum is minimax MTL: a new MTL formulation that minimizes the maximum of the tasks' empirical risks. Via a certain relaxation of minimax MTL, we obtain a continuum of MTL formulations spanning minimax MTL and classical MTL. The full paradigm itself is loss-compositional, operating on the vector of empirical risks. It incorporates minimax MTL, its relaxations, and many new MTL formulations as special cases. We show theoretically that minimax MTL tends to avoid worst case outcomes on newly drawn test tasks in the learning to learn (LTL) test setting. The results of several MTL formulations on synthetic and real problems in the MTL and LTL test settings are encouraging.
Submitted 13 September, 2012;
originally announced September 2012.
-
Faster Gaussian Summation: Theory and Experiment
Authors:
Dongryeol Lee,
Alexander G. Gray
Abstract:
We provide faster algorithms for the problem of Gaussian summation, which occurs in many machine learning methods. We develop two new extensions - an $O(D^p)$ Taylor expansion for the Gaussian kernel with rigorous error bounds and a new error control scheme integrating any arbitrary approximation method - within the best discrete algorithmic framework using adaptive hierarchical data structures. We rigorously evaluate these techniques empirically in the context of optimal bandwidth selection in kernel density estimation, revealing the strengths and weaknesses of current state-of-the-art approaches for the first time. Our results demonstrate that the new error control scheme yields improved performance, whereas the series expansion approach is only effective in low dimensions (five or fewer).
Submitted 27 June, 2012;
originally announced June 2012.
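For context on the entry above, Gaussian summation computes, for each query point $q$, the sum $\sum_i \exp(-\|q - x_i\|^2 / (2h^2))$ over all reference points $x_i$. The sketch below is the exact, brute-force baseline whose cost (proportional to the number of queries times the number of references) the fast methods aim to reduce. It is not the paper's algorithm, and the name `gaussian_sum` and the $2h^2$ bandwidth convention are assumptions made for illustration.

```python
import numpy as np

def gaussian_sum(queries, references, bandwidth):
    """Exact Gaussian summation.

    For each query q, returns sum_i exp(-||q - x_i||^2 / (2 * bandwidth^2))
    over all reference points x_i. Inputs have shapes (m, d) and (n, d).
    """
    Q = np.asarray(queries, dtype=float)
    R = np.asarray(references, dtype=float)
    # Squared Euclidean distance between every query and every reference
    sq_dists = ((Q[:, None, :] - R[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).sum(axis=1)
```

Dividing each sum by the number of references and the Gaussian normalising constant gives a kernel density estimate, which is why bandwidth selection by cross-validation is dominated by repeated evaluations of this quantity.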
-
Fast Nonparametric Conditional Density Estimation
Authors:
Michael P. Holmes,
Alexander G. Gray,
Charles Lee Isbell
Abstract:
Conditional density estimation generalizes regression by modeling a full density $f(y|x)$ rather than only the expected value $E(y|x)$. This is important for many tasks, including handling multi-modality and generating prediction intervals. Though fundamental and widely applicable, nonparametric conditional density estimators have received relatively little attention from statisticians and little or none from the machine learning community. None of that work has been applied to greater than bivariate data, presumably due to the computational difficulty of data-driven bandwidth selection. We describe the double kernel conditional density estimator and derive fast dual-tree-based algorithms for bandwidth selection using a maximum likelihood criterion. These techniques give speedups of up to 3.8 million in our experiments, and enable the first applications to previously intractable large multivariate datasets, including a redshift prediction problem from the Sloan Digital Sky Survey.
Submitted 20 June, 2012;
originally announced June 2012.
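A double kernel conditional density estimator is commonly written as a ratio of kernel sums, $\hat f(y|x) = \sum_i K_{h_y}(y - y_i)\,K_{h_x}(x - x_i) \big/ \sum_i K_{h_x}(x - x_i)$. The sketch below evaluates that form directly with Gaussian kernels. It is a naive illustration under that assumed form, not the dual-tree algorithm from the paper, and the function names and the choice of Gaussian kernels are assumptions.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Normalised one-dimensional Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def double_kernel_cde(y, x, Y, X, h_y, h_x):
    """Estimate f(y | x) from samples (X[i], Y[i]).

    X has shape (n, d), Y has shape (n,), x is a length-d query point.
    Assumes at least one sample carries non-zero weight at x.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq = ((X - np.asarray(x, dtype=float)) ** 2).sum(axis=1)
    # Unnormalised Gaussian weights in x; the constant cancels in the ratio
    w = np.exp(-0.5 * sq / h_x ** 2)
    return float((w * gaussian_kernel(y - Y, h_y)).sum() / w.sum())
```

Because the weights in `x` are normalised by their sum and the kernel in `y` integrates to one, the estimate integrates to one over `y` for any fixed `x`. Selecting `h_y` and `h_x` by likelihood cross-validation requires many such evaluations, which is the cost the paper's fast algorithms target.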
-
Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
Authors:
Hua Ouyang,
Alexander Gray
Abstract:
In this work we consider the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We propose a novel algorithm called Accelerated Nonsmooth Stochastic Gradient Descent (ANSGD), which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algorithm that can achieve the optimal O(1/t) rate for minimizing nonsmooth loss functions (with strong convexity). The fast rates are confirmed by empirical comparisons, in which ANSGD significantly outperforms previous subgradient descent algorithms including SGD.
Submitted 1 October, 2012; v1 submitted 20 May, 2012;
originally announced May 2012.
-
On the Sample Complexity of Predictive Sparse Coding
Authors:
Nishant A. Mehta,
Alexander G. Gray
Abstract:
The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms recently have demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have not been studied. We establish the first generalization error bounds for predictive sparse coding, covering two settings: 1) the overcomplete setting, where the number of features $k$ exceeds the original dimensionality $d$; and 2) the high or infinite-dimensional setting, where only dimension-free bounds are useful. Both learning bounds intimately depend on stability properties of the learned sparse encoder, as measured on the training sample. Consequently, we first present a fundamental stability result for the LASSO, a result characterizing the stability of the sparse codes with respect to perturbations to the dictionary. In the overcomplete setting, we present an estimation error bound that decays as $\tilde{O}(\sqrt{dk/m})$ with respect to $d$ and $k$. In the high or infinite-dimensional setting, we show a dimension-free bound that is $\tilde{O}(\sqrt{k^2 s/m})$ with respect to $k$ and $s$, where $s$ is an upper bound on the number of non-zeros in the sparse code for any training data point.
Submitted 7 October, 2012; v1 submitted 17 February, 2012;
originally announced February 2012.
-
UPAL: Unbiased Pool Based Active Learning
Authors:
Ravi Ganti,
Alexander Gray
Abstract:
In this paper we address the problem of pool based active learning, and provide an algorithm, called UPAL, that works by minimizing the unbiased estimator of the risk of a hypothesis in a given hypothesis space. For the space of linear classifiers and the squared loss we show that UPAL is equivalent to an exponentially weighted average forecaster. Exploiting some recent results regarding the spectra of random matrices allows us to establish consistency of UPAL when the true hypothesis is a linear hypothesis. Empirical comparisons with an active learner implementation in Vowpal Wabbit, and with a previously proposed pool based active learner implementation, show good empirical performance and better scalability.
Submitted 13 November, 2011; v1 submitted 7 November, 2011;
originally announced November 2011.
-
Dual-Tree Fast Gauss Transforms
Authors:
Dongryeol Lee,
Alexander G. Gray,
Andrew W. Moore
Abstract:
Kernel density estimation (KDE) is a popular statistical technique for estimating the underlying density distribution with minimal assumptions. Although kernel density estimators can be shown to achieve asymptotic estimation optimality for any input distribution, cross-validating for an optimal parameter requires significant computation dominated by kernel summations. In this paper we present an improvement to the dual-tree algorithm, the first practical kernel summation algorithm for general dimension. Our extension is based on the series expansion for the Gaussian kernel used by the fast Gauss transform. First, we derive two additional pieces of analytical machinery for extending the original algorithm to utilize a hierarchical data structure, demonstrating the first truly hierarchical fast Gauss transform. Second, we show how to integrate the series-expansion approximation within the dual-tree approach to compute kernel summations with a user-controllable relative error bound. We evaluate our algorithm on real-world datasets in the context of optimal bandwidth selection in kernel density estimation. Our results demonstrate that our new algorithm is the only one that guarantees a hard relative error bound and offers fast performance across a wide range of bandwidths evaluated in cross-validation procedures.
Submitted 14 February, 2011;
originally announced February 2011.
-
Generative and Latent Mean Map Kernels
Authors:
Nishant A. Mehta,
Alexander G. Gray
Abstract:
We introduce two kernels that extend the mean map, which embeds probability measures in Hilbert spaces. The generative mean map kernel (GMMK) is a smooth similarity measure between probabilistic models. The latent mean map kernel (LMMK) generalizes the non-iid formulation of Hilbert space embeddings of empirical distributions in order to incorporate latent variable models. When comparing certain classes of distributions, the GMMK exhibits beneficial regularization and generalization properties not shown for previous generative kernels. We present experiments comparing support vector machine performance using the GMMK and LMMK between hidden Markov models to the performance of other methods on discrete and continuous observation sequence data. The results suggest that, in many cases, the GMMK has generalization error competitive with or better than other methods.
Submitted 3 May, 2010;
originally announced May 2010.
-
Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data
Authors:
L. Fraser Jackson,
Alistair G. Gray,
Stephen E. Fienberg
Abstract:
Large contingency tables arise in many contexts but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables which are summaries of such contingency tables. We propose such an approach in this paper by finding members of a class of restricted log-linear models which maximize the likelihood of the data and use this to find a parsimonious means of representing the table. In contrast with more standard approaches for model search in hierarchical log-linear models (HLLM), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data simplification procedure prior to HLL modeling. A feature of the procedure is that it can easily be applied to many tables with millions of cells, providing a new way of summarizing large data sets in many disciplines. The focus is on information and description rather than statistical testing. The procedure may treat each variable in the table in different ways, preserving full detail, treating it as fully nominal, or preserving ordinality.
Submitted 11 November, 2008;
originally announced November 2008.