Search | arXiv e-print repository

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

Authors: Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

Abstract: Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.
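
For orientation, the classic Hammersley-Chapman-Robbins inequality in its textbook scalar form can be written as below. This is a sketch, not the paper's own formulation, which may differ (for example, in how it treats vector-valued inputs). The symbols $\theta$ (a scalar input), $f$ (the map from input to features), $\sigma$ (the standard deviation of the added noise), and $\Delta$ (a perturbation of the input) are notation assumed here for illustration; the Gaussian noise model is likewise an assumption.

```latex
% Textbook scalar HCR bound for any unbiased estimator \hat\theta(Y) of \theta,
% where Y has density p_\theta:
\operatorname{Var}_\theta\bigl(\hat\theta\bigr)
  \;\ge\; \sup_{\Delta \neq 0}
  \frac{\Delta^2}{\chi^2\!\left(p_{\theta+\Delta} \,\middle\|\, p_\theta\right)},
\qquad
\chi^2(q \,\|\, p) = \mathbb{E}_{p}\!\left[\left(\frac{q}{p} - 1\right)^{2}\right].

% Assumed noise model: Y = f(\theta) + \varepsilon with
% \varepsilon \sim \mathcal{N}(0, \sigma^2 I). The chi-square divergence
% between two Gaussians with a common covariance then has a closed form:
\chi^2\!\left(p_{\theta+\Delta} \,\middle\|\, p_\theta\right)
  = \exp\!\left(\frac{\lVert f(\theta+\Delta) - f(\theta)\rVert^{2}}{\sigma^{2}}\right) - 1 .
```

Under these assumptions, evaluating the bound requires only feature differences $f(\theta+\Delta) - f(\theta)$, which is consistent with the abstract's description of the bounds as computationally tractable. Letting $\Delta \to 0$ recovers the Cramér-Rao bound.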

Submitted 17 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 18 pages, 6 figures

arXiv:2305.11323 [pdf, other]

Cumulative differences between paired samples

Authors: Isabel Kloumann, Hannah Korevaar, Chris McConnell, Mark Tygert, Jessica Zhao

Abstract: The simplest, most common paired samples consist of observations from two populations, with each observed response from one population corresponding to an observed response from the other population at the same value of an ordinal covariate. The pair of observed responses (one from each population) at the same value of the covariate is known as a "matched pair" (with the matching based on the value of the covariate). A graph of cumulative differences between the two populations reveals differences in responses as a function of the covariate. Indeed, the slope of the secant line connecting two points on the graph becomes the average difference over the wide interval of values of the covariate between the two points; i.e., slope of the graph is the average difference in responses. ("Average" refers to the weighted average if the samples are weighted.) Moreover, a simple statistic known as the Kuiper metric summarizes into a single scalar the overall differences over all values of the covariate. The Kuiper metric is the absolute value of the total difference in responses between the two populations, totaled over the interval of values of the covariate for which the absolute value of the total is greatest.
The total should be normalized such that it becomes the (weighted) average over all values of the covariate when the interval over which the total is taken is the entire range of the covariate (i.e., the sum for the total gets divided by the total number of observations, if the samples are unweighted, or divided by the total weight, if the samples are weighted). This cumulative approach is fully nonparametric and uniquely defined (with only one right way to construct the graphs and scalar summary statistics), unlike traditional methods such as reliability diagrams or parametric or semi-parametric regressions, which typically obscure significant differences due to their parameter settings.
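
A minimal sketch of the construction described in this abstract, assuming matched pairs with distinct covariate values. The function names `cumulative_differences` and `kuiper_metric` are hypothetical and are not taken from the authors' software. The maximum absolute total over any interval equals the range (maximum minus minimum) of the cumulative graph once the graph is started at zero, which is what the second function computes.

```python
import numpy as np


def cumulative_differences(covariate, resp_a, resp_b, weights=None):
    """Return (abscissa, cum): the graph of normalized cumulative differences.

    Hypothetical helper; assumes one matched pair per covariate value.
    """
    covariate = np.asarray(covariate, dtype=float)
    diff = np.asarray(resp_a, dtype=float) - np.asarray(resp_b, dtype=float)
    if weights is None:
        w = np.ones_like(diff)
    else:
        w = np.asarray(weights, dtype=float)
    order = np.argsort(covariate, kind="stable")
    # Normalize so that the total over the whole range is the weighted average.
    w = w[order] / w.sum()
    # Start both axes at zero so that every interval is a difference of two points.
    abscissa = np.concatenate(([0.0], np.cumsum(w)))
    cum = np.concatenate(([0.0], np.cumsum(w * diff[order])))
    return abscissa, cum


def kuiper_metric(cum):
    """Largest absolute total over any interval = range of the cumulative graph."""
    return float(np.max(cum) - np.min(cum))


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    a = [1, 1, 0, 0]
    b = [0, 0, 1, 1]
    abscissa, cum = cumulative_differences(x, a, b)
    print(cum, kuiper_metric(cum))
```

With the abscissa taken as cumulative weight, the slope of the secant line between two points of `(abscissa, cum)` is the weighted average difference in responses over the corresponding range of the covariate.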

Submitted 8 April, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: 19 pages, 9 figures

arXiv:2303.02226 [pdf, other]

An efficient algorithm for integer lattice reduction

Authors: François Charton, Kristin Lauter, Cathy Li, Mark Tygert

Abstract: A lattice of integers is the collection of all linear combinations of a set of vectors for which all entries of the vectors are integers and all coefficients in the linear combinations are also integers. Lattice reduction refers to the problem of finding a set of vectors in a given lattice such that the collection of all integer linear combinations of this subset is still the entire original lattice and so that the Euclidean norms of the subset are reduced. The present paper proposes simple, efficient iterations for lattice reduction which are guaranteed to reduce the Euclidean norms of the basis vectors (the vectors in the subset) monotonically during every iteration. Each iteration selects the basis vector for which projecting off (with integer coefficients) the components of the other basis vectors along the selected vector minimizes the Euclidean norms of the reduced basis vectors. Each iteration projects off the components along the selected basis vector and efficiently updates all information required for the next iteration to select its best basis vector and perform the associated projections.
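
The projection step in this abstract can be illustrated with a brute-force sketch. This is not the paper's algorithm: it recomputes every candidate from scratch instead of using the efficient updates the abstract mentions, and it assumes the selection criterion is the sum of squared Euclidean norms, which the abstract does not specify. The function names are hypothetical. Subtracting the nearest-integer multiple of the selected vector never increases a vector's norm and keeps the lattice unchanged.

```python
from fractions import Fraction


def dot(u, v):
    """Exact integer inner product."""
    return sum(a * b for a, b in zip(u, v))


def total_sq_norm(basis):
    """Sum of squared Euclidean norms (assumed selection criterion)."""
    return sum(dot(b, b) for b in basis)


def project_off(basis, i):
    """Subtract from every other vector the nearest-integer multiple of basis[i]."""
    bi = basis[i]
    den = dot(bi, bi)
    reduced = []
    for j, bj in enumerate(basis):
        if j == i or den == 0:
            reduced.append(list(bj))
            continue
        # Nearest integer to <bj, bi> / <bi, bi>, computed exactly.
        c = round(Fraction(dot(bj, bi), den))
        reduced.append([x - c * y for x, y in zip(bj, bi)])
    return reduced


def reduce_lattice(basis, max_iter=1000):
    """Greedy reduction: at each step, apply the best single projection."""
    basis = [list(b) for b in basis]
    for _ in range(max_iter):
        candidates = [project_off(basis, i) for i in range(len(basis))]
        best = min(candidates, key=total_sq_norm)
        if total_sq_norm(best) >= total_sq_norm(basis):
            break  # no projection shortens the basis any further
        basis = best
    return basis


if __name__ == "__main__":
    print(reduce_lattice([[1, 0], [100, 1]]))
```

Because the sum of squared norms is a non-negative integer that strictly decreases on every accepted step, the loop terminates; `max_iter` is only a safeguard.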

Submitted 3 August, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 29 pages, 20 figures

Journal ref: SIAM Journal on Matrix Analysis and Applications, 45 (1): 353-367, 2024

arXiv:2207.13632 [pdf, ps, other]

Ties in ranking scores can be treated as weighted samples

Authors: Mark Tygert

Abstract: Prior proposals for cumulative statistics suggest making tiny random perturbations to the scores (independent variables in a regression) in order to ensure the scores' uniqueness. Uniqueness means that no score for any member of the population or subpopulation being analyzed is exactly equal to any other member's score. It turns out to be possible to construct from the original data a weighted data set that modifies the scores, weights, and responses (dependent variables in the regression) such that the new scores are unique and (together with the new weights and responses) yield the desired cumulative statistics for the original data. This reduces the problem of analyzing data with scores that may not be unique to the problem of analyzing a weighted data set with scores that are unique by construction. Recent proposals for cumulative statistics have already detailed how to process weighted samples whose scores are unique.

Submitted 5 August, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 4 pages. arXiv admin note: substantial text overlap with arXiv:2202.00100

arXiv:2205.09680 [pdf, other]

Metrics of calibration for probabilistic predictions

Authors: Imanol Arrieta-Ibarra, Paman Gujral, Jonathan Tannen, Mark Tygert, Cherie Xu

Abstract: Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose statistically significant discrepancies -- so-called "miscalibration" -- between the predictions and the outcomes. The canonical reliability diagrams histogram the observed and expected values of the predictions; replacing the hard histogram binning with soft kernel density estimation is another common practice. But, which widths of bins or kernels are best? Plots of the cumulative differences between the observed and expected values largely avoid this question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision, even when the constant offsets of the secant lines are irrelevant; there is no need to bin or perform kernel density estimation. The existing standard metrics of miscalibration each summarize a reliability diagram as a single scalar statistic. The cumulative plots naturally lead to scalar metrics for the deviation of the graph of cumulative differences away from zero; good calibration corresponds to a horizontal, flat graph which deviates little from zero. The cumulative approach is currently unconventional, yet offers many favorable statistical properties, guaranteed via mathematical theory backed by rigorous proofs and illustrative numerical examples.
In particular, metrics based on binning or kernel density estimation unavoidably must trade off statistical confidence for the ability to resolve variations as a function of the predicted probability or vice versa. Widening the bins or kernels averages away random noise while giving up some resolving power. Narrowing the bins or kernels enhances resolving power while not averaging away as much noise.
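
A minimal sketch of how secant slopes encode miscalibration, assuming unweighted observations with distinct predicted probabilities. The function names `cumulative_miscalibration` and `secant_slope` are hypothetical and not taken from the authors' software. Observations are sorted by predicted probability, and the differences between outcome and prediction are accumulated and divided by the number of observations.

```python
import numpy as np


def cumulative_miscalibration(probs, outcomes):
    """Return (abscissa, cum) for the graph of cumulative differences.

    Hypothetical helper: abscissa[k] = k / n and cum[k] is the sum of
    (outcome - predicted probability) over the k lowest predictions, over n.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    order = np.argsort(probs, kind="stable")
    n = probs.size
    diffs = outcomes[order] - probs[order]
    cum = np.concatenate(([0.0], np.cumsum(diffs) / n))
    abscissa = np.arange(n + 1) / n
    return abscissa, cum


def secant_slope(abscissa, cum, i, j):
    """Average of (outcome - prediction) over sorted observations i+1..j."""
    return float((cum[j] - cum[i]) / (abscissa[j] - abscissa[i]))


if __name__ == "__main__":
    p = [0.9, 0.1, 0.8, 0.2]
    y = [1, 0, 1, 1]
    abscissa, cum = cumulative_miscalibration(p, y)
    print(cum)
    print(secant_slope(abscissa, cum, 0, len(p)))
```

A flat graph near zero indicates good calibration. A steep secant line over some range of predictions indicates that outcomes there differ on average from the predicted probabilities by the slope, with no bin width or kernel width to choose.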

Submitted 12 June, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: 50 pages, 36 figures

Journal ref: Journal of Machine Learning Research, 23: 1-54, 2022

arXiv:2202.00100 [pdf, other]

Calibration of P-values for calibration and for deviation of a subpopulation from the full population

Authors: Mark Tygert

Abstract: The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score via conditioning on it. These recently published papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. The present paper presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.

Submitted 8 April, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: 22 pages, 8 figures

Journal ref: Advances in Computational Mathematics, 49 (70): 1-22, 2023

arXiv:2112.00672 [pdf, other]

Controlling for multiple covariates

Authors: Mark Tygert

Abstract: A fundamental problem in statistics is to compare the outcomes attained by members of subpopulations. This problem arises in the analysis of randomized controlled trials, in the analysis of A/B tests, and in the assessment of fairness and bias in the treatment of sensitive subpopulations, especially when measuring the effects of algorithms and machine learning. Often the comparison makes the most sense when performed separately for individuals who are similar according to certain characteristics given by the values of covariates of interest; the separate comparisons can also be aggregated in various ways to compare across all values of the covariates. Separating, segmenting, or stratifying into those with similar values of the covariates is also known as "conditioning on" or "controlling for" those covariates; controlling for age or annual income is common. Two standard methods of controlling for covariates are (1) binning and (2) regression modeling. Binning requires making fairly arbitrary, yet frequently highly influential choices, and is unsatisfactorily temperamental in multiple dimensions, with multiple covariates. Regression analysis works wonderfully when there is good reason to believe in a particular parameterized regression model or classifier (such as logistic regression). Thus, there appears to be no extant canonical fully non-parametric regression for the comparison of subpopulations, not while conditioning on multiple specified covariates.
Existing methods rely on analysts to make choices, and those choices can be debatable; analysts can deceive others or even themselves. The present paper aims to fill the gap, combining two ingredients: (1) recently developed methodologies for such comparisons that already exist when conditioning on a single scalar covariate and (2) the Hilbert space-filling curve that maps continuously from one dimension to multiple dimensions.

Submitted 1 December, 2021; originally announced December 2021.

Comments: 29 pages, 21 figures, 2 tables

arXiv:2108.02666 [pdf, other]

A graphical method of cumulative differences between two subpopulations

Authors: Mark Tygert

Abstract: Comparing the differences in outcomes (that is, in "dependent variables") between two subpopulations is often most informative when comparing outcomes only for individuals from the subpopulations who are similar according to "independent variables." The independent variables are generally known as "scores," as in propensity scores for matching or as in the probabilities predicted by statistical or machine-learned models, for example. If the outcomes are discrete, then some averaging is necessary to reduce the noise arising from the outcomes varying randomly over those discrete values in the observed data. The traditional method of averaging is to bin the data according to the scores and plot the average outcome in each bin against the average score in the bin. However, such binning can be rather arbitrary and yet greatly impacts the interpretation of displayed deviation between the subpopulations and assessment of its statistical significance. Fortunately, such binning is entirely unnecessary in plots of cumulative differences and in the associated scalar summary metrics that are analogous to the workhorse statistics of comparing probability distributions -- those due to Kolmogorov and Smirnov and their refinements due to Kuiper. The present paper develops such cumulative methods for the common case in which no score of any member of the subpopulations being compared is exactly equal to the score of any other member of either subpopulation.

Submitted 24 October, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: 26 pages, 15 figures, 2 tables. arXiv admin note: text overlap with arXiv:2008.01779

Journal ref: Journal of Big Data, 8 (158): 1-29, 2021

arXiv:2008.01779 [pdf, other]

Cumulative deviation of a subpopulation from the full population

Authors: Mark Tygert

Abstract: Assessing equity in treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population such that similar individuals get similar scores; matching via propensity scores or appropriate covariates is common, for example. Given such scores, individuals with similar scores may or may not attain similar outcomes independent of the individuals' memberships in the subpopulation. The traditional graphical methods for visualizing inequities are known as "reliability diagrams" or "calibration plots," which bin the scores into a partition of all possible values, and for each bin plot both the average outcomes for only individuals in the subpopulation as well as the average outcomes for all individuals; comparing the graph for the subpopulation with that for the full population gives some sense of how the averages for the subpopulation deviate from the averages for the full population. Unfortunately, real data sets contain only finitely many observations, limiting the usable resolution of the bins, and so the conventional methods can obscure important variations due to the binning. Fortunately, plotting cumulative deviation of the subpopulation from the full population as proposed in this paper sidesteps the problematic coarse binning. The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs. Slope is easy to perceive even when the constant offsets of the secant lines are irrelevant.
The cumulative approach avoids binning that smooths over deviations of the subpopulation from the full population. Such cumulative aggregation furnishes both high-resolution graphical methods and simple scalar summary statistics (analogous to those of Kuiper and of Kolmogorov and Smirnov used in statistical significance testing for comparing probability distributions).

Submitted 7 July, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: 70 pages, 51 figures, 2 tables; the new versions of the paper merge in most of arXiv:2006.02504

Journal ref: Journal of Big Data, 8 (117): 1-60, 2021

arXiv:2006.02577 [pdf, ps, other]

An optimizable scalar objective value cannot be objective and should not be the sole objective

Authors: Isabel Kloumann, Mark Tygert

Abstract: This paper concerns the ethics and morality of algorithms and computational systems, and has been circulating internally at Facebook for the past couple years. The paper reviews many Nobel laureates' work, as well as the work of other prominent scientists such as Richard Dawkins, Andrei Kolmogorov, Vilfredo Pareto, and John von Neumann. The paper draws conclusions based on such works, as summarized in the title. The paper argues that the standard approach to modern machine learning and artificial intelligence is bound to be biased and unfair, and that longstanding traditions in the professions of law, justice, politics, and medicine should help.

Submitted 3 June, 2020; originally announced June 2020.

Comments: 13 pages

arXiv:2006.02504 [pdf, other]

Plots of the cumulative differences between observed and expected values of ordered Bernoulli variates

Authors: Mark Tygert

Abstract: Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30 percent chance. Given both the predictions and the actual outcomes, "reliability diagrams" (also known as "calibration plots") help detect and diagnose statistically significant discrepancies between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values of the predictions; several variants of the standard reliability diagrams propose to replace the hard histogram binning with soft kernel density estimation using smooth convolutional kernels of widths similar to the widths of the bins. In all cases, an important question naturally arises: which widths are best (or are multiple plots with different widths better)? Rather than answering this question, plots of the cumulative differences between the observed and expected values largely avoid the question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision even when the constant offsets of the secant lines are irrelevant. There is no need to bin or perform kernel density estimation with a somewhat arbitrary kernel.

Submitted 16 July, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: 18 pages, 12 figures

arXiv:2001.03192 [pdf, other]

Secure multiparty computations in floating-point arithmetic

Authors: Chuan Guo, Awni Hannun, Brian Knott, Laurens van der Maaten, Mark Tygert, Ruiyu Zhu

Abstract: Secure multiparty computations enable the distribution of so-called shares of sensitive data to multiple parties such that the multiple parties can effectively process the data while being unable to glean much information about the data (at least not without collusion among all parties to put back together all the shares). Thus, the parties may conspire to send all their processed results to a trusted third party (perhaps the data provider) at the conclusion of the computations, with only the trusted third party being able to view the final results. Secure multiparty computations for privacy-preserving machine-learning turn out to be possible using solely standard floating-point arithmetic, at least with a carefully controlled leakage of information less than the loss of accuracy due to roundoff, all backed by rigorous mathematical proofs of worst-case bounds on information loss and numerical stability in finite-precision arithmetic. Numerical examples illustrate the high performance attained on commodity off-the-shelf hardware for generalized linear models, including ordinary linear least-squares regression, binary and multinomial logistic regression, probit regression, and Poisson regression.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 31 pages, 13 figures, 6 tables

Journal ref: Information and Inference: a Journal of the IMA, iaaa038: 1-33, 2021

arXiv:1902.00608 [pdf, other]

Methods of interpreting error estimates for grayscale image reconstructions

Authors: Aaron Defazio, Mark Tygert

Abstract: One representation of possible errors in a grayscale image reconstruction is as another grayscale image estimating potentially worrisome differences between the reconstruction and the actual "ground-truth" reality. Visualizations and summary statistics can aid in the interpretation of such a representation of error estimates. Visualizations include suitable colorizations of the reconstruction, as well as the obvious "correction" of the reconstruction by subtracting off the error estimates. The canonical summary statistic would be the root-mean-square of the error estimates. Numerical examples involving cranial magnetic-resonance imaging clarify the relative merits of the various methods in the context of compressed sensing. Unfortunately, the colorizations appear likely to be too distracting for actual clinical practice, and the root-mean-square gets swamped by background noise in the error estimates. Fortunately, straightforward displays of the error estimates and of the "corrected" reconstruction are illuminating, and the root-mean-square improves greatly after mild blurring of the error estimates; the blurring is barely perceptible to the human eye yet smooths away background noise that would otherwise overwhelm the root-mean-square.

Submitted 1 February, 2019; originally announced February 2019.

Comments: 23 pages, 16 figures, 3 tables

arXiv:1811.08026 [pdf, other]

doi 10.2140/camcos.2020.15.1

Simulating single-coil MRI from the responses of multiple coils

Authors: Mark Tygert, Jure Zbontar

Abstract: We convert the information-rich measurements of parallel and phased-array MRI into noisier data that a corresponding single-coil scanner could have taken. Specifically, we replace the responses from multiple receivers with a linear combination that emulates the response from only a single, aggregate receiver, replete with the low signal-to-noise ratio and phase problems of any single one of the original receivers (combining several receivers is necessary, however, since the original receivers usually have limited spatial sensitivity). This enables experimentation in the simpler context of a single-coil scanner prior to development of algorithms for the full complexity of multiple receiver coils.

Submitted 27 May, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: 14 pages, 17 figures

Journal ref: Commun. Appl. Math. Comput. Sci. 15 (2020) 1-13

arXiv:1809.06959 [pdf, other]

Compressed sensing with a jackknife and a bootstrap

Authors: Mark Tygert, Rachel Ward, Jure Zbontar

Abstract: Compressed sensing proposes to reconstruct more degrees of freedom in a signal than the number of values actually measured. Compressed sensing therefore risks introducing errors -- inserting spurious artifacts or masking the abnormalities that medical imaging seeks to discover. The present case study of estimating errors using the standard statistical tools of a jackknife and a bootstrap yields error "bars" in the form of full images that are remarkably representative of the actual errors (at least when evaluated and validated on data sets for which the ground truth and hence the actual error is available). These images show the structure of possible errors -- without recourse to measuring the entire ground truth directly -- and build confidence in regions of the images where the estimated errors are small.

Submitted 18 September, 2018; originally announced September 2018.

Comments: 67 pages, 83 figures: the images in the appendix are low-quality; high-quality images are available at http://tygert.com/comps.pdf

Journal ref: Journal of Data Science, Statistics, and Visualisation, 2 (4): 1-29, 2022

arXiv:1710.04238 [pdf, other]

Regression-aware decompositions

Authors: Mark Tygert

Abstract: Linear least-squares regression with a "design" matrix A approximates a given matrix B via minimization of the spectral- or Frobenius-norm discrepancy ||AX-B|| over every conformingly sized matrix X. Another popular approximation is low-rank approximation via principal component analysis (PCA) -- which is essentially singular value decomposition (SVD) -- or interpolative decomposition (ID). Classically, PCA/SVD and ID operate solely with the matrix B being approximated, not supervised by any auxiliary matrix A. However, linear least-squares regression models can inform the ID, yielding regression-aware ID. As a bonus, this provides an interpretation as regression-aware PCA for a kind of canonical correlation analysis between A and B. The regression-aware decompositions effectively enable supervision to inform classical dimensionality reduction, which classically has been totally unsupervised. The regression-aware decompositions reveal the structure inherent in B that is relevant to regression against A.

Submitted 12 February, 2018; v1 submitted 11 October, 2017; originally announced October 2017.

Comments: 19 pages, 9 figures, 2 tables

Journal ref: Linear Algebra and Its Applications, 565 (6): 208-224, 2019

arXiv:1709.01062 [pdf, ps, other]

A hierarchical loss and its problems when classifying non-hierarchically

Authors: Cinna Wu, Mark Tygert, Yann LeCun

Abstract: Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier's classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier.

Submitted 9 December, 2019; v1 submitted 1 September, 2017; originally announced September 2017.

Comments: 19 pages, 4 figures, 7 tables

Journal ref: PLOS ONE, 14 (12): 1-17, 2019

arXiv:1612.08709 [pdf, other]

Randomized algorithms for distributed computation of principal component analysis and singular value decomposition

Authors: Huamin Li, Yuval Kluger, Mark Tygert

Abstract: Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.

Submitted 1 January, 2018; v1 submitted 27 December, 2016; originally announced December 2016.

Comments: 21 pages, 29 tables, 1 figure, 8 algorithms in pseudocode

Journal ref: Advances in Computational Mathematics, 44 (5): 1651-1672, 2018

arXiv:1603.01765 [pdf, ps, other]

Accurate principal component analysis via a few iterations of alternating least squares

Authors: Arthur Szlam, Andrew Tulloch, Mark Tygert

Abstract: A few iterations of alternating least squares with a random starting point provably suffice to produce nearly optimal spectral- and Frobenius-norm accuracies of low-rank approximations to a matrix; iterating to convergence is unnecessary. Thus, software implementing alternating least squares can be retrofitted via appropriate setting of parameters to calculate nearly optimally accurate low-rank approximations highly efficiently, with no need for convergence.

Submitted 5 March, 2016; originally announced March 2016.

Comments: 9 pages, 3 tables

Journal ref: SIAM Journal on Matrix Analysis and Applications, 38 (2): 425-433, 2017
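
The claim in the abstract above can be illustrated with a minimal sketch of alternating least squares for low-rank approximation. This is a generic textbook formulation, not the paper's own code; the function name `als_low_rank`, the choice of 4 sweeps, and the synthetic test matrix are assumptions made only for illustration.

```python
import numpy as np

def als_low_rank(a, rank, iterations=4, seed=0):
    """Approximate a by x @ y (x is m x rank, y is rank x n) using a few ALS sweeps."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    y = rng.standard_normal((rank, n))  # random starting point
    for _ in range(iterations):
        # Hold y fixed and solve min_x ||x y - a||_F as a least-squares problem.
        x = np.linalg.lstsq(y.T, a.T, rcond=None)[0].T
        # Hold x fixed and solve min_y ||x y - a||_F.
        y = np.linalg.lstsq(x, a, rcond=None)[0]
    return x, y

# Synthetic 60 x 40 matrix whose singular values are 1, 1/2, 1/4, ...
rng = np.random.default_rng(1)
u, _ = np.linalg.qr(rng.standard_normal((60, 60)))
v, _ = np.linalg.qr(rng.standard_normal((40, 40)))
s = 2.0 ** -np.arange(40)
a = (u[:, :40] * s) @ v.T

x, y = als_low_rank(a, rank=5)
err = np.linalg.norm(a - x @ y, 2)  # spectral-norm error of the ALS result
best = s[5]                         # best possible rank-5 error (Eckart-Young)
print(err / best)
```

The printed ratio compares the error after a handful of sweeps with the optimal rank-5 error; no convergence test is used anywhere in the loop.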

arXiv:1602.02823 [pdf, other]

Poor starting points in machine learning

Authors: Mark Tygert

Abstract: Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.

Submitted 8 February, 2016; originally announced February 2016.

Comments: 11 pages, 3 figures, 1 table; this initial version is literally identical to that circulated among a restricted audience over a month ago

arXiv:1506.08230 [pdf, other]

Convolutional networks and learning invariant to homogeneous multiplicative scalings

Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. "Scale-invariant" means that multiplying the input values by any nonzero scalar leaves the output unchanged.

Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

Comments: 12 pages, 6 figures, 4 tables

Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

arXiv:1503.03438 [pdf, ps, other]

A mathematical motivation for complex-valued convolutional networks

Authors: Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as "data-driven multiscale windowed power spectra," "data-driven multiscale windowed absolute spectra," "data-driven multiwavelet absolute values," or (in their most general configuration) "data-driven nonlinear multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max. pooling, etc., do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.

Submitted 12 December, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

Comments: 11 pages, 3 figures; this is the retitled version submitted to the journal, "Neural Computation"

Journal ref: Neural Computation, 28 (5): 815-825, May 2016

arXiv:1412.3510 [pdf, other]

An implementation of a randomized algorithm for principal component analysis

Authors: Arthur Szlam, Yuval Kluger, Mark Tygert

Abstract: Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks' MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical techniques (such as Lanczos iterations) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease-of-use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms, and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces).

Submitted 10 December, 2014; originally announced December 2014.

Comments: 13 pages, 4 figures

Journal ref: ACM TOMS, 43(3): 28:1-28:14, 2016

arXiv:1306.0959 [pdf, ps, other]

Testing goodness-of-fit for logistic regression

Authors: Mark Tygert, Rachel Ward

Abstract: Explicitly accounting for all applicable independent variables, even when the model being tested does not, is critical in testing goodness-of-fit for logistic regression. This can increase statistical power by orders of magnitude.

Submitted 20 June, 2013; v1 submitted 4 June, 2013; originally announced June 2013.

Comments: 13 pages, 4 tables

arXiv:1301.1208 [pdf, ps, other]

Significance testing without truth

Authors: William Perkins, Mark Tygert, Rachel Ward

Abstract: A popular approach to significance testing proposes to decide whether the given hypothesized statistical model is likely to be true (or false). Statistical decision theory provides a basis for this approach by requiring every significance test to make a decision about the truth of the hypothesis/model under consideration. Unfortunately, many interesting and useful models are obviously false (that is, not exactly true) even before considering any data. Fortunately, in practice a significance test need only gauge the consistency (or inconsistency) of the observed data with the assumed hypothesis/model -- without enquiring as to whether the assumption is likely to be true (or false), or whether some alternative is likely to be true (or false). In this practical formulation, a significance test rejects a hypothesis/model only if the observed data is highly improbable when calculating the probability while assuming the hypothesis being tested; the significance test only gauges whether the observed data likely invalidates the assumed hypothesis, and cannot decide that the assumption -- however unmistakably false -- is likely to be false a priori, without any data.

Submitted 7 January, 2013; originally announced January 2013.

Comments: 9 pages

arXiv:1206.6378 [pdf, ps, other]

Computing the asymptotic power of a Euclidean-distance test for goodness-of-fit

Authors: William Perkins, Gary Simon, Mark Tygert

Abstract: A natural (yet unconventional) test for goodness-of-fit measures the discrepancy between the model and empirical distributions via their Euclidean distance (or, equivalently, via its square). The present paper characterizes the statistical power of such a test against a family of alternative distributions, in the limit that the number of observations is large, with every alternative departing from the model in the same direction. Specifically, the paper provides an efficient numerical method for evaluating the cumulative distribution function (cdf) of the square of the Euclidean distance between the model and empirical distributions under the alternatives, in the limit that the number of observations is large. The paper illustrates the scheme by plotting the asymptotic power (as a function of the significance level) for several examples.

Submitted 27 June, 2012; originally announced June 2012.

Comments: 14 pages, 1 figure, 1 table

arXiv:1206.6367 [pdf, ps, other]

A comparison of the discrete Kolmogorov-Smirnov statistic and the Euclidean distance

Authors: Jacob Carruth, Mark Tygert, Rachel Ward

Abstract: Goodness-of-fit tests gauge whether a given set of observations is consistent (up to expected random fluctuations) with arising as independent and identically distributed (i.i.d.) draws from a user-specified probability distribution known as the "model." The standard gauges involve the discrepancy between the model and the empirical distribution of the observed draws. Some measures of discrepancy are cumulative; others are not. The most popular cumulative measure is the Kolmogorov-Smirnov statistic; when all probability distributions under consideration are discrete, a natural noncumulative measure is the Euclidean distance between the model and the empirical distributions. In the present paper, both mathematical analysis and its illustration via various data sets indicate that the Kolmogorov-Smirnov statistic tends to be more powerful than the Euclidean distance when there is a natural ordering for the values that the draws can take -- that is, when the data is ordinal -- whereas the Euclidean distance is more reliable and more easily understood than the Kolmogorov-Smirnov statistic when there is no natural ordering (or partial order) -- that is, when the data is nominal.

Submitted 27 June, 2012; originally announced June 2012.

Comments: 15 pages, 6 figures, 3 tables

arXiv:1201.1431 [pdf, ps, other]

An introduction to how chi-square and classical exact tests often wildly misreport significance and how the remedy lies in computers

Authors: William Perkins, Mark Tygert, Rachel Ward

Abstract: Goodness-of-fit tests based on the Euclidean distance often outperform chi-square and other classical tests (including the standard exact tests) by at least an order of magnitude when the model being tested for goodness-of-fit is a discrete probability distribution that is not close to uniform. The present article discusses numerous examples of this. Goodness-of-fit tests based on the Euclidean metric are now practical and convenient: although the actual values taken by the Euclidean distance and similar goodness-of-fit statistics are seldom humanly interpretable, black-box computer programs can rapidly calculate their precise significance.

Submitted 25 January, 2012; v1 submitted 6 January, 2012; originally announced January 2012.

Comments: 41 pages, 25 figures, 7 tables. arXiv admin note: near complete text overlap with arXiv:1108.4126

Journal ref: Applied and Computational Harmonic Analysis, 36 (3): 361-386, 2014

arXiv:1201.1421 [pdf, ps, other]

Testing the significance of assuming homogeneity in contingency-tables/cross-tabulations

Authors: Mark Tygert

Abstract: The model for homogeneity of proportions in a two-way contingency-table/cross-tabulation is the same as the model of independence, except that the probabilistic process generating the data is viewed as fixing the column totals (but not the row totals). When gauging the consistency of observed data with the assumption of independence, recent work has illustrated that the Euclidean/Frobenius/Hilbert-Schmidt distance is often far more statistically powerful than the classical statistics such as chi-square, the log-likelihood-ratio (G), the Freeman-Tukey/Hellinger distance, and other members of the Cressie-Read power-divergence family. The present paper indicates that the Euclidean/Frobenius/Hilbert-Schmidt distance can be more powerful for gauging the consistency of observed data with the assumption of homogeneity, too.

Submitted 6 January, 2012; originally announced January 2012.

Comments: 14 pages, 18 tables

arXiv:1108.4126 [pdf, ps, other]

Chi-square and classical exact tests often wildly misreport significance; the remedy lies in computers

Authors: William Perkins, Mark Tygert, Rachel Ward

Abstract: If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson chi-square statistic can involve division by nearly zero. This often leads to serious trouble in practice -- even in the absence of round-off errors -- as the present article illustrates via numerous examples. Fortunately, with the now widespread availability of computers, avoiding all the trouble is simple and easy: without the problematic division by nearly zero, the actual values taken by goodness-of-fit statistics are not humanly interpretable, but black-box computer programs can rapidly calculate their precise significance.

Submitted 15 September, 2011; v1 submitted 20 August, 2011; originally announced August 2011.

Comments: 63 pages, 51 figures, 7 tables

arXiv:1009.2260 [pdf, ps, other]

Computing the confidence levels for a root-mean-square test of goodness-of-fit, II

Authors: William Perkins, Mark Tygert, Rachel Ward

Abstract: This paper extends our earlier article, "Computing the confidence levels for a root-mean-square test of goodness-of-fit;" unlike in the earlier article, the models in the present paper involve parameter estimation -- both the null and alternative hypotheses in the associated tests are composite. We provide efficient black-box algorithms for calculating the asymptotic confidence levels of a variant on the classic chi-squared test. In some circumstances, it is also feasible to compute the exact confidence levels via Monte Carlo simulation.

Submitted 22 December, 2011; v1 submitted 12 September, 2010; originally announced September 2010.

Comments: 14 pages, 3 figures (each with two parts), 4 tables

arXiv:1007.5510 [pdf, ps, other]

An algorithm for the principal component analysis of large data sets

Authors: Nathan Halko, Per-Gunnar Martinsson, Yoel Shkolnisky, Mark Tygert

Abstract: Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy --- even on parallel processors --- unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently "out-of-core.") We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.

Submitted 19 March, 2011; v1 submitted 30 July, 2010; originally announced July 2010.

Comments: 17 pages, 3 figures (each with 2 or 3 subfigures), 2 tables (each with 2 subtables)

Journal ref: SIAM Journal on Scientific Computing, 33 (5): 2580-2594, 2011

arXiv:1006.0042 [pdf, ps, other]

Computing the confidence levels for a root-mean-square test of goodness-of-fit

Authors: William Perkins, Mark Tygert, Rachel Ward

Abstract: The classic chi-squared statistic for testing goodness-of-fit has long been a cornerstone of modern statistical practice. The statistic consists of a sum in which each summand involves division by the probability associated with the corresponding bin in the distribution being tested for goodness-of-fit. Typically this division should precipitate rebinning to uniformize the probabilities associated with the bins, in order to make the test reasonably powerful. With the now widespread availability of computers, there is no longer any need for this. The present paper provides efficient black-box algorithms for calculating the asymptotic confidence levels of a variant on the classic chi-squared test which omits the problematic division. In many circumstances, it is also feasible to compute the exact confidence levels via Monte Carlo simulation.

Submitted 7 March, 2011; v1 submitted 31 May, 2010; originally announced June 2010.

Comments: 19 pages, 8 figures, 3 tables

Journal ref: Applied Mathematics and Computation, 217 (22): 9072-9084, 2011
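
The Monte Carlo route mentioned at the end of the abstract above can be sketched roughly as follows, for a fully specified discrete model. The statistic drops the division by the bin probabilities; its exact normalization here, the function names, the trial count, and the example counts are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def rms_statistic(counts, p):
    """Root-mean-square gap between empirical and model bin probabilities (no division by p)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=-1, keepdims=True)
    return np.sqrt(np.mean((counts / n - p) ** 2, axis=-1))

def monte_carlo_p_value(counts, p, trials=10000, seed=0):
    """Fraction of data sets simulated from the model whose statistic is at least the observed one."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    p = np.asarray(p, dtype=float)
    observed = rms_statistic(counts, p)
    # Each row is one simulated set of bin counts with the same total number of draws.
    simulated = rng.multinomial(int(counts.sum()), p, size=trials)
    return float(np.mean(rms_statistic(simulated, p) >= observed))

# Hypothetical model that is far from uniform, with two hypothetical data sets.
model = [0.7, 0.1, 0.1, 0.1]
p_consistent = monte_carlo_p_value([70, 10, 10, 10], model)
p_discrepant = monte_carlo_p_value([40, 20, 20, 20], model)
print(p_consistent, p_discrepant)
```

Because the significance comes from simulation rather than from the chi-squared distribution, small bin probabilities require no rebinning in this sketch.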

arXiv:1001.2286 [pdf, ps, other]

doi 10.1073/pnas.1008446107

Statistical tests for whether a given set of independent, identically distributed draws does not come from a specified probability density

Authors: Mark Tygert

Abstract: We discuss several tests for whether a given set of independent and identically distributed (i.i.d.) draws does not come from a specified probability density function. The most commonly used are Kolmogorov-Smirnov tests, particularly Kuiper's variant, which focus on discrepancies between the cumulative distribution function for the specified probability density and the empirical cumulative distribution function for the given set of i.i.d. draws. Unfortunately, variations in the probability density function often get smoothed over in the cumulative distribution function, making it difficult to detect discrepancies in regions where the probability density is small in comparison with its values in surrounding regions. We discuss tests without this deficiency, complementing the classical methods. The tests of the present paper are based on the plain fact that it is unlikely to draw a random number whose probability is small, provided that the draw is taken from the same distribution used in calculating the probability (thus, if we draw a random number whose probability is small, then we can be confident that we did not draw the number from the same distribution used in calculating the probability).

Submitted 3 June, 2010; v1 submitted 13 January, 2010; originally announced January 2010.

Comments: 18 pages, 5 figures, 6 tables

Journal ref: Proceedings of the National Academy of Sciences (USA), 107 (38): 16471-16476, 2010

arXiv:0912.1135 [pdf, ps, other]

A fast randomized algorithm for orthogonal projection

Authors: Vladimir Rokhlin, Mark Tygert

Abstract: We describe an algorithm that, given any full-rank matrix A having fewer rows than columns, can rapidly compute the orthogonal projection of any vector onto the null space of A, as well as the orthogonal projection onto the row space of A, provided that both A and its adjoint can be applied rapidly to arbitrary vectors. As an intermediate step, the algorithm solves the overdetermined linear least-squares regression involving the adjoint of A (and so can be used for this, too). The basis of the algorithm is an obvious but numerically unstable scheme; suitable use of a preconditioner yields numerical stability. We generate the preconditioner rapidly via a randomized procedure that succeeds with extremely high probability. In many circumstances, the method can accelerate interior-point methods for convex optimization, such as linear programming (Ming Gu, personal communication).

Submitted 10 December, 2009; v1 submitted 6 December, 2009; originally announced December 2009.

Comments: 13 pages, 6 tables

Journal ref: SIAM Journal on Scientific Computing, 33 (2): 849-868, 2011

arXiv:0910.5435 [pdf, ps, other]

doi 10.1016/j.jcp.2010.05.004

Fast algorithms for spherical harmonic expansions, III

Authors: Mark Tygert

Abstract: We accelerate the computation of spherical harmonic transforms, using what is known as the butterfly scheme. This provides a convenient alternative to the approach taken in the second paper from this series on "Fast algorithms for spherical harmonic expansions." The requisite precomputations become manageable when organized as a "depth-first traversal" of the program's control-flow graph, rather than as the perhaps more natural "breadth-first traversal" that processes one-by-one each level of the multilevel procedure. We illustrate the results via several numerical examples.

Submitted 5 April, 2010; v1 submitted 28 October, 2009; originally announced October 2009.

Comments: 14 pages, 1 figure, 6 tables

Journal ref: Fast algorithms for spherical harmonic expansions, III, Journal of Computational Physics, 229 (18): 6181-6192, 2010

arXiv:0905.4745 [pdf, ps, other]

A fast algorithm for computing minimal-norm solutions to underdetermined systems of linear equations

Authors: Mark Tygert

Abstract: We introduce a randomized algorithm for computing the minimal-norm solution to an underdetermined system of linear equations. Given an arbitrary full-rank m x n matrix A with m<n, any m x 1 vector b, and any positive real number epsilon less than 1, the procedure computes an n x 1 vector x approximating to relative precision epsilon or better the n x 1 vector p of minimal Euclidean norm satisfying Ap=b. The algorithm typically requires O(mn log(sqrt(n)/epsilon) + m**3) floating-point operations, generally less than the O(m**2 n) required by the classical schemes based on QR-decompositions or bidiagonalization. We present several numerical examples illustrating the performance of the algorithm.

Submitted 8 September, 2009; v1 submitted 28 May, 2009; originally announced May 2009.

Comments: 13 pages, 4 tables

Report number: UCLA Computational and Applied Math. Technical Report 09-48
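
For orientation, the following is a minimal sketch of the classical QR-based baseline that the abstract compares against, not the randomized algorithm of the paper. The function name `minimal_norm_solution`, the matrix sizes, and the random test data are assumptions chosen for illustration. It relies on the fact that the minimal-norm solution of Ap=b lies in the row space of A.

```python
import numpy as np

def minimal_norm_solution(A, b):
    """Classical O(m**2 n) baseline: minimal-norm solution of A p = b
    for a full-rank m x n matrix A with m < n (hypothetical helper)."""
    # Reduced QR factorization of the adjoint: A^T = Q R,
    # where Q is n x m with orthonormal columns and R is m x m.
    Q, R = np.linalg.qr(A.T)
    # Writing p = Q y gives A p = R^T Q^T Q y = R^T y, so solve R^T y = b.
    y = np.linalg.solve(R.T, b)
    # p = Q y lies in the row space of A, hence has minimal Euclidean norm.
    return Q @ y

# Illustrative data (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = minimal_norm_solution(A, b)
residual = np.linalg.norm(A @ x - b)   # should be near machine precision
x_pinv = np.linalg.pinv(A) @ b         # reference minimal-norm solution
```

The pseudoinverse is used here only as an independent reference for checking the result on a small example.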

arXiv:0809.2274 [pdf, ps, other]

A randomized algorithm for principal component analysis

Authors: Vladimir Rokhlin, Arthur Szlam, Mark Tygert

Abstract: Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.

Submitted 5 July, 2009; v1 submitted 12 September, 2008; originally announced September 2008.

Comments: 26 pages, 6 tables, 1 figure; to appear in the SIAM Journal on Matrix Analysis and Applications

Report number: UCLA Computational and Applied Math Technical Report 08-60

Journal ref: A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications, 31 (3): 1100-1124, 2009
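
As a rough illustration of randomized low-rank approximation, here is a generic sketch based on a random Gaussian test matrix and a few power iterations. It is not necessarily the exact procedure of the paper; the function name `randomized_low_rank`, the parameters `oversample` and `power_iters`, and the synthetic test matrix are all assumptions made for this example.

```python
import numpy as np

def randomized_low_rank(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate rank-k SVD of A via random projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + oversample, m, n)        # width of the random sketch
    G = rng.standard_normal((n, ell))      # Gaussian test matrix
    Y = A @ G                              # sample the range of A
    for _ in range(power_iters):
        # Re-orthonormalize between applications of A A^T for stability.
        Y, _r = np.linalg.qr(Y)
        Y = A @ (A.T @ Y)
    Q, _r = np.linalg.qr(Y)                # orthonormal basis for the sketch
    B = Q.T @ A                            # small ell x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Synthetic matrix with known singular values 1, 1/2, 1/4, ... (assumed data).
rng = np.random.default_rng(1)
m, n, k = 60, 40, 5
U0, _r0 = np.linalg.qr(rng.standard_normal((m, n)))
V0, _r1 = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(n)
A = (U0 * sigma) @ V0.T

Uk, sk, Vtk = randomized_low_rank(A, k)
err = np.linalg.norm(A - (Uk * sk) @ Vtk, 2)   # spectral-norm error
best = sigma[k]                                # best possible rank-k error
```

By the Eckart-Young theorem, no rank-k approximation can have spectral-norm error below `best`, so comparing `err` with `best` shows how close the sketch comes to optimal on this example.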

arXiv:cs/0609081 [pdf, ps, other]

Recurrence relations and fast algorithms

Authors: Mark Tygert

Abstract: We construct fast algorithms for evaluating transforms associated with families of functions which satisfy recurrence relations. These include algorithms both for computing the coefficients in linear combinations of the functions, given the values of these linear combinations at certain points, and, vice versa, for evaluating such linear combinations at those points, given the coefficients in the linear combinations; such procedures are also known as analysis and synthesis of series of certain special functions. The algorithms of the present paper are efficient in the sense that their computational costs are proportional to n (ln n) (ln(1/epsilon))^3, where n is the amount of input and output data, and epsilon is the precision of computations. Stated somewhat more precisely, we find a positive real number C such that, for any positive integer n > 10, the algorithms require at most C n (ln n) (ln(1/epsilon))^3 floating-point operations and words of memory to evaluate at n appropriately chosen points any linear combination of n special functions, given the coefficients in the linear combination, where epsilon is the precision of computations.

Submitted 14 September, 2006; originally announced September 2006.

Comments: 24 pages

ACM Class: F.2.1; G.1.2

Journal ref: Recurrence relations and fast algorithms, Applied and Computational Harmonic Analysis, 28 (1): 121-128, 2010

Showing 1–39 of 39 results for author: Tygert, M