Showing 1–19 of 19 results for author: Balsubramani, A

  1. arXiv:2012.07421

    cs.LG

    WILDS: A Benchmark of in-the-Wild Distribution Shifts

    Authors: Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchma…

    Submitted 16 July, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  2. arXiv:2011.01343

    math.ST stat.ME stat.ML

    p-value peeking and estimating extrema

    Authors: Akshay Balsubramani

    Abstract: A pervasive issue in statistical hypothesis testing is that the reported $p$-values are biased downward by data "peeking" -- the practice of reporting only progressively extreme values of the test statistic as more data samples are collected. We develop principled mechanisms to estimate such running extrema of test statistics, which directly address the effect of peeking in some general scenarios.

    Submitted 2 November, 2020; originally announced November 2020.
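
    The downward bias from peeking is easy to reproduce numerically. The following Python sketch illustrates the problem only, not the estimation mechanism developed in the paper: it simulates a one-sample z-test under a true null hypothesis and compares testing once at a fixed sample size with checking the statistic after every new sample. The function name, sample sizes, and the 1.96 threshold are illustrative assumptions.

```python
import math
import random


def peeking_simulation(trials=1000, n_max=200, z_crit=1.96, seed=0):
    """Estimate false-positive rates of a one-sample z-test under H0.

    Data are i.i.d. N(0, 1), so H0 (mean 0) is true and every rejection
    is a false positive. Returns (fixed_rate, peeking_rate):
      fixed_rate   - test once, after all n_max samples
      peeking_rate - test after every sample, reject if |z| ever exceeds z_crit
    """
    rng = random.Random(seed)
    fixed_rejections = 0
    peeking_rejections = 0
    for _ in range(trials):
        running_sum = 0.0
        ever_crossed = False
        z = 0.0
        for n in range(1, n_max + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(n)  # z statistic after n samples
            if abs(z) > z_crit:
                ever_crossed = True
        fixed_rejections += abs(z) > z_crit  # only the final statistic counts
        peeking_rejections += ever_crossed   # the running extremum counts
    return fixed_rejections / trials, peeking_rejections / trials


fixed_rate, peeking_rate = peeking_simulation()
print(f"fixed-n false-positive rate:     {fixed_rate:.3f}")
print(f"peeking after every sample rate: {peeking_rate:.3f}")
```

    The fixed-sample rate stays near the nominal 5% level, while the peeking rate comes out far above it, which is the effect that estimating running extrema is meant to address.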

  3. arXiv:2008.13293

    cs.LG cs.IT math.PR stat.ML

    Sharp finite-sample concentration of independent variables

    Authors: Akshay Balsubramani

    Abstract: We show an extension of Sanov's theorem on large deviations, controlling the tail probabilities of i.i.d. random variables with matching concentration and anti-concentration bounds. This result has a general scope, applies to samples of any size, and has a short information-theoretic proof using elementary techniques.

    Submitted 8 October, 2021; v1 submitted 30 August, 2020; originally announced August 2020.
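
    For orientation, the classical concentration side of this picture for Bernoulli samples is the bound P(S_n >= k) <= exp(-n D(k/n || p)) for k/n >= p, where D is the binary KL divergence. The sketch below checks it against the exact binomial tail; it does not reproduce the paper's sharper two-sided result, and the function names are illustrative.

```python
import math


def kl_bernoulli(q, p):
    """KL divergence D(q || p) between Bernoulli(q) and Bernoulli(p), 0 < p < 1."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1 - q, 1 - p)


def exact_tail(n, p, k):
    """P(S_n >= k) for S_n ~ Binomial(n, p), summed directly from the pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def kl_tail_bound(n, p, k):
    """Classical bound P(S_n >= k) <= exp(-n D(k/n || p)), valid when k/n >= p."""
    return math.exp(-n * kl_bernoulli(k / n, p))


for n in (10, 100, 1000):
    k = 3 * n // 5  # threshold at a sample mean of 0.6
    exact, bound = exact_tail(n, 0.5, k), kl_tail_bound(n, 0.5, k)
    # As n grows, -log(exact) / n approaches the exponent D(0.6 || 0.5).
    print(n, exact, bound, -math.log(exact) / n, kl_bernoulli(0.6, 0.5))
```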

  4. arXiv:1909.13203

    cs.LG stat.ML

    Learning transport cost from subset correspondence

    Authors: Ruishan Liu, Akshay Balsubramani, James Zou

    Abstract: Learning to align multiple datasets is an important problem with many applications, and it is especially useful when we need to integrate multiple experiments or correct for confounding. Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related.…

    Submitted 30 July, 2021; v1 submitted 29 September, 2019; originally announced September 2019.
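
    To make the role of the cost function concrete, here is a minimal entropy-regularized OT solver (Sinkhorn iterations, a standard technique swapped in for illustration) with a fixed, hand-chosen squared-distance cost. This hard-coded cost is exactly the kind of ingredient the paper proposes to learn from subset correspondence; the solver, data, and parameter values are assumptions, not the paper's method.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, iters=2000):
    """Entropy-regularized optimal transport between histograms a and b.

    cost[i][j] is the hand-specified cost of moving a unit of mass from
    source point i to target point j. Returns a plan P whose row sums match
    a and column sums match b (up to convergence error), minimizing the
    transport cost plus an entropy penalty weighted by eps.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: cheap moves get large entries, expensive moves small ones.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Two small 1-D point clouds with uniform weights and squared-distance cost.
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
a, b = [1 / 3] * 3, [1 / 3] * 3
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn(a, b, cost)
for row in plan:
    print([round(p, 3) for p in row])
```

    Changing the cost (for example, to absolute distance) changes the plan, which is why a cost that reflects how the datasets are actually related matters.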

  5. arXiv:1905.12717

    cs.LG cs.AI stat.ML

    An adaptive nearest neighbor rule for classification

    Authors: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund, Shay Moran

    Abstract: We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and…

    Submitted 29 May, 2019; originally announced May 2019.
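
    The idea of a query-dependent k can be sketched as follows. This is a simplified illustration, not the rule from the paper: it grows k until the average neighbor label clears a hypothetical confidence margin c / sqrt(k), and abstains if no k qualifies. The function name, the margin, and the 1-D data are assumptions.

```python
import math


def adaptive_knn_predict(train, query, c=1.0):
    """Predict a +1/-1 label for `query`, choosing k per query.

    train: list of (x, y) pairs with scalar x and label y in {-1, +1}.
    k grows until the mean neighbor label ("bias") exceeds the hypothetical
    margin c / sqrt(k). Returns (label, k); label 0 means no k was confident.
    """
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - query))
    label_sum = 0
    for k, (_, y) in enumerate(neighbors, start=1):
        label_sum += y
        bias = label_sum / k
        if abs(bias) > c / math.sqrt(k):
            return (1 if bias > 0 else -1), k
    return 0, len(neighbors)


clean = [(x, -1) for x in range(5)]  # x = 0..4, all labeled -1
noisy = list(zip(range(10, 18), [1, -1, 1, 1, -1, 1, 1, 1]))  # mixed labels
train = clean + noisy
print(adaptive_knn_predict(train, 1.5))   # (-1, 2)
print(adaptive_knn_predict(train, 13.2))  # (1, 7)
```

    On this toy data the query in the clean region is decided with k = 2, while the query in the noisy region needs k = 7, mirroring the abstract's example of larger k in noisy regions.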

  6. arXiv:1709.01509

    cs.LG cs.AI stat.ML

    Linking Generative Adversarial Learning and Binary Classification

    Authors: Akshay Balsubramani

    Abstract: In this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator lo…

    Submitted 5 September, 2017; originally announced September 2017.
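
    The best-known instance of this link is the original GAN objective: for fixed real and generated distributions p and q, the optimal discriminator is D*(x) = p(x) / (p(x) + q(x)), and the resulting value equals 2 JS(p || q) - log 4, where JS is the Jensen-Shannon divergence. The sketch below checks this numerically on a small discrete space; the distributions are illustrative, and the general f-divergence case discussed in the note is not shown.

```python
import math


def gan_value(p, q, d):
    """V(D) = E_p[log D] + E_q[log(1 - D)] on a finite sample space."""
    return sum(pi * math.log(di) + qi * math.log(1 - di)
               for pi, qi, di in zip(p, q, d))


def kl(a, b):
    """KL divergence between discrete distributions a and b."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


p = [0.5, 0.3, 0.2]  # "real" distribution (illustrative)
q = [0.2, 0.3, 0.5]  # "generated" distribution (illustrative)
d_star = [pi / (pi + qi) for pi, qi in zip(p, q)]  # optimal discriminator

print(gan_value(p, q, d_star), 2 * js_divergence(p, q) - math.log(4))
print(gan_value(p, q, [0.6, 0.5, 0.4]))  # any other discriminator scores lower
```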

  7. arXiv:1705.07904

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Semantically Decomposing the Latent Spaces of Generative Adversarial Networks

    Authors: Chris Donahue, Zachary C. Lipton, Akshay Balsubramani, Julian McAuley

    Abstract: We propose a new algorithm for training generative adversarial networks that jointly learns latent codes for both identities (e.g. individual humans) and observations (e.g. specific photographs). By fixing the identity portion of the latent codes, we can generate diverse images of the same subject, and by fixing the observation portion, we can traverse the manifold of subjects while maintaining co…

    Submitted 22 February, 2018; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Published as a conference paper at ICLR 2018

  8. arXiv:1611.02268

    cs.LG cs.AI stat.ML

    Optimal Binary Autoencoding with Pairwise Correlations

    Authors: Akshay Balsubramani

    Abstract: We formulate learning of a binary autoencoder as a biconvex optimization problem which learns from the pairwise correlations between encoded and decoded bits. Among all possible algorithms that use this information, ours finds the autoencoder that reconstructs its inputs with worst-case optimal loss. The optimal decoder is a single layer of artificial neurons, emerging entirely from the minimax lo…

    Submitted 7 November, 2016; originally announced November 2016.

  9. arXiv:1605.08833

    cs.LG stat.ML

    Muffled Semi-Supervised Learning

    Authors: Akshay Balsubramani, Yoav Freund

    Abstract: We explore a novel approach to semi-supervised learning. This approach is contrary to the common approach in that the unlabeled examples serve to "muffle," rather than enhance, the guidance provided by the labeled examples. We provide several variants of the basic algorithm and show experimentally that they can achieve significantly higher AUC than boosted trees, random forests and logistic regres…

    Submitted 27 May, 2016; originally announced May 2016.

  10. arXiv:1602.08151

    cs.LG stat.ML

    Learning to Abstain from Binary Prediction

    Authors: Akshay Balsubramani

    Abstract: A binary classifier capable of abstaining from making a label prediction has two goals in tension: minimizing errors, and avoiding abstaining unnecessarily often. In this work, we exactly characterize the best achievable tradeoff between these two goals in a general semi-supervised setting, given an ensemble of predictors of varying competence as well as unlabeled data on which we wish to predict…

    Submitted 29 November, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

  11. arXiv:1512.08133

    cs.LG

    The Utility of Abstaining in Binary Classification

    Authors: Akshay Balsubramani

    Abstract: We explore the problem of binary classification in machine learning, with a twist: the classifier is allowed to abstain on any datum, professing ignorance about the true class label without committing to any prediction. This is directly motivated by applications like medical diagnosis and fraud risk assessment, in which incorrect predictions have potentially calamitous consequences. We focus on a…

    Submitted 26 December, 2015; originally announced December 2015.

    Comments: Short survey
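
    The tension between errors and abstentions can be seen with the simplest reject-option rule, which predicts only when a score is far enough from zero. This is a generic illustration, not a method from this survey; the scores, labels, and threshold values are hypothetical.

```python
def predict_or_abstain(score, threshold):
    """Return +1/-1 if |score| >= threshold, else None (abstain)."""
    if abs(score) < threshold:
        return None
    return 1 if score > 0 else -1


def tradeoff(scores, labels, threshold):
    """Count (mistakes, abstentions) of the thresholded rule on labeled data."""
    mistakes = abstentions = 0
    for s, y in zip(scores, labels):
        pred = predict_or_abstain(s, threshold)
        if pred is None:
            abstentions += 1
        elif pred != y:
            mistakes += 1
    return mistakes, abstentions


scores = [0.9, -0.8, 0.2, -0.1, 0.6, -0.3]  # hypothetical classifier scores
labels = [1, -1, -1, 1, 1, -1]
for t in (0.0, 0.25, 0.7):
    print(t, tradeoff(scores, labels, t))
```

    Raising the threshold from 0 to 0.25 removes both mistakes at the cost of two abstentions; raising it further to 0.7 only adds abstentions, which is the kind of unnecessary abstaining the tradeoff penalizes.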

  12. arXiv:1510.00452

    cs.LG stat.ML

    Optimal Binary Classifier Aggregation for General Losses

    Authors: Akshay Balsubramani, Yoav Freund

    Abstract: We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification erro…

    Submitted 7 November, 2016; v1 submitted 1 October, 2015; originally announced October 2015.

    Comments: NIPS 2016

  13. arXiv:1506.06573

    cs.LG math.PR stat.ML

    PAC-Bayes Iterated Logarithm Bounds for Martingale Mixtures

    Authors: Akshay Balsubramani

    Abstract: We give tight concentration bounds for mixtures of martingales that are simultaneously uniform over (a) mixture distributions, in a PAC-Bayes sense; and (b) all finite times. These bounds are proved in terms of the martingale variance, extending classical Bernstein inequalities, and sharpening and simplifying prior work.

    Submitted 22 June, 2015; originally announced June 2015.

  14. arXiv:1506.05790

    cs.LG

    Scalable Semi-Supervised Aggregation of Classifiers

    Authors: Akshay Balsubramani, Yoav Freund

    Abstract: We present and empirically evaluate an efficient algorithm that learns to aggregate the predictions of an ensemble of binary classifiers. The algorithm uses the structure of the ensemble predictions on unlabeled data to yield significant performance improvements. It does this without making assumptions on the structure or origin of the ensemble, without parameters, and as scalably as linear learni…

    Submitted 10 November, 2015; v1 submitted 18 June, 2015; originally announced June 2015.

  15. arXiv:1506.03486

    stat.ML cs.LG math.ST stat.ME

    Sequential Nonparametric Testing with the Law of the Iterated Logarithm

    Authors: Akshay Balsubramani, Aaditya Ramdas

    Abstract: We propose a new algorithmic framework for sequential hypothesis testing with i.i.d. data, which includes A/B testing, nonparametric two-sample testing, and independence testing as special cases. It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational cons…

    Submitted 1 March, 2016; v1 submitted 10 June, 2015; originally announced June 2015.
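
    The linear-time, constant-space structure mentioned in (a) can be sketched as a streaming monitor that stores only a count and a running sum. This is a hypothetical illustration assuming paired observations whose differences lie in [-1, 1]; the LIL-shaped boundary and its constants (c, alpha) are placeholders chosen for readability, not the paper's calibrated boundary, so the sketch carries no false-positive guarantee.

```python
import math


class SequentialPairedTest:
    """Streaming sequential test of H0: E[x - y] = 0 for paired data.

    State is O(1): the number of pairs n and the running sum of differences.
    Assumes each difference x - y lies in [-1, 1]. The boundary below is an
    LIL-shaped placeholder (constants c and alpha are illustrative), not a
    calibrated boundary, so no false-positive guarantee is claimed.
    """

    def __init__(self, alpha=0.05, c=1.1):
        self.alpha = alpha
        self.c = c
        self.n = 0
        self.total = 0.0
        self.rejected_at = None

    def boundary(self, n):
        # sqrt(c * n * (2 log log n + log(1/alpha))); n is floored at 3 inside
        # the logs so that log log n stays positive for small n.
        loglog = math.log(math.log(max(n, 3)))
        return math.sqrt(self.c * n * (2 * loglog + math.log(1 / self.alpha)))

    def update(self, x, y):
        """Process one pair in O(1) time; returns True once H0 is rejected."""
        if self.rejected_at is None:
            self.n += 1
            self.total += x - y
            if abs(self.total) > self.boundary(self.n):
                self.rejected_at = self.n
        return self.rejected_at is not None


# A strong effect (x always beats y) is flagged after a handful of pairs;
# perfectly balanced differences never cross the boundary.
strong, balanced = SequentialPairedTest(), SequentialPairedTest()
for i in range(1000):
    strong.update(1.0, 0.0)
    if i % 2 == 0:
        balanced.update(1.0, 0.0)
    else:
        balanced.update(0.0, 1.0)
print(strong.rejected_at, balanced.rejected_at)
```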

  16. arXiv:1503.01811

    cs.LG stat.ML

    Optimally Combining Classifiers Using Unlabeled Data

    Authors: Akshay Balsubramani, Yoav Freund

    Abstract: We develop a worst-case analysis of aggregation of classifier ensembles for binary classification. The task of predicting to minimize error is formulated as a game played over a given set of unlabeled data (a transductive setting), where prior label information is encoded as constraints on the game. The minimax solution of this game identifies cases where a weighted combination of the classifiers…

    Submitted 18 June, 2015; v1 submitted 5 March, 2015; originally announced March 2015.
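
    As a rough picture of the output of such an aggregation, the sketch below computes a weighted vote of ensemble predictions clipped to [-1, 1], so that the magnitude acts as a confidence. It assumes this clipped-vote form and uses hypothetical fixed weights; in the paper the weights come from solving the minimax game under the label constraints, which is not reproduced here.

```python
def clipped_vote(weights, predictions):
    """Aggregate one example's ensemble predictions into a value in [-1, 1].

    weights:     hypothetical nonnegative weights, one per classifier
    predictions: each classifier's prediction on the example, in [-1, 1]
    The sign is the predicted label; the magnitude acts as a confidence.
    """
    score = sum(w * h for w, h in zip(weights, predictions))
    return max(-1.0, min(1.0, score))


weights = [0.6, 0.6, 0.3]  # hypothetical; not derived from any data
for preds in ([1, 1, -1], [1, 1, 1], [1, -1, 1]):
    print(preds, clipped_vote(weights, preds))
```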

  17. arXiv:1501.03838

    cs.LG stat.ML

    PAC-Bayes with Minimax for Confidence-Rated Transduction

    Authors: Akshay Balsubramani, Yoav Freund

    Abstract: We consider using an ensemble of binary classifiers for transductive prediction, when unlabeled test data are known in advance. We derive minimax optimal rules for confidence-rated prediction in this setting. By using PAC-Bayes analysis on these rules, we obtain data-dependent performance guarantees without distributional assumptions on the data. Our analysis techniques are readily extended to a s…

    Submitted 15 January, 2015; originally announced January 2015.

  18. arXiv:1501.03796

    cs.LG stat.ML

    The Fast Convergence of Incremental PCA

    Authors: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund

    Abstract: We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion, with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schem…

    Submitted 15 January, 2015; originally announced January 2015.

    Comments: NIPS 2013
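
    For comparison, one well-known incremental scheme of this type is Oja's rule, sketched below as a standard reference point rather than as the paper's analysis. It stores a single d-dimensional vector, matching the O(d) space requirement. The synthetic covariance diag(5, 1, 1), the step size 1/(t + 10), and the function name are illustrative assumptions.

```python
import math
import random


def oja_top_eigenvector(samples, d, t0=10.0):
    """Estimate the top eigenvector of the covariance of zero-mean samples.

    Oja's rule: w <- w + eta_t * x * (x . w), then renormalize. Only the
    d-dimensional vector w is stored, so memory is O(d).
    """
    w = [1.0 / math.sqrt(d)] * d  # arbitrary unit-norm starting point
    for t, x in enumerate(samples, start=1):
        eta = 1.0 / (t + t0)  # decreasing step size (illustrative choice)
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


rng = random.Random(0)
# Synthetic data with covariance diag(5, 1, 1): the top eigenvector is e_1.
samples = ([math.sqrt(5) * rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
           for _ in range(5000))
w = oja_top_eigenvector(samples, d=3)
print([round(wi, 3) for wi in w])
```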

  19. arXiv:1405.2639

    math.PR cs.LG stat.ML

    Sharp Finite-Time Iterated-Logarithm Martingale Concentration

    Authors: Akshay Balsubramani

    Abstract: We give concentration bounds for martingales that are uniform over finite times and extend classical Hoeffding and Bernstein inequalities. We also demonstrate our concentration bounds to be optimal with a matching anti-concentration inequality, proved using the same method. Together these constitute a finite-time version of the law of the iterated logarithm, and shed light on the relationship betw…

    Submitted 1 December, 2015; v1 submitted 12 May, 2014; originally announced May 2014.

    Comments: 25 pages

    MSC Class: 60E15; 60G17 (Primary); 60G40; 60G42; 60G44 (Secondary)
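
    For reference, the classical asymptotic law of the iterated logarithm, which the finite-time bounds above refine, states that for i.i.d. X_1, X_2, ... with mean zero and unit variance, and S_n = X_1 + ... + X_n:

```latex
\limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log \log n}} = 1 \quad \text{almost surely.}
```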