Showing 1–19 of 19 results for author: Cotter, A

  1. arXiv:2107.10960  [pdf, other]

    cs.LG stat.ML

    Implicit Rate-Constrained Optimization of Non-decomposable Objectives

    Authors: Abhishek Kumar, Harikrishna Narasimhan, Andrew Cotter

    Abstract: We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing the false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, optimizing the area unde…

    Submitted 28 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: ICML 2021; Code available at https://github.com/google-research/google-research/tree/master/implicit_constrained_optimization

  2. arXiv:2106.02654  [pdf, other]

    cs.LG cs.AI stat.ML

    Churn Reduction via Distillation

    Authors: Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh

    Abstract: In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is to also maintain a low difference in predictions compared to the base model (i.e. predictive "churn"). If model retraining results in vastly different behavior, then it could cause negative effects in downstream systems, especially if this churn can be avoide…

    Submitted 14 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2022

  3. arXiv:2102.06849  [pdf, other]

    cs.LG cs.AI stat.ML

    Distilling Double Descent

    Authors: Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

    Abstract: Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset. The most common explanations for why distillation "works" are predicated on the assumption that the student is provided with soft labels, e.g., probabilities or confidences, from the teacher model. In this work, we show, that,…

    Submitted 12 February, 2021; originally announced February 2021.

  4. arXiv:2002.09343  [pdf, ps, other]

    cs.LG stat.ML

    Robust Optimization for Fairness with Noisy Protected Groups

    Authors: Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Michael I. Jordan

    Abstract: Many existing fairness criteria for machine learning involve equalizing some metric across protected groups such as race or gender. However, practitioners trying to audit or enforce such group-based criteria can easily face the problem of noisy or biased protected group information. First, we study the consequences of naively relying on noisy protected group labels: we provide an upper bound on th…

    Submitted 10 November, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: To appear at 34th Conference on Neural Information Processing Systems (NeurIPS 2020); first two authors contributed equally to this work

  5. arXiv:1909.02939  [pdf, other]

    cs.LG cs.GT stat.ML

    Optimizing Generalized Rate Metrics through Game Equilibrium

    Authors: Harikrishna Narasimhan, Andrew Cotter, Maya Gupta

    Abstract: We present a general framework for solving a large class of learning problems with non-linear functions of classification rates. This includes problems where one wishes to optimize a non-decomposable performance metric such as the F-measure or G-mean, and constrained training problems where the classifier needs to satisfy non-linear rate constraints such as predictive parity fairness, distribution…

    Submitted 6 September, 2019; originally announced September 2019.

  6. arXiv:1906.05330  [pdf, other]

    cs.LG stat.ML

    Pairwise Fairness for Ranking and Regression

    Authors: Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang

    Abstract: We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using existin…

    Submitted 7 January, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

  7. arXiv:1809.04198  [pdf, other]

    cs.LG cs.AI cs.GT math.OC stat.ML

    Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals

    Authors: Andrew Cotter, Heinrich Jiang, Serena Wang, Taman Narayan, Maya Gupta, Seungil You, Karthik Sridharan

    Abstract: We show that many machine learning goals, such as improved fairness metrics, can be expressed as constraints on the model's predictions, which we call rate constraints. We study the problem of training non-convex models subject to these rate constraints (or any non-convex and non-differentiable constraints). In the non-convex setting, the standard approach of Lagrange multipliers may fail. Further…

    Submitted 11 September, 2018; originally announced September 2018.

  8. arXiv:1807.00028  [pdf, other]

    cs.LG stat.ML

    Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

    Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

    Abstract: Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals. We study the generalization performance for such constrained optimization problems, in terms of how well the constraints are satisfied at evaluation time, given that they are satisfied at training time. To improve generalization performa…

    Submitted 28 September, 2018; v1 submitted 29 June, 2018; originally announced July 2018.

  9. arXiv:1806.11212  [pdf, other]

    cs.LG stat.ML

    Proxy Fairness

    Authors: Maya Gupta, Andrew Cotter, Mahdi Milani Fard, Serena Wang

    Abstract: We consider the problem of improving fairness when one lacks access to a dataset labeled with protected groups, making it difficult to take advantage of strategies that can improve fairness but require protected group labels, either at training or runtime. To address this, we investigate improving fairness metrics for proxy groups, and test whether doing so results in improved fairness for the tru…

    Submitted 28 June, 2018; originally announced June 2018.

  10. arXiv:1806.00050  [pdf, other]

    cs.LG cs.AI stat.ML

    Interpretable Set Functions

    Authors: Andrew Cotter, Maya Gupta, Heinrich Jiang, James Muller, Taman Narayan, Serena Wang, Tao Zhu

    Abstract: We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label. We use a deep lattice network model so we can architect the model structure to enhance interpretability, and add monotonicity constraints between inputs-and-outputs. We then use the proposed set function to automate the engineering of dense, int…

    Submitted 31 May, 2018; originally announced June 2018.

  11. arXiv:1804.06500  [pdf, ps, other]

    cs.LG cs.GT math.OC stat.ML

    Two-Player Games for Efficient Non-Convex Constrained Optimization

    Authors: Andrew Cotter, Heinrich Jiang, Karthik Sridharan

    Abstract: In recent years, constrained optimization has become increasingly relevant to the machine learning community, with applications including Neyman-Pearson classification, robust optimization, and fair machine learning. A natural approach to constrained optimization is to optimize the Lagrangian, but this is not guaranteed to work in the non-convex setting, and, if using a first-order method, cannot…

    Submitted 28 September, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

  12. arXiv:1606.07558  [pdf, ps, other]

    cs.LG

    Satisfying Real-world Goals with Dataset Constraints

    Authors: Gabriel Goh, Andrew Cotter, Maya Gupta, Michael Friedlander

    Abstract: The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a pr…

    Submitted 3 May, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

  13. arXiv:1512.04960  [pdf, ps, other]

    cs.LG

    A Light Touch for Heavily Constrained SGD

    Authors: Andrew Cotter, Maya Gupta, Jan Pfeifer

    Abstract: Minimizing empirical risk subject to a set of constraints can be a useful strategy for learning restricted classes of functions, such as monotonic functions, submodular functions, classifiers that guarantee a certain class label for some subset of examples, etc. However, these restrictions may result in a very large number of constraints. Projected stochastic gradient descent (SGD) is often the de…

    Submitted 24 October, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

    Journal ref: 29th Annual Conference on Learning Theory, pp. 729-771, 2016

  14. arXiv:1505.06378  [pdf, other]

    cs.LG

    Monotonic Calibrated Interpolated Look-Up Tables

    Authors: Maya Gupta, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojtek Moczydlowski, Alex van Esbroeck

    Abstract: Real-world machine learning applications may require functions that are fast-to-evaluate and interpretable. In particular, guaranteed monotonicity of the learned function can be critical to user trust. We propose meeting these goals for low-dimensional machine learning problems by learning flexible, monotonic functions using calibrated interpolated look-up tables. We extend the structural risk min…

    Submitted 20 January, 2016; v1 submitted 23 May, 2015; originally announced May 2015.

    Comments: To appear (with minor revisions), Journal of Machine Learning Research, 2016

  15. arXiv:1308.3509  [pdf, other]

    cs.LG

    Stochastic Optimization for Machine Learning

    Authors: Andrew Cotter

    Abstract: It has been found that stochastic algorithms often find good solutions much more rapidly than inherently-batch approaches. Indeed, a very useful rule of thumb is that often, when solving a machine learning problem, an iterative technique which relies on performing a very large number of relatively-inexpensive updates will often outperform one which performs a smaller number of much "smarter" but c…

    Submitted 15 August, 2013; originally announced August 2013.

    Comments: PhD Thesis

  16. arXiv:1307.1674  [pdf, other]

    stat.ML cs.LG

    Stochastic Optimization of PCA with Capped MSG

    Authors: Raman Arora, Andrew Cotter, Nathan Srebro

    Abstract: We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as "Matrix Stochastic Gradient" (MSG), as well as a practical variant, Capped MSG. We study the method both theoretically and empirically.

    Submitted 5 July, 2013; originally announced July 2013.

  17. arXiv:1204.0566  [pdf, ps, other]

    cs.LG

    The Kernelized Stochastic Batch Perceptron

    Authors: Andrew Cotter, Shai Shalev-Shwartz, Nathan Srebro

    Abstract: We present a novel approach for training kernel Support Vector Machines, establish learning runtime guarantees for our method that are better than those of any other known kernelized SVM optimization approach, and show that our method works well in practice compared to existing alternatives.

    Submitted 21 June, 2012; v1 submitted 2 April, 2012; originally announced April 2012.

  18. arXiv:1109.4603  [pdf, other]

    cs.AI

    Explicit Approximations of the Gaussian Kernel

    Authors: Andrew Cotter, Joseph Keshet, Nathan Srebro

    Abstract: We investigate training and using Gaussian kernel SVMs by approximating the kernel with an explicit finite-dimensional polynomial feature representation based on the Taylor expansion of the exponential. Although not as efficient as the recently-proposed random Fourier features [Rahimi and Recht, 2007] in terms of the number of features, we show how this polynomial representation can provide a bet…

    Submitted 21 September, 2011; originally announced September 2011.

    Comments: 11 pages, 2 tables, 2 figures

  19. arXiv:1106.4574  [pdf, other]

    cs.LG

    Better Mini-Batch Algorithms via Accelerated Gradient Methods

    Authors: Andrew Cotter, Ohad Shamir, Nathan Srebro, Karthik Sridharan

    Abstract: Mini-batch algorithms have been proposed as a way to speed up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up and propose a novel accelerated gradient algorithm, which deals with this deficien…

    Submitted 22 June, 2011; originally announced June 2011.