Skip to main content

Showing 1–22 of 22 results for author: Sharan, V

Searching in archive stat. Search in all archives.
.
  1. arXiv:2405.19374  [pdf, ps, other

    stat.ML cs.LG

    Optimal Multiclass U-Calibration Error and Beyond

    Authors: Haipeng Luo, Spandan Senapati, Vatsal Sharan

    Abstract: We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously. Kleinberg et al. (2023) developed an algorithm with U-calibration error $O(K\sqrt{T})$ after $T$ rounds and raised the open question of what the… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2403.06925  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Simplicity Bias of Transformers to Learn Low Sensitivity Functions

    Authors: Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan

    Abstract: Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of the inductive biases that they have and how those biases are different from other neural network architectures remains elusive. Various neural network architectures such as fully connected networks have been found to have a simplicity bias towards simple functions of the data; one version of th… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 24 pages, 19 figures, 3 tables

  3. arXiv:2402.10360  [pdf, ps, other

    cs.LG cs.CC cs.DS cs.LO stat.ML

    Transductive Sample Complexities Are Compact

    Authors: Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

    Abstract: We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$. We prove that this exact form of compactness holds for realizable and agnostic learning with respect to any proper m… ▽ More

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 18 pages

  4. arXiv:2310.06161  [pdf, other

    cs.LG stat.ML

    Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness

    Authors: Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan

    Abstract: Neural networks (NNs) are known to exhibit simplicity bias where they tend to prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative. Simplicity bias can lead to the model making biased predictions which have poor out-of-distribution (OOD) generalization. To address this, we propose a framework that encourages the model to use a more diverse set of… ▽ More

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 28 pages, 10 figures, 16 tables

  5. arXiv:2309.13692  [pdf, ps, other

    cs.LG stat.ML

    Regularization and Optimal Multiclass Learning

    Authors: Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

    Abstract: The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken aw… ▽ More

    Submitted 25 June, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: COLT 2024; 43 pages

  6. arXiv:2203.15260  [pdf, other

    cs.LG cs.CC cs.DS math.OC stat.ML

    Efficient Convex Optimization Requires Superlinear Memory

    Authors: Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

    Abstract: We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - δ}$ bits of memory must make at least $\tildeΩ(d^{1 + (4/3)δ})$ first-order queries (for any constant $δ\in [0, 1/4]$). Consequently, the performance of such memory-constrained algorithms are a polyno… ▽ More

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 33 pages, 1 figure

  7. arXiv:2202.13576  [pdf, other

    cs.LG cs.IT stat.ML

    KL Divergence Estimation with Multi-group Attribution

    Authors: Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) f… ▽ More

    Submitted 28 February, 2022; originally announced February 2022.

    Comments: 20 pages, 4 figures

  8. arXiv:2111.03137  [pdf, other

    math.OC cs.DS cs.LG stat.ML

    Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

    Authors: Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

    Abstract: We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluatio… ▽ More

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 95 pages, 4 figures; authors are listed in alphabetical order

  9. arXiv:2109.05389  [pdf, other

    cs.LG stat.ML

    Omnipredictors

    Authors: Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be kn… ▽ More

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: 35 pages, 1 figure

  10. arXiv:2103.15261  [pdf, other

    cs.LG cs.AI stat.ML

    One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

    Authors: Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang

    Abstract: Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a vari… ▽ More

    Submitted 28 March, 2021; originally announced March 2021.

    Comments: 30 pages, 6 figures

  11. arXiv:2103.05853  [pdf, ps, other

    cs.LG stat.ML

    Multicalibrated Partitions for Importance Weights

    Authors: Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: The ratio between the probability that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among its applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergen… ▽ More

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: 27 pages

  12. arXiv:1912.03582  [pdf, other

    cs.LG stat.ML

    PIDForest: Anomaly Detection via Partial Identification

    Authors: Parikshit Gopalan, Vatsal Sharan, Udi Wieder

    Abstract: We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum densit… ▽ More

    Submitted 7 December, 2019; originally announced December 2019.

  13. arXiv:1904.12053  [pdf, other

    cs.LG math.ST stat.ML

    Sample Amplification: Increasing Dataset Size even when Learning is Impossible

    Authors: Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

    Abstract: Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ $\text{amplification procedure}$ takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples''. An amplifica… ▽ More

    Submitted 2 December, 2019; v1 submitted 26 April, 2019; originally announced April 2019.

    Comments: Added discussion about potential applications

  14. arXiv:1904.08544  [pdf, ps, other

    cs.LG stat.ML

    Memory-Sample Tradeoffs for Linear Regression with Small Error

    Authors: Vatsal Sharan, Aaron Sidford, Gregory Valiant

    Abstract: We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2)\ldots,$ with $a_i$ drawn independently from a $d$-dimensional isotro… ▽ More

    Submitted 10 October, 2020; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: A few minor edits over previous version

  15. arXiv:1811.06609  [pdf, other

    cs.LG stat.ML

    A Spectral View of Adversarially Robust Features

    Authors: Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant

    Abstract: Given the apparent difficulty of learning models that are robust to adversarial perturbations, we propose tackling the simpler problem of develo** adversarially robust features. Specifically, given a dataset and metric of interest, the goal is to return a function (or multiple functions) that 1) is robust to adversarial perturbations, and 2) has significant variation across the datapoints. We es… ▽ More

    Submitted 15 November, 2018; originally announced November 2018.

    Comments: To appear at NIPS 2018

  16. arXiv:1811.00148  [pdf, other

    cs.LG cs.DS stat.ML

    Recovery Guarantees for Quadratic Tensors with Sparse Observations

    Authors: Hongyang R. Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang

    Abstract: We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple product form, but an alternate family of quadratic models, which are the sum of pairwise products instead of a triple product, have emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and th… ▽ More

    Submitted 29 July, 2023; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: 16 pages. Appeared in AISTATS 2019

  17. arXiv:1804.03065  [pdf, other

    cs.LG cs.AI stat.ML

    Efficient Anomaly Detection via Matrix Sketching

    Authors: Vatsal Sharan, Parikshit Gopalan, Udi Wieder

    Abstract: We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results sho… ▽ More

    Submitted 27 November, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Updates for NeurIPS'18 camera-ready

  18. arXiv:1711.02309  [pdf, other

    cs.LG cs.AI stat.ML

    Learning Overcomplete HMMs

    Authors: Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

    Abstract: We study the problem of learning overcomplete HMMs---those that have many hidden states but a small output alphabet. Despite having significant practical importance, such HMMs are poorly understood with no known positive or negative results for efficient learning. In this paper, we present several new results---both positive and negative---which help define the boundaries between the tractable and… ▽ More

    Submitted 27 June, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: Added acknowledgements

  19. arXiv:1711.02305  [pdf, other

    cs.LG cs.DS stat.ML

    Sketching Linear Classifiers over Data Streams

    Authors: Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

    Abstract: We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and stream… ▽ More

    Submitted 6 April, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: Full version of paper appearing at SIGMOD 2018 with more detailed proofs of theoretical results. Code available at https://github.com/stanford-futuredata/wmsketch

  20. arXiv:1706.08146  [pdf, other

    cs.LG cs.AI stat.ML

    Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data

    Authors: Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant

    Abstract: What learning algorithms can be run directly on compressively-sensed data? In this work, we consider the question of accurately and efficiently computing low-rank matrix or tensor factorizations given data compressed via random projections. We examine the approach of first performing factorization in the compressed domain, and then reconstructing the original high-dimensional factors from the reco… ▽ More

    Submitted 27 May, 2019; v1 submitted 25 June, 2017; originally announced June 2017.

    Comments: Updates for ICML'19 camera-ready

  21. arXiv:1703.01804  [pdf, other

    cs.LG stat.ML

    Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use

    Authors: Vatsal Sharan, Gregory Valiant

    Abstract: The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform. We propose a modification of the ALS approach that is as efficient as standard ALS, but provably recovers the true factors with random initialization under standard incoherence ass… ▽ More

    Submitted 23 September, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

    Comments: Minor updates to presentation. Appears in ICML'17

  22. arXiv:1612.02526  [pdf, other

    cs.LG cs.AI cs.CC stat.ML

    Prediction with a Short Memory

    Authors: Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

    Abstract: We consider the problem of predicting the next observation given a sequence of past observations, and consider the extent to which accurate prediction requires complex algorithms that explicitly leverage long-range dependencies. Perhaps surprisingly, our positive results show that for a broad class of sequences, there is an algorithm that predicts well on average, and bases its predictions only on… ▽ More

    Submitted 27 June, 2018; v1 submitted 7 December, 2016; originally announced December 2016.

    Comments: Updates for STOC camera ready