
Showing 1–14 of 14 results for author: Wieder, U

Searching in archive cs.
  1. arXiv:2210.08649

    cs.LG

    Loss Minimization through the Lens of Outcome Indistinguishability

    Authors: Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, Udi Wieder

    Abstract: We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistinguishability. For a collection of losses and a hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic t…

    Submitted 8 December, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

  2. arXiv:2202.13576

    cs.LG cs.IT stat.ML

    KL Divergence Estimation with Multi-group Attribution

    Authors: Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) f…

    Submitted 28 February, 2022; originally announced February 2022.

    Comments: 20 pages, 4 figures

  3. arXiv:2109.05389

    cs.LG stat.ML

    Omnipredictors

    Authors: Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?"). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be kn…

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: 35 pages, 1 figure

  4. arXiv:2103.05853

    cs.LG stat.ML

    Multicalibrated Partitions for Importance Weights

    Authors: Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder

    Abstract: The ratios between the probabilities that two distributions $R$ and $P$ assign to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably statistics and machine learning. Among their applications, importance weights are central to domain adaptation, anomaly detection, and estimation of various divergences such as the KL divergen…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: 27 pages

  5. arXiv:2006.12018

    cs.CR cs.DB

    Overlook: Differentially Private Exploratory Visualization for Big Data

    Authors: Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia

    Abstract: Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a s…

    Submitted 22 June, 2020; originally announced June 2020.

  6. arXiv:1912.03582

    cs.LG stat.ML

    PIDForest: Anomaly Detection via Partial Identification

    Authors: Parikshit Gopalan, Vatsal Sharan, Udi Wieder

    Abstract: We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum densit…

    Submitted 7 December, 2019; originally announced December 2019.

  7. arXiv:1911.07378

    cs.DS cs.CC math.PR

    Finding Skewed Subcubes Under a Distribution

    Authors: Parikshit Gopalan, Roie Levin, Udi Wieder

    Abstract: Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcu…

    Submitted 12 November, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

  8. Hillview: A trillion-cell spreadsheet for big data

    Authors: Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, Marcos K. Aguilera

    Abstract: Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketche…

    Submitted 10 July, 2019; originally announced July 2019.

  9. arXiv:1806.02004

    cs.DS

    Another Proof of Cuckoo hashing with New Variants

    Authors: Udi Wieder

    Abstract: We show a new proof for the load obtained by a Cuckoo Hashing data structure. Our proof is arguably simpler than previous proofs and allows for new generalizations. The proof first appeared in Pinkas et al. [PSWW19] in the context of a protocol for private set intersection. We present it here separately to improve its readability.

    Submitted 6 June, 2018; originally announced June 2018.

  10. arXiv:1804.03065

    cs.LG cs.AI stat.ML

    Efficient Anomaly Detection via Matrix Sketching

    Authors: Vatsal Sharan, Parikshit Gopalan, Udi Wieder

    Abstract: We consider the problem of finding anomalies in high-dimensional data using popular PCA-based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix, which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results sho…

    Submitted 27 November, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: Updates for NeurIPS'18 camera-ready

  11. arXiv:1404.3768

    cs.DS

    Changing Bases: Multistage Optimization for Matroids and Matchings

    Authors: Anupam Gupta, Kunal Talwar, Udi Wieder

    Abstract: This paper is motivated by the fact that many systems need to be maintained continually while the underlying costs change over time. The challenge is to continually maintain near-optimal solutions to the underlying optimization problems, without creating too much churn in the solution itself. We model this as a multistage combinatorial optimization problem where the input is a sequence of cost fun…

    Submitted 14 April, 2014; originally announced April 2014.

  12. arXiv:1310.5367

    cs.DM cs.DS

    Balanced Allocations: A Simple Proof for the Heavily Loaded Case

    Authors: Kunal Talwar, Udi Wieder

    Abstract: We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two-choice process is bounded by $(1+o(1))\log \log n$, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. Their proof uses heavy machinery from Markov chain theory, and some of the calculations are done using computers. In this manuscript we pro…

    Submitted 20 October, 2013; originally announced October 2013.

  13. arXiv:1304.1188

    cs.DS

    How to Approximate A Set Without Knowing Its Size In Advance

    Authors: Rasmus Pagh, Gil Segev, Udi Wieder

    Abstract: The dynamic approximate membership problem asks to represent a set $S$ of size $n$, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate of at most $\epsilon$. That is, the membership algorithm must be correct on each $x \in S$, and may err with probability at most $\epsilon$ on each $x \notin S$. We study a well-motivated, yet ins…

    Submitted 11 April, 2013; v1 submitted 3 April, 2013; originally announced April 2013.

    Comments: Clarified a point in the lower bound proof

  14. arXiv:1005.0418

    cs.DS cs.CG

    Lower Bounds on Near Neighbor Search via Metric Expansion

    Authors: Rina Panigrahy, Kunal Talwar, Udi Wieder

    Abstract: In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric space is related to the expansion of the metric space. Given a metric space, we look at the graph obtained by connecting every pair of points within a certain distance $r$. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and d…

    Submitted 3 May, 2010; originally announced May 2010.

    Comments: 29 pages

    ACM Class: F.2.2; E.1
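Entry 2 (KL Divergence Estimation with Multi-group Attribution) concerns estimating KL divergence from samples. As a baseline only, the Python sketch below is a plain plug-in estimator for distributions over a small finite support, with additive (Laplace) smoothing so that no estimated probability is zero. It also sums the per-point terms $p(x)\log(p(x)/q(x))$ over a partition of the support into named groups; this naive decomposition, like the function names `empirical`, `kl_terms`, `plug_in_kl`, and `group_contributions`, is a hypothetical illustration and is not the multi-group attribution notion defined in the paper.

```python
import math
from collections import Counter


def empirical(samples, support, alpha=1.0):
    """Additively smoothed empirical distribution: alpha pseudo-counts per
    support point, so every probability is strictly positive."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(support)
    return {x: (counts[x] + alpha) / total for x in support}


def kl_terms(p_samples, q_samples, support, alpha=1.0):
    """Per-point terms p(x) * log(p(x) / q(x)) of the plug-in KL estimate."""
    p = empirical(p_samples, support, alpha)
    q = empirical(q_samples, support, alpha)
    return {x: p[x] * math.log(p[x] / q[x]) for x in support}


def plug_in_kl(p_samples, q_samples, support, alpha=1.0):
    """Plug-in estimate of KL(P || Q) from samples of P and Q."""
    return sum(kl_terms(p_samples, q_samples, support, alpha).values())


def group_contributions(terms, groups):
    """Sum the per-point terms inside each group of a partition of the support.
    Individual group sums may be negative; only the total is guaranteed >= 0."""
    return {name: sum(terms[x] for x in members) for name, members in groups.items()}


if __name__ == "__main__":
    support = [0, 1, 2, 3]
    p_samples = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
    q_samples = [0] * 25 + [1] * 25 + [2] * 25 + [3] * 25
    terms = kl_terms(p_samples, q_samples, support)
    print(plug_in_kl(p_samples, q_samples, support))
    print(group_contributions(terms, {"low": [0, 1], "high": [2, 3]}))
```

The plug-in estimator is biased for small samples and large supports; the smoothing parameter `alpha` trades that bias against the risk of dividing by a zero estimate.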
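Entry 4 (Multicalibrated Partitions for Importance Weights) defines importance weights as the ratio $w(x) = R(x)/P(x)$. One standard use is reweighting samples drawn from $P$ to estimate an expectation under $R$. The sketch below assumes both distributions are known and given as small Python dictionaries, whereas the paper is about estimating such weights; the helper names `importance_weight` and `estimate_under_r` are hypothetical.

```python
import random


def importance_weight(r, p, x):
    """w(x) = R(x) / P(x); points missing from r are treated as R(x) = 0."""
    return r.get(x, 0.0) / p[x]


def estimate_under_r(f, r, p, n_samples, seed=0):
    """Estimate E_{x ~ R}[f(x)] from samples of P, each reweighted by w(x).
    Requires P(x) > 0 wherever R(x) > 0 (otherwise the estimate is biased)."""
    rng = random.Random(seed)
    support = list(p)
    xs = rng.choices(support, weights=[p[x] for x in support], k=n_samples)
    return sum(importance_weight(r, p, x) * f(x) for x in xs) / n_samples


if __name__ == "__main__":
    p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}  # sampling distribution P
    r = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}      # target distribution R
    # exact value: E_R[x] = 0*0.1 + 1*0.2 + 2*0.3 + 3*0.4 = 2.0
    print(estimate_under_r(lambda x: x, r, p, 100_000))
```

The estimator is unbiased under the support condition above, but its variance grows with the size of the weights, which is one reason accurate importance weights matter in practice.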
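Entry 9 (Another Proof of Cuckoo hashing with New Variants) analyzes the load of cuckoo hashing. For readers unfamiliar with the data structure, here is a minimal sketch of textbook two-table cuckoo hashing. It assumes salted calls to Python's built-in `hash()` as the pair of hash functions and a grow-and-rehash policy after a fixed number of evictions; the class name `CuckooHashTable` and its parameters are hypothetical, and the sketch does not model the variants analyzed in the paper.

```python
import random


class CuckooHashTable:
    """Textbook two-table cuckoo hashing (illustrative sketch).

    Every stored key sits either in tables[0][h0(key)] or tables[1][h1(key)],
    so a lookup probes at most two slots. Keys must be hashable and not None."""

    def __init__(self, capacity=16, max_kicks=32, seed=0):
        self._rng = random.Random(seed)
        self._capacity = capacity    # slots per table
        self._max_kicks = max_kicks  # evictions allowed before rehashing
        self._new_tables()

    def _new_tables(self):
        self._tables = [[None] * self._capacity, [None] * self._capacity]
        # fresh random salts stand in for a freshly drawn pair of hash functions
        self._salts = (self._rng.getrandbits(64), self._rng.getrandbits(64))

    def _slot(self, t, key):
        return hash((self._salts[t], key)) % self._capacity

    def __contains__(self, key):
        return any(self._tables[t][self._slot(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if key in self:
            return
        t = 0
        for _ in range(self._max_kicks):
            i = self._slot(t, key)
            evicted = self._tables[t][i]
            self._tables[t][i] = key
            if evicted is None:
                return
            # the evicted key moves to its slot in the other table
            key, t = evicted, 1 - t
        # eviction chain too long: grow both tables and reinsert everything
        self._rehash(pending=key)

    def _rehash(self, pending):
        keys = [k for table in self._tables for k in table if k is not None]
        keys.append(pending)
        self._capacity *= 2
        self._new_tables()
        for k in keys:
            self.insert(k)


if __name__ == "__main__":
    table = CuckooHashTable()
    for k in range(1000):
        table.insert(k)
    print(all(k in table for k in range(1000)), 1234 in table)
```

Doubling on failure keeps the sketch simple; real implementations often cap the load near 50% for two tables, or add a small stash, instead of growing blindly.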
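Entry 12 (Balanced Allocations: A Simple Proof for the Heavily Loaded Case) bounds the gap between the maximum and average load in the two-choice process. As a rough empirical illustration, not the proof technique of the paper, the Python sketch below simulates the d-choice process for d = 1 and d = 2 in a heavily loaded setting. The function names `allocate` and `gap` and the parameter values are hypothetical choices for this example.

```python
import random


def allocate(n_bins, n_balls, d, seed=0):
    """Run the d-choice process: each ball samples d bins uniformly at random
    (with replacement) and is placed in the least loaded of them.
    d=1 is plain random allocation; d=2 is the two-choice process."""
    rng = random.Random(seed)
    loads = [0] * n_bins
    for _ in range(n_balls):
        choices = [rng.randrange(n_bins) for _ in range(d)]
        # ties go to the first sampled bin with the minimum load
        target = min(choices, key=lambda b: loads[b])
        loads[target] += 1
    return loads


def gap(loads):
    """Maximum load minus average load."""
    return max(loads) - sum(loads) / len(loads)


if __name__ == "__main__":
    n_bins, n_balls = 1000, 100_000  # heavily loaded: many more balls than bins
    print("one choice:", gap(allocate(n_bins, n_balls, d=1)))
    print("two choices:", gap(allocate(n_bins, n_balls, d=2)))
```

With these parameters the two-choice gap should come out much smaller than the one-choice gap; exact values depend on the seed, and a simulation of course says nothing about the asymptotic constant in the theorem.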
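Entry 13 (How to Approximate A Set Without Knowing Its Size In Advance) studies dynamic approximate membership when $n$ is not known ahead of time. For contrast, the classic static solution, a Bloom filter, is sized from $n$ and $\epsilon$ up front, which is exactly the assumption the paper removes. The sketch below uses the textbook sizing formulas and assumes SHA-256 with double hashing to derive the bit positions; the class name `BloomFilter` is hypothetical and this is not the construction from the paper.

```python
import hashlib
import math


class BloomFilter:
    """Bloom filter sized for n elements and false-positive rate eps.

    Textbook sizing: m = ceil(-n ln(eps) / (ln 2)^2) bits and
    k = round((m / n) ln 2) hash functions. No false negatives."""

    def __init__(self, n, eps):
        self.m = math.ceil(-n * math.log(eps) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # double hashing: k indices from two 64-bit halves of one digest;
        # items are keyed by str(item), so 1 and "1" collide by design here
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(n=1000, eps=0.01)
    for x in range(1000):
        bf.add(x)
    false_pos = sum(x in bf for x in range(1000, 11000)) / 10000
    print(bf.m, bf.k, false_pos)
```

If more than $n$ elements are inserted, the false-positive rate of this fixed-size filter degrades, which illustrates why not knowing $n$ in advance is a meaningful obstacle.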