Search | arXiv e-print repository

Loss Minimization through the Lens of Outcome Indistinguishability

Authors: Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, Udi Wieder

Abstract: We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistingusihability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic t… ▽ More We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistingusihability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability. For a set of statistical tests--based on a collection of losses and hypothesis class--a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses. This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape. △ Less

Submitted 8 December, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

arXiv:2202.13576 [pdf, other]

KL Divergence Estimation with Multi-group Attribution

Authors: Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) f… ▽ More Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) family $\mathcal{C}$ of overlap** subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates when conditioned on sub-populations than other popular algorithms. △ Less

Submitted 28 February, 2022; originally announced February 2022.

Comments: 20 pages, 4 figures

arXiv:2109.05389 [pdf, other]

Omnipredictors

Authors: Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be kn… ▽ More Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be known at the time of learning, requiring the algorithm to use a best-guess loss function. We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action. We introduce the notion of an (${\mathcal{L}},\mathcal{C}$)-omnipredictor, which could be used to optimize any loss in a family ${\mathcal{L}}$. Once the loss function is set, the outputs of the predictor can be post-processed (a simple univariate data-independent transformation of individual predictions) to do well compared with any hypothesis from the class $\mathcal{C}$. The post processing is essentially what one would perform if the outputs of the predictor were true probabilities of the uncertain events. In a sense, omnipredictors extract all the predictive power from the class $\mathcal{C}$, irrespective of the loss function in $\mathcal{L}$. We show that such "loss-oblivious'' learning is feasible through a connection to multicalibration, a notion introduced in the context of algorithmic fairness. In addition, we show how multicalibration can be viewed as a solution concept for agnostic boosting, shedding new light on past results. Finally, we transfer our insights back to the context of algorithmic fairness by providing omnipredictors for multi-group loss minimization. △ Less

Submitted 11 September, 2021; originally announced September 2021.

Comments: 35 pages, 1 figure

arXiv:2103.05853 [pdf, ps, other]

Multicalibrated Partitions for Importance Weights

Authors: Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: The ratio between the probability that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among its applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergen… ▽ More The ratio between the probability that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among its applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergence. We consider the common setting where $R$ and $P$ are only given through samples from each distribution. The vast literature on estimating importance weights is either heuristic, or makes strong assumptions about $R$ and $P$ or on the importance weights themselves. In this paper, we explore a computational perspective to the estimation of importance weights, which factors in the limitations and possibilities obtainable with bounded computational resources. We significantly strengthen previous work that use the MaxEntropy approach, that define the importance weights based on a distribution $Q$ closest to $P$, that looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets. We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of ground truth weights for the set is evidently large. We similarly show that it may overestimate the average scores to sets $C \in \mathcal{C}$. We therefore formulate Sandwiching bounds as a notion of set-wise accuracy for importance weights. We study these bounds to show that they capture natural completeness and soundness requirements from the weights. We present an efficient algorithm that under standard learnability assumptions computes weights which satisfy these bounds. Our techniques rely on a new notion of multicalibrated partitions of the domain of the distributions, which appear to be useful objects in their own right. △ Less

Submitted 9 March, 2021; originally announced March 2021.

Comments: 27 pages

arXiv:2006.12018 [pdf, other]

Overlook: Differentially Private Exploratory Visualization for Big Data

Authors: Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia

Abstract: Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a s… ▽ More Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. The key idea in Overlook is a virtual synopsis that can be evaluated incrementally, without extra space storage or expensive precomputation. Overlook simply executes queries using an existing engine, such as a SQL DBMS, and adds noise to their results. Because Overlook's synopses do not require costly precomputation or storage, data curators can also use Overlook to explore the impact of privacy parameters interactively. Overlook offers a rich visual query interface based on the open source Hillview system. Overlook achieves accuracy comparable to existing synopsis-based systems, while offering better performance and removing the need for extra storage. △ Less

Submitted 22 June, 2020; originally announced June 2020.

arXiv:1912.03582 [pdf, other]

PIDForest: Anomaly Detection via Partial Identification

Authors: Parikshit Gopalan, Vatsal Sharan, Udi Wieder

Abstract: We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum densit… ▽ More We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random forest based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods, across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features and ranges for them which are relatively uncommon in the dataset. △ Less

Submitted 7 December, 2019; originally announced December 2019.

arXiv:1911.07378 [pdf, ps, other]

Finding Skewed Subcubes Under a Distribution

Authors: Parikshit Gopalan, Roie Levin, Udi Wieder

Abstract: Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcu… ▽ More Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in $ψ$. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to $ψ$ where finding skewed subcubes might be easier. △ Less

Submitted 12 November, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

arXiv:1907.04827 [pdf, other]

doi 10.14778/3342263.3342279

Hillview: A trillion-cell spreadsheet for big data

Authors: Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, Marcos K. Aguilera

Abstract: Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketche… ▽ More Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems. △ Less

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1806.02004 [pdf, ps, other]

Another Proof of Cuckoo hashing with New Variants

Authors: Udi Wieder

Abstract: We show a new proof for the load of obtained by a Cuckoo Hashing data structure. Our proof is arguably simpler than previous proofs and allows for new generalizations. The proof first appeared in Pinkas et. al. \cite{PSWW19} in the context of a protocol for private set intersection. We present it here separately to improve its readability. We show a new proof for the load of obtained by a Cuckoo Hashing data structure. Our proof is arguably simpler than previous proofs and allows for new generalizations. The proof first appeared in Pinkas et. al. \cite{PSWW19} in the context of a protocol for private set intersection. We present it here separately to improve its readability. △ Less

Submitted 6 June, 2018; originally announced June 2018.

arXiv:1804.03065 [pdf, other]

Efficient Anomaly Detection via Matrix Sketching

Authors: Vatsal Sharan, Parikshit Gopalan, Udi Wieder

Abstract: We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results sho… ▽ More We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results showing that \emph{any} sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures. △ Less

Submitted 27 November, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: Updates for NeurIPS'18 camera-ready

arXiv:1404.3768 [pdf, ps, other]

Changing Bases: Multistage Optimization for Matroids and Matchings

Authors: Anupam Gupta, Kunal Talwar, Udi Wieder

Abstract: This paper is motivated by the fact that many systems need to be maintained continually while the underlying costs change over time. The challenge is to continually maintain near-optimal solutions to the underlying optimization problems, without creating too much churn in the solution itself. We model this as a multistage combinatorial optimization problem where the input is a sequence of cost fun… ▽ More This paper is motivated by the fact that many systems need to be maintained continually while the underlying costs change over time. The challenge is to continually maintain near-optimal solutions to the underlying optimization problems, without creating too much churn in the solution itself. We model this as a multistage combinatorial optimization problem where the input is a sequence of cost functions (one for each time step); while we can change the solution from step to step, we incur an additional cost for every such change. We study the multistage matroid maintenance problem, where we need to maintain a base of a matroid in each time step under the changing cost functions and acquisition costs for adding new elements. The online version of this problem generalizes online paging. E.g., given a graph, we need to maintain a spanning tree $T_t$ at each step: we pay $c_t(T_t)$ for the cost of the tree at time $t$, and also $| T_t\setminus T_{t-1} |$ for the number of edges changed at this step. Our main result is an $O(\log m \log r)$-approximation, where $m$ is the number of elements/edges and $r$ is the rank of the matroid. We also give an $O(\log m)$ approximation for the offline version of the problem. These bounds hold when the acquisition costs are non-uniform, in which caseboth these results are the best possible unless P=NP. We also study the perfect matching version of the problem, where we must maintain a perfect matching at each step under changing cost functions and costs for adding new elements. Surprisingly, the hardness drastically increases: for any constant $ε>0$, there is no $O(n^{1-ε})$-approximation to the multistage matching maintenance problem, even in the offline case. △ Less

Submitted 14 April, 2014; originally announced April 2014.

arXiv:1310.5367 [pdf, other]

Balanced Allocations: A Simple Proof for the Heavily Loaded Case

Authors: Kunal Talwar, Udi Wieder

Abstract: We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two choice process is bounded by $(1+o(1))\log \log n$, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. Their proof uses heavy machinery from Markov-Chain theory and some of the calculations are done using computers. In this manuscript we pro… ▽ More We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two choice process is bounded by $(1+o(1))\log \log n$, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. Their proof uses heavy machinery from Markov-Chain theory and some of the calculations are done using computers. In this manuscript we provide a significantly simpler proof that is not aided by computers and is self contained. The simplification comes at a cost of weaker bounds on the low order terms and a weaker tail bound for the probability of deviating from the expectation. △ Less

Submitted 20 October, 2013; originally announced October 2013.

arXiv:1304.1188 [pdf, ps, other]

How to Approximate A Set Without Knowing Its Size In Advance

Authors: Rasmus Pagh, Gil Segev, Udi Wieder

Abstract: The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most epsilon. That is, the membership algorithm must be correct on each x in S, and may err with probability at most epsilon on each x not in S. We study a well-motivated, yet ins… ▽ More The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most epsilon. That is, the membership algorithm must be correct on each x in S, and may err with probability at most epsilon on each x not in S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results: - We show a super-linear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. - We show that our space lower bound is tight, and can even be matched by a highly efficient data structure. △ Less

Submitted 11 April, 2013; v1 submitted 3 April, 2013; originally announced April 2013.

Comments: Clarified a point in the lower bound proof

arXiv:1005.0418 [pdf, ps, other]

Lower Bounds on Near Neighbor Search via Metric Expansion

Authors: Rina Panigrahy, Kunal Talwar, Udi Wieder

Abstract: In this paper we show how the complexity of performing nearest neighbor (NNS) search on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$ . We then look at various notions of expansion in this graph relating them to the cell probe complexity of NNS for randomized and d… ▽ More In this paper we show how the complexity of performing nearest neighbor (NNS) search on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$ . We then look at various notions of expansion in this graph relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example if the graph has node expansion $Φ$ then we show that any deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > Φ$. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in the well known metric spaces such as $l_1$, $l_2$, $l_\infty$ by simply computing their expansion. In the process, we strengthen and generalize our previous results (FOCS 2008). Additionally, we unify the approach in that work and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on $t$ is weak; that is, the bound drops exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of dynamic low contention data structures. These are data structures that supports updates in the data set and that do not look up any single cell too often. △ Less

Submitted 3 May, 2010; originally announced May 2010.

Comments: 29 pages

ACM Class: F.2.2; E.1

Showing 1–14 of 14 results for author: Wieder, U