-
Easy Learning from Label Proportions
Authors:
Robert Istvan Busa-Fekete,
Hee** Choi,
Travis Dick,
Claudio Gentile,
Andres Munoz Medina
Abstract:
We consider the problem of Learning from Label Proportions (LLP), a weakly supervised classification setup where instances are grouped into "bags", and only the frequency of class labels in each bag is available. The learner's objective, however, is to achieve low task loss at the level of individual instances. Here we propose Easyllp: a flexible and simple-to-implement debiasing approach based on aggregate labels, which operates on arbitrary loss functions. Our technique allows us to accurately estimate the expected loss of an arbitrary model at an individual level. We showcase the flexibility of our approach by applying it to popular learning frameworks, like Empirical Risk Minimization (ERM) and Stochastic Gradient Descent (SGD), with provable guarantees on instance-level performance. More concretely, we exhibit a variance reduction technique that makes the quality of LLP learning deteriorate only by a factor of k (k being bag size) in both ERM and SGD setups, as compared to full supervision. Finally, we validate our theoretical results on multiple datasets, demonstrating that our algorithm performs as well as or better than previous LLP approaches in spite of its simplicity.
Submitted 13 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
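
The debiasing idea in the abstract above can be illustrated with a small sketch. It assumes binary labels, bags of k instances drawn i.i.d., and a known class prior; the function names and the specific surrogate k(alpha - p) + p are illustrative assumptions, not necessarily the exact estimator used in the paper. Under these assumptions the surrogate label is unbiased for an instance's own label probability, which the code checks by exact enumeration.

```python
from itertools import product


def surrogate_label(bag_proportion, k, prior):
    """Soft label k * (alpha - p) + p built from the bag proportion alpha.

    `prior` is the marginal probability p of the positive class
    (assumed known here; this is an assumption of the sketch).
    """
    return k * (bag_proportion - prior) + prior


def expected_surrogate(eta_i, k, prior):
    """Exact expectation of the surrogate label for one instance.

    The instance of interest has P(y = 1) = eta_i; the other k - 1
    instances in the bag are independent with P(y = 1) = prior.
    """
    total = 0.0
    for labels in product([0, 1], repeat=k):
        y_i, others = labels[0], labels[1:]
        prob = eta_i if y_i == 1 else 1.0 - eta_i
        for y in others:
            prob *= prior if y == 1 else 1.0 - prior
        alpha = sum(labels) / k  # observed label proportion of the bag
        total += prob * surrogate_label(alpha, k, prior)
    return total


if __name__ == "__main__":
    # The expectation matches eta_i regardless of the bag size.
    print(expected_surrogate(eta_i=0.8, k=4, prior=0.3))
```

Because the surrogate multiplies the bag proportion by k, its spread grows with the bag size, which is consistent with the abstract's interest in variance reduction.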
-
Learning-Augmented Private Algorithms for Multiple Quantile Release
Authors:
Mikhail Khodak,
Kareem Amin,
Travis Dick,
Sergei Vassilvitskii
Abstract:
When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework -- previously applied largely to improve time complexity or competitive ratios -- as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility. This idea is instantiated on the important task of multiple quantile release, for which we derive error guarantees that scale with a natural measure of prediction quality while (almost) recovering state-of-the-art prediction-independent guarantees. Our analysis enjoys several advantages, including minimal assumptions about the data, a natural way of adding robustness, and the provision of useful surrogate losses for two novel "meta" algorithms that learn predictions from other (potentially sensitive) data. We conclude with experiments on challenging tasks demonstrating that learning predictions across one or more instances can lead to large error reductions while preserving privacy.
Submitted 8 May, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part II: Application and Case Study
Authors:
Di Kang,
Jiaxi Zhao,
C. Tyler Dick,
Xiang Liu,
Zheyong Bian,
Steven W. Kirkpatrick,
Chen-Yu Lin
Abstract:
Building on the risk analysis methodology presented in the Part I paper, this Part II paper focuses on applying that methodology. Five illustrative scenarios were used to analyze the best or worst cases and compare the transportation risk differences between service options using unit trains and manifest trains. The comparison results indicate that, for the same transportation demand (i.e., number of tank cars to transport), placing all tank cars at the positions with the lowest probability of derailing and switching tank cars alone in classification yards could yield the lowest risk estimate. This paper also shows that, based on the data and parameters in the case study, risks during arrival/departure events and yard switching events could be as significant as risks on mainlines. This paper provides a way to use the risk analysis methodology for rail safety decisions. The methodology and its application can be tailored to specific infrastructure and rolling stock characteristics.
Submitted 4 July, 2022;
originally announced August 2022.
-
Comparing Unit Trains versus Manifest Trains for the Risk of Rail Transport of Hazardous Materials -- Part I: Risk Analysis Methodology
Authors:
Di Kang,
Jiaxi Zhao,
C. Tyler Dick,
Xiang Liu,
Zheyong Bian,
Steven W. Kirkpatrick,
Chen-Yu Lin
Abstract:
Transporting hazardous materials (hazmats) using tank cars has more significant economic benefits than other transportation modes. Although railway transportation is roughly four times more fuel-efficient than roadway transportation, a train derailment has greater potential to cause more disastrous consequences than a truck incident. Train types, such as unit train or manifest train (also called mixed train), can influence transport risks in several ways. For example, unit trains only experience risks on mainlines and when arriving at or departing from terminals, while manifest trains experience additional switching risks in yards. Based on prior studies and various data sources covering the years 1996-2018, this paper constructs event chains for line-haul risks on mainlines (for both unit trains and manifest trains), arrival/departure risks in terminals (for unit trains) and yards (for manifest trains), and yard switching risks for manifest trains using various probabilistic models, and finally determines expected casualties as the consequences of a potential train derailment and release incident. This is the first analysis to quantify the total risks a train may encounter throughout the shipment process, either on mainlines or in yards/terminals, distinguishing train types. It provides a methodology applicable to any train to calculate the expected risks (quantified as expected casualties in this paper) from an origin to a destination.
Submitted 4 July, 2022;
originally announced July 2022.
-
Scalable and Provably Accurate Algorithms for Differentially Private Distributed Decision Tree Learning
Authors:
Kaiwen Wang,
Travis Dick,
Maria-Florina Balcan
Abstract:
This paper introduces the first provably accurate algorithms for differentially private, top-down decision tree learning in the distributed setting (Balcan et al., 2012). We propose DP-TopDown, a general privacy preserving decision tree learning algorithm, and present two distributed implementations. Our first method NoisyCounts naturally extends the single machine algorithm by using the Laplace mechanism. Our second method LocalRNM significantly reduces communication and added noise by performing local optimization at each data holder. We provide the first utility guarantees for differentially private top-down decision tree learning in both the single machine and distributed settings. These guarantees show that the error of the privately-learned decision tree quickly goes to zero provided that the dataset is sufficiently large. Our extensive experiments on real datasets illustrate the trade-offs of privacy, accuracy and generalization when learning private decision trees in the distributed setting.
Submitted 22 February, 2021; v1 submitted 19 December, 2020;
originally announced December 2020.
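
The abstract above mentions extending a single-machine algorithm with the Laplace mechanism. As a rough illustration of that mechanism only (not the paper's NoisyCounts implementation), the sketch below adds Laplace noise to a list of per-class counts. The function names, the sensitivity of 1 (add/remove-one-record adjacency), and the per-query budget `epsilon` are assumptions of this sketch; it ignores privacy accounting across tree nodes and is not hardened against floating-point attacks.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng=None):
    """Release counts with Laplace noise of scale 1 / epsilon.

    Assumes each record contributes to exactly one count, so adding
    or removing one record changes the count vector by at most 1 in
    L1 norm (sensitivity 1).
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical class counts at one tree node.
    print(noisy_counts([120, 45, 3], epsilon=0.5, rng=rng))
```

Smaller values of `epsilon` give stronger privacy but noisier counts, which is the trade-off between privacy and accuracy that the abstract refers to.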
-
Algorithms and Learning for Fair Portfolio Design
Authors:
Emily Diana,
Travis Dick,
Hadi Elzayn,
Michael Kearns,
Aaron Roth,
Zachary Schutzman,
Saeed Sharifi-Malvajerdi,
Juba Ziani
Abstract:
We consider a variation on the classical finance problem of optimal portfolio design. In our setting, a large population of consumers is drawn from some distribution over risk tolerances, and each consumer must be assigned to a portfolio of lower risk than her tolerance. The consumers may also belong to underlying groups (for instance, of demographic properties or wealth), and the goal is to design a small number of portfolios that are fair across groups in a particular and natural technical sense.
Our main results are algorithms for optimal and near-optimal portfolio design for both social welfare and fairness objectives, both with and without assumptions on the underlying group structure. We describe an efficient algorithm based on an internal two-player zero-sum game that learns near-optimal fair portfolios ex ante and show experimentally that it can be used to obtain a small set of fair portfolios ex post as well. For the special but natural case in which group structure coincides with risk tolerances (which models the reality that wealthy consumers generally tolerate greater risk), we give an efficient and optimal fair algorithm. We also provide generalization guarantees for the underlying risk distribution that has no dependence on the number of portfolios and illustrate the theory with simulation results.
Submitted 12 June, 2020;
originally announced June 2020.
-
Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images
Authors:
Avrim Blum,
Travis Dick,
Naren Manoj,
Hongyang Zhang
Abstract:
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $ε$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness for all base classifiers with $p>2$ must satisfy $\mathbb{E}η_i^2=Ω(d^{1-2/p}ε^2(1-δ)/δ^2)$ for 99% of the features (pixels) of vector $η\sim\mathcal{D}$, where $ε$ is the robust radius and $δ$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
Submitted 5 March, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.
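
To get a feel for the lower bound quoted in the abstract above, the following sketch evaluates the scaling term $d^{1-2/p}ε^2(1-δ)/δ^2$ for a given dimension. The hidden constant in the $Ω(\cdot)$ is not stated in the abstract, so the function (a hypothetical helper, named here for illustration) returns only the scaling term and should not be read as an actual variance bound.

```python
import math


def noise_variance_scale(d, p, eps, delta):
    """Scaling term d^(1 - 2/p) * eps^2 * (1 - delta) / delta^2.

    The constant hidden by the Omega notation is omitted, so only
    the growth with d, eps, and delta is meaningful. For p = inf the
    exponent 1 - 2/p equals 1, i.e. the term grows linearly in d.
    """
    exponent = 1.0 if math.isinf(p) else 1.0 - 2.0 / p
    return d ** exponent * eps ** 2 * (1.0 - delta) / delta ** 2


if __name__ == "__main__":
    # Illustrative values: a 3 x 224 x 224 image, l_inf radius 1, gap 0.5.
    d = 3 * 224 * 224
    print(noise_variance_scale(d, math.inf, eps=1.0, delta=0.5))
    print(noise_variance_scale(d, 4, eps=1.0, delta=0.5))
```

For $p=\infty$ the term grows linearly with the number of pixels, which is why the abstract concludes that the required noise eventually dominates images with bounded pixel values.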
-
How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design
Authors:
Maria-Florina Balcan,
Dan DeBlasio,
Travis Dick,
Carl Kingsford,
Tuomas Sandholm,
Ellen Vitercik
Abstract:
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set.
We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause large changes in behavior. Prior research has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees apply whenever an algorithm's performance is a piecewise-constant, -linear, or -- more generally -- piecewise-structured function of its parameters. Our theory also implies novel bounds for voting mechanisms and dynamic programming algorithms from computational biology.
Submitted 25 April, 2021; v1 submitted 7 August, 2019;
originally announced August 2019.
-
Learning piecewise Lipschitz functions in changing environments
Authors:
Maria-Florina Balcan,
Travis Dick,
Dravyansh Sharma
Abstract:
Optimization in the presence of sharp (non-Lipschitz), unpredictable (w.r.t. time and amount) changes is a challenging and largely unexplored problem of great significance. We consider the class of piecewise Lipschitz functions, which is the most general online setting considered in the literature for the problem, and arises naturally in various combinatorial algorithm selection problems where utility functions can have sharp discontinuities. The usual performance metric of $\mathit{static}$ regret minimizes the gap between the payoff accumulated and that of the best fixed point for the entire duration, and thus fails to capture changing environments. Shifting regret is a useful alternative, which allows for up to $s$ environment shifts. In this work we provide an $O(\sqrt{sdT\log T}+sT^{1-β})$ regret bound for $β$-dispersed functions, where $β$ roughly quantifies the rate at which discontinuities appear in the utility functions in expectation (typically $β\ge1/2$ in problems of practical interest). We also present a lower bound tight up to sub-logarithmic factors. We further obtain improved bounds when selecting from a small pool of experts. We empirically demonstrate a key application of our algorithms to online clustering problems on popular benchmarks.
Submitted 6 August, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Semi-bandit Optimization in the Dispersed Setting
Authors:
Maria-Florina Balcan,
Travis Dick,
Wesley Pegden
Abstract:
The goal of data-driven algorithm design is to obtain high-performing algorithms for specific application domains using machine learning and data. Across many fields in AI, science, and engineering, practitioners will often fix a family of parameterized algorithms and then optimize those parameters to obtain good performance on example instances from the application domain. In the online setting, we must choose algorithm parameters for each instance as they arrive, and our goal is to be competitive with the best fixed algorithm in hindsight.
There are two major challenges in online data-driven algorithm design. First, it can be computationally expensive to evaluate the loss functions that map algorithm parameters to performance, which often require the learner to run a combinatorial algorithm to measure its performance. Second, the losses can be extremely volatile and have sharp discontinuities. However, we show that in many applications, evaluating the loss function for one algorithm choice can sometimes reveal the loss for a range of similar algorithms, essentially for free. We develop online optimization algorithms capable of using this kind of extra information by working in the semi-bandit feedback setting. Our algorithms achieve regret bounds that are essentially as good as algorithms under full-information feedback and are significantly more computationally efficient. We apply our semi-bandit results to obtain the first provable guarantees for data-driven algorithm design for linkage-based clustering and we improve the best regret bounds for designing greedy knapsack algorithms.
Submitted 21 December, 2020; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Envy-Free Classification
Authors:
Maria-Florina Balcan,
Travis Dick,
Ritesh Noothigattu,
Ariel D. Procaccia
Abstract:
In classic fair division problems such as cake cutting and rent division, envy-freeness requires that each individual (weakly) prefer his allocation to anyone else's. On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks. Our technical focus is the generalizability of envy-free classification, i.e., understanding whether a classifier that is envy free on a sample would be almost envy free with respect to the underlying distribution with high probability. Our main result establishes that a small sample is sufficient to achieve such guarantees, when the classifier in question is a mixture of deterministic classifiers that belong to a family of low Natarajan dimension.
△ Less
Submitted 24 September, 2020; v1 submitted 23 September, 2018;
originally announced September 2018.
-
Data Driven Resource Allocation for Distributed Learning
Authors:
Travis Dick,
Mu Li,
Venkata Krishna Pillutla,
Colin White,
Maria Florina Balcan,
Alex Smola
Abstract:
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.
Submitted 15 December, 2016; v1 submitted 15 December, 2015;
originally announced December 2015.