Search | arXiv e-print repository

arXiv:2306.03262 [pdf, other]

Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

Authors: Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

Abstract: We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process. We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approxi… ▽ More We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process. We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approximately half of the list of accepted papers would change if the review process were randomly rerun. Our analysis suggests that making the conference more selective would increase the arbitrariness of the process. Taken together with previous research, our results highlight the inherent difficulty of objectively measuring the quality of research, and suggest that authors should not be excessively discouraged by rejected work. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2211.12966 [pdf, other]

How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

Authors: Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

Abstract: How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based… ▽ More How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated, but better than authors who were not invited to review. (3) Authors' relative ranking of scientific contribution of two submissions they made generally agree (93%) with their predicted acceptance probabilities, but there is a notable 7% responses where authors think their better paper will face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents of both accepted and rejected papers said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account in setting their expectations from peer review. △ Less

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2003.12206 [pdf, other]

Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

Authors: Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d'Alché-Buc, Emily Fox, Hugo Larochelle

Abstract: One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible res… ▽ More One of the challenges in machine learning research is to ensure that presented and published results are sound and reliable. Reproducibility, that is obtaining similar results as presented in a paper or talk, using the same code and data (when available), is a necessary step to verify the reliability of research findings. Reproducibility is also an important step to promote open and accessible research, thereby allowing the scientific community to quickly integrate new findings and convert ideas to practice. Reproducibility also promotes the use of robust experimental workflows, which potentially reduce unintentional errors. In 2019, the Neural Information Processing Systems (NeurIPS) conference, the premier international conference for research in machine learning, introduced a reproducibility program, designed to improve the standards across the community for how we conduct, communicate, and evaluate machine learning research. The program contained three components: a code submission policy, a community-wide reproducibility challenge, and the inclusion of the Machine Learning Reproducibility checklist as part of the paper submission process. In this paper, we describe each of these components, how it was deployed, as well as what we were able to learn from this initiative. △ Less

Submitted 30 December, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: To appear at JMLR, 16 pages + Appendix

arXiv:1902.02244 [pdf, other]

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case

Authors: Alina Beygelzimer, Dávid Pál, Balázs Szörényi, Devanathan Thiruvenkatachari, Chen-Yu Wei, Chicheng Zhang

Abstract: We study the problem of efficient online multiclass linear classification with bandit feedback, where all examples belong to one of $K$ classes and lie in the $d$-dimensional Euclidean space. Previous works have left open the challenge of designing efficient algorithms with finite mistake bounds when the data is linearly separable by a margin $γ$. In this work, we take a first step towards this pr… ▽ More We study the problem of efficient online multiclass linear classification with bandit feedback, where all examples belong to one of $K$ classes and lie in the $d$-dimensional Euclidean space. Previous works have left open the challenge of designing efficient algorithms with finite mistake bounds when the data is linearly separable by a margin $γ$. In this work, we take a first step towards this problem. We consider two notions of linear separability: strong and weak. 1. Under the strong linear separability condition, we design an efficient algorithm that achieves a near-optimal mistake bound of $O\left( K/γ^2 \right)$. 2. Under the more challenging weak linear separability condition, we design an efficient algorithm with a mistake bound of $\min (2^{\widetilde{O}(K \log^2 (1/γ))}, 2^{\widetilde{O}(\sqrt{1/γ} \log K)})$. Our algorithm is based on kernel Perceptron, which is inspired by the work of (Klivans and Servedio, 2008) on improperly learning intersection of halfspaces. △ Less

Submitted 18 June, 2019; v1 submitted 6 February, 2019; originally announced February 2019.

Comments: 41 pages, 8 figures

arXiv:1807.06473 [pdf, other]

Contextual Memory Trees

Authors: Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

Abstract: We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substanti… ▽ More We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. Furthermore CMT operates as a reduction to classification, allowing it to benefit from advances in representation or architecture. We demonstrate the efficacy of CMT by augmenting existing multi-class and multi-label classification algorithms with CMT and observe statistical improvement. We also test CMT learning on several image-captioning tasks to demonstrate that it performs computationally better than a simple nearest neighbors memory system while benefitting from reward learning. △ Less

Submitted 2 June, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: ICM 2019

arXiv:1803.02453 [pdf, other]

A Reductions Approach to Fair Classification

Authors: Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach

Abstract: We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier wit… ▽ More We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints. We introduce two reductions that work for any representation of the cost-sensitive classifier and compare favorably to prior baselines on a variety of data sets, while overcoming several of their disadvantages. △ Less

Submitted 16 July, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

arXiv:1702.07958 [pdf, other]

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

Authors: Alina Beygelzimer, Francesco Orabona, Chicheng Zhang

Abstract: We present an efficient second-order algorithm with $\tilde{O}(\frac{1}η\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $η$, for a range of $η$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($η=0$) to squared hinge loss ($η=1$). This provides… ▽ More We present an efficient second-order algorithm with $\tilde{O}(\frac{1}η\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $η$, for a range of $η$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($η=0$) to squared hinge loss ($η=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms. △ Less

Submitted 17 January, 2018; v1 submitted 25 February, 2017; originally announced February 2017.

Comments: 22 pages, 2 figures; ICML 2017; this version includes additional discussions of Newtron, and a variant of SOBA that directly uses an online exp-concave optimization oracle

arXiv:1602.07265 [pdf, ps, other]

Search Improves Label for Active Learning

Authors: Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang

Abstract: We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not). The Search oracle models the situation where a human searches a database to seed or counterexample an existing solution. Search is stronger than Label while being natural to implement in many situations. We show that an algorithm using both oracles can provide exponentially larg… ▽ More We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not). The Search oracle models the situation where a human searches a database to seed or counterexample an existing solution. Search is stronger than Label while being natural to implement in many situations. We show that an algorithm using both oracles can provide exponentially large problem-dependent improvements over Label alone. △ Less

Submitted 24 October, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

Comments: 32 pages; NIPS 2016

arXiv:1506.04820 [pdf, ps, other]

Online Gradient Boosting

Authors: Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo

Abstract: We extend the theory of boosting for regression problems to the online learning setting. Generalizing from the batch setting for boosting, the notion of a weak learning algorithm is modeled as an online learning algorithm with linear loss functions that competes with a base class of regression functions, while a strong learning algorithm is an online learning algorithm with convex loss functions t… ▽ More We extend the theory of boosting for regression problems to the online learning setting. Generalizing from the batch setting for boosting, the notion of a weak learning algorithm is modeled as an online learning algorithm with linear loss functions that competes with a base class of regression functions, while a strong learning algorithm is an online learning algorithm with convex loss functions that competes with a larger class of regression functions. Our main result is an online gradient boosting algorithm which converts a weak online learning algorithm into a strong one where the larger class of functions is the linear span of the base class. We also give a simpler boosting algorithm that converts a weak online learning algorithm into a strong one where the larger class of functions is the convex hull of the base class, and prove its optimality. △ Less

Submitted 30 October, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

arXiv:1502.02704 [pdf, other]

Learning Reductions that Really Work

Authors: Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

Abstract: We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful. We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful. △ Less

Submitted 9 February, 2015; originally announced February 2015.

arXiv:1502.02651 [pdf, ps, other]

Optimal and Adaptive Algorithms for Online Boosting

Authors: Alina Beygelzimer, Satyen Kale, Haipeng Luo

Abstract: We study online boosting, the task of converting any weak online learner into a strong online learner. Based on a novel and natural definition of weak online learnability, we develop two online boosting algorithms. The first algorithm is an online version of boost-by-majority. By proving a matching lower bound, we show that this algorithm is essentially optimal in terms of the number of weak learn… ▽ More We study online boosting, the task of converting any weak online learner into a strong online learner. Based on a novel and natural definition of weak online learnability, we develop two online boosting algorithms. The first algorithm is an online version of boost-by-majority. By proving a matching lower bound, we show that this algorithm is essentially optimal in terms of the number of weak learners and the sample complexity needed to achieve a specified accuracy. This optimal algorithm is not adaptive however. Using tools from online loss minimization, we derive an adaptive online boosting algorithm that is also parameter-free, but not optimal. Both algorithms work with base learners that can handle example importance weights directly, as well as by rejection sampling examples with probability defined by the booster. Results are complemented with an extensive experimental study. △ Less

Submitted 9 February, 2015; originally announced February 2015.

arXiv:1410.0440 [pdf, other]

Scalable Nonlinear Learning with Adaptive Polynomial Expansions

Authors: Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky

Abstract: Can we effectively learn a nonlinear representation in time comparable to linear learning? We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. The algorithm is designed for extreme computational efficiency, and an extensive experimental study shows that its computation/prediction tradeoff ability compares very favor… ▽ More Can we effectively learn a nonlinear representation in time comparable to linear learning? We describe a new algorithm that explicitly and adaptively expands higher-order interaction features over base linear representations. The algorithm is designed for extreme computational efficiency, and an extensive experimental study shows that its computation/prediction tradeoff ability compares very favorably against strong baselines. △ Less

Submitted 1 October, 2014; originally announced October 2014.

Comments: To appear in NIPS 2014

arXiv:1408.2031 [pdf]

Conditional Probability Tree Estimation Analysis and Algorithms

Authors: Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alexander L. Strehl

Abstract: We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably co… ▽ More We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly 106 labels. △ Less

Submitted 9 August, 2014; originally announced August 2014.

Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

Report number: UAI-P-2009-PG-51-58

arXiv:1207.1418 [pdf]

Efficient Test Selection in Active Diagnosis via Entropy Approximation

Authors: Alice X. Zheng, Irina Rish, Alina Beygelzimer

Abstract: We consider the problem of diagnosing faults in a system represented by a Bayesian network, where diagnosis corresponds to recovering the most likely state of unobserved nodes given the outcomes of tests (observed nodes). Finding an optimal subset of tests in this setting is intractable in general. We show that it is difficult even to compute the next most-informative test using greedy test select… ▽ More We consider the problem of diagnosing faults in a system represented by a Bayesian network, where diagnosis corresponds to recovering the most likely state of unobserved nodes given the outcomes of tests (observed nodes). Finding an optimal subset of tests in this setting is intractable in general. We show that it is difficult even to compute the next most-informative test using greedy test selection, as it involves several entropy terms whose exact computation is intractable. We propose an approximate approach that utilizes the loopy belief propagation infrastructure to simultaneously compute approximations of marginal and conditional entropies on multiple subsets of nodes. We apply our method to fault diagnosis in computer networks, and show the algorithm to be very effective on realistic Internet-like topologies. We also provide theoretical justification for the greedy test selection approach, along with some performance guarantees. △ Less

Submitted 4 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

Report number: UAI-P-2005-PG-675-682

arXiv:1201.6655 [pdf, other]

Learning Performance of Prediction Markets with Kelly Bettors

Authors: Alina Beygelzimer, John Langford, David Pennock

Abstract: In evaluating prediction markets (and other crowd-prediction mechanisms), investigators have repeatedly observed a so-called "wisdom of crowds" effect, which roughly says that the average of participants performs much better than the average participant. The market price---an average or at least aggregate of traders' beliefs---offers a better estimate than most any individual trader's opinion. In… ▽ More In evaluating prediction markets (and other crowd-prediction mechanisms), investigators have repeatedly observed a so-called "wisdom of crowds" effect, which roughly says that the average of participants performs much better than the average participant. The market price---an average or at least aggregate of traders' beliefs---offers a better estimate than most any individual trader's opinion. In this paper, we ask a stronger question: how does the market price compare to the best trader's belief, not just the average trader. We measure the market's worst-case log regret, a notion common in machine learning theory. To arrive at a meaningful answer, we need to assume something about how traders behave. We suppose that every trader optimizes according to the Kelly criteria, a strategy that provably maximizes the compound growth of wealth over an (infinite) sequence of market interactions. We show several consequences. First, the market prediction is a wealth-weighted average of the individual participants' beliefs. Second, the market learns at the optimal rate, the market price reacts exactly as if updating according to Bayes' Law, and the market prediction has low worst-case log regret to the best individual participant. We simulate a sequence of markets where an underlying true probability exists, showing that the market converges to the true objective frequency as if updating a Beta distribution, as the theory predicts. If agents adopt a fractional Kelly criteria, a common practical variant, we show that agents behave like full-Kelly agents with beliefs weighted between their own and the market's, and that the market price converges to a time-discounted frequency. Our analysis provides a new justification for fractional Kelly betting, a strategy widely used in practice for ad-hoc reasons. Finally, we propose a method for an agent to learn her own optimal Kelly fraction. △ Less

Submitted 31 January, 2012; originally announced January 2012.

arXiv:1006.2588 [pdf, ps, other]

Agnostic Active Learning Without Constraints

Authors: Alina Beygelzimer, Daniel Hsu, John Langford, Tong Zhang

Abstract: We present and analyze an agnostic active learning algorithm that works without kee** a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with mai… ▽ More We present and analyze an agnostic active learning algorithm that works without kee** a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification. △ Less

Submitted 13 June, 2010; originally announced June 2010.

arXiv:1002.4058 [pdf, ps, other]

Contextual Bandit Algorithms with Supervised Learning Guarantees

Authors: Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire

Abstract: We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-δ$ while incurring regret at most… ▽ More We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-δ$ while incurring regret at most $O(\sqrt{KT\ln(N/δ)})$ over $T$ time steps. The new algorithm is tested empirically in a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/δ))})$ with probability $1-δ$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised learning type guarantees for the contextual bandit setting. △ Less

Submitted 27 October, 2011; v1 submitted 22 February, 2010; originally announced February 2010.

Comments: 10 pages

ACM Class: I.2.6

arXiv:0903.4217 [pdf, ps, other]

Conditional Probability Tree Estimation Analysis and Algorithms

Authors: Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alex Strehl

Abstract: We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which prov… ▽ More We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly $10^6$ labels. △ Less

Submitted 3 June, 2009; v1 submitted 24 March, 2009; originally announced March 2009.

arXiv:0902.3176 [pdf, ps, other]

Error-Correcting Tournaments

Authors: Alina Beygelzimer, John Langford, Pradeep Ravikumar

Abstract: We present a family of pairwise tournaments reducing $k$-class classification to binary classification. These reductions are provably robust against a constant fraction of binary errors. The results improve on the PECOC construction \cite{SECOC} with an exponential improvement in computation, from $O(k)$ to $O(\log_2 k)$, and the removal of a square root in the regret dependence, matching the be… ▽ More We present a family of pairwise tournaments reducing $k$-class classification to binary classification. These reductions are provably robust against a constant fraction of binary errors. The results improve on the PECOC construction \cite{SECOC} with an exponential improvement in computation, from $O(k)$ to $O(\log_2 k)$, and the removal of a square root in the regret dependence, matching the best possible computation and regret up to a constant. △ Less

Submitted 3 February, 2010; v1 submitted 18 February, 2009; originally announced February 2009.

Comments: Minor wording improvements

arXiv:0812.4952 [pdf, ps, other]

Importance Weighted Active Learning

Authors: Alina Beygelzimer, Sanjoy Dasgupta, John Langford

Abstract: We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process. Experiments on passively labeled data show that this approach reduces the label complexity… ▽ More We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process. Experiments on passively labeled data show that this approach reduces the label complexity required to achieve good predictive performance on many learning problems. △ Less

Submitted 20 May, 2009; v1 submitted 29 December, 2008; originally announced December 2008.

arXiv:0812.4044 [pdf, ps, other]

The Offset Tree for Learning with Partial Labels

Authors: Alina Beygelzimer, John Langford

Abstract: We present an algorithm, called the Offset Tree, for learning to make decisions in situations where the payoff of only one choice is observed, rather than all choices. The algorithm reduces this setting to binary classification, allowing one to reuse of any existing, fully supervised binary classification algorithm in this partial information setting. We show that the Offset Tree is an optimal red… ▽ More We present an algorithm, called the Offset Tree, for learning to make decisions in situations where the payoff of only one choice is observed, rather than all choices. The algorithm reduces this setting to binary classification, allowing one to reuse of any existing, fully supervised binary classification algorithm in this partial information setting. We show that the Offset Tree is an optimal reduction to binary classification. In particular, it has regret at most $(k-1)$ times the regret of the binary classifier it uses (where $k$ is the number of choices), and no reduction to binary classification can do better. This reduction is also computationally optimal, both at training and test time, requiring just $O(\log_2 k)$ work to train on an example or make a prediction. Experiments with the Offset Tree show that it generally performs better than several alternative approaches. △ Less

Submitted 3 April, 2016; v1 submitted 21 December, 2008; originally announced December 2008.

arXiv:cs/9911007 [pdf, ps, other]

One-Way Functions in Worst-Case Cryptography: Algebraic and Security Properties

Authors: A. Beygelzimer, L. A. Hemaspaandra, C. M. Homan, J. Rothe

Abstract: We survey recent developments in the study of (worst-case) one-way functions having strong algebraic and security properties. According to [RS93], this line of research was initiated in 1984 by Rivest and Sherman who designed two-party secret-key agreement protocols that use strongly noninvertible, total, associative one-way functions as their key building blocks. If commutativity is added as an… ▽ More We survey recent developments in the study of (worst-case) one-way functions having strong algebraic and security properties. According to [RS93], this line of research was initiated in 1984 by Rivest and Sherman who designed two-party secret-key agreement protocols that use strongly noninvertible, total, associative one-way functions as their key building blocks. If commutativity is added as an ingredient, these protocols can be used by more than two parties, as noted by Rabi and Sherman [RS93] who also developed digital signature protocols that are based on such enhanced one-way functions. Until recently, it was an open question whether one-way functions having the algebraic and security properties that these protocols require could be created from any given one-way function. Recently, Hemaspaandra and Rothe [HR99] resolved this open issue in the affirmative, by showing that one-way functions exist if and only if strong, total, commutative, associative one-way functions exist. We discuss this result, and the work of Rabi, Rivest, and Sherman, and recent work of Homan [Hom99] that makes progress on related issues. △ Less

Submitted 15 November, 1999; originally announced November 1999.

Comments: 17 pages

Report number: University of Rochester Technical Report UR-CS TR 722 ACM Class: F.1.3; E.3

Showing 1–22 of 22 results for author: Beygelzimer, A