-
arXiv:2309.01173
Logic of subjective probability
Abstract: In this paper I discuss both syntax and semantics of subjective probability. The semantics determines ways of testing probability statements. Among important varieties of subjective probabilities are intersubjective probabilities and impersonal probabilities, and I will argue that well-tested impersonal probabilities acquire features of objective probabilities. Jeffreys's law, my next topic, state…
Submitted 3 September, 2023; originally announced September 2023.
Comments: 19 pages
MSC Class: 62A01 (Primary) 60A99; 62F15; 60F15; 62C05; 60G42; 62M20 (Secondary)
-
arXiv:2210.01948
Game-theoretic statistics and safe anytime-valid inference
Abstract: Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty -- e-processes for testing and confidence sequences for estimation -- that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative…
Submitted 17 June, 2023; v1 submitted 4 October, 2022; originally announced October 2022.
Comments: 25 pages. Under review. ArXiv does not compile/space some references properly
-
Conformal testing: binary case with Markov alternatives
Abstract: We continue study of conformal testing in binary model situations. In this note we consider Markov alternatives to the null hypothesis of exchangeability. We propose two new classes of conformal test martingales; one class is statistically efficient in our experiments, and the other class partially sacrifices statistical efficiency to gain computational efficiency.
Submitted 2 November, 2021; originally announced November 2021.
Comments: 8 pages, 8 figures
MSC Class: 68Q32 (Primary) 62G10; 60G42 (Secondary)
-
Protected probabilistic classification
Abstract: This paper proposes a way of protecting probabilistic prediction models against changes in the data distribution, concentrating on the case of classification and paying particular attention to binary classification. This is important in applications of machine learning, where the quality of a trained prediction algorithm may drop significantly in the process of its exploitation. Our techniques are…
Submitted 22 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.
Comments: 23 pages, 14 figures, and 4 tables
MSC Class: 68Q32 (Primary) 68T05; 60G25; 60G42; 62F03; 62M20 (Secondary)
-
Enhancement of prediction algorithms by betting
Abstract: This note proposes a procedure for enhancing the quality of probabilistic prediction algorithms via betting against their predictions. It is inspired by the success of the conformal test martingales that have been developed recently.
Submitted 18 May, 2021; originally announced May 2021.
Comments: 7 pages, 4 figures
MSC Class: 68T05 (Primary) 62G10; 60G42 (Secondary)
-
Conformal testing in a binary model situation
Abstract: Conformal testing is a way of testing the IID assumption based on conformal prediction. The topic of this note is computational evaluation of the performance of conformal testing in a model situation in which IID binary observations generated from a Bernoulli distribution are followed by IID binary observations generated from another Bernoulli distribution, with the parameters of the distributions…
Submitted 5 April, 2021; originally announced April 2021.
Comments: 8 pages, 5 figures
MSC Class: 68T05; 62G10 (Primary) 62L10; 60G42 (Secondary)
-
Retrain or not retrain: Conformal test martingales for change-point detection
Abstract: We argue for supplementing the process of training a prediction algorithm by setting up a scheme for detecting the moment when the distribution of the data changes and the algorithm needs to be retrained. Our proposed schemes are based on exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the data. Our method, based on conformal prediction, is…
Submitted 20 February, 2021; originally announced February 2021.
Comments: 22 pages, 19 figures, 3 tables
MSC Class: 68Q32 (Primary) 62G10; 60G42; 68T05 (Secondary)
-
Testing for concept shift online
Abstract: This note continues study of exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the observations. Such processes can be used for detecting violations of the IID assumption, which is commonly made in machine learning. Violations of the IID assumption are sometimes referred to as dataset shift, and dataset shift is sometimes subdivided into conc…
Submitted 28 December, 2020; originally announced December 2020.
Comments: 14 pages, 2 figures
MSC Class: 68Q32 (Primary) 62G10; 60G42; 68T05 (Secondary)
-
Training conformal predictors
Abstract: Efficiency criteria for conformal prediction, such as observed fuzziness (i.e., the sum of p-values associated with false labels), are commonly used to evaluate the performance of given conformal predictors. Here, we investigate whether it is possible to exploit efficiency criteria to learn classifiers, both conformal predictors and point classifiers, by using such criteria as…
Submitted 14 May, 2020; originally announced May 2020.
Comments: 8 pages, 2 figures, 4 tables
-
arXiv:2001.05989
Cross-conformal e-prediction
Abstract: This note discusses a simple modification of cross-conformal prediction inspired by recent work on e-values. The precursor of conformal prediction developed in the 1990s by Gammerman, Vapnik, and Vovk was also based on e-values and is called conformal e-prediction in this note. Replacing e-values by p-values led to conformal prediction, which has important advantages over conformal e-prediction wi…
Submitted 27 June, 2024; v1 submitted 16 January, 2020; originally announced January 2020.
Comments: 8 pages. This version: exposition improved; proof of Proposition 4 added
MSC Class: 68T05
-
Computationally efficient versions of conformal predictive distributions
Abstract: Conformal predictive systems are a recent modification of conformal predictors that output, in regression problems, probability distributions for labels of test observations rather than set predictions. The extra information provided by conformal predictive systems may be useful, e.g., in decision making problems. Conformal predictive systems inherit the relative computational inefficiency of conf…
Submitted 3 November, 2019; originally announced November 2019.
Comments: 31 pages, 14 figures, 1 table. The conference version published in the Proceedings of COPA 2018, and the journal version is to appear in Neurocomputing
MSC Class: 68T05
-
Testing randomness
Abstract: The hypothesis of randomness is fundamental in statistical machine learning and in many areas of nonparametric statistics; it says that the observations are assumed to be independent and coming from the same unknown probability distribution. This hypothesis is close, in certain respects, to the hypothesis of exchangeability, which postulates that the distribution of the observations is invariant w…
Submitted 29 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.
Comments: 34 pages, 1 table, 4 figures
MSC Class: 62G10; also 68Q32; 60G42; 68Q30
Journal ref: Statistical Science 36(4):595-611 (2021)
-
Conformal calibrators
Abstract: Most existing examples of full conformal predictive systems, split-conformal predictive systems, and cross-conformal predictive systems impose severe restrictions on the adaptation of predictive distributions to the test object at hand. In this paper we develop split-conformal and cross-conformal predictive systems that are fully adaptive. Our method consists in calibrating existing predictive sys…
Submitted 18 February, 2019; originally announced February 2019.
Comments: 10 pages, 2 figures
Report number: 23
MSC Class: 68T05
ACM Class: I.2.6
-
Conformal predictive distributions with kernels
Abstract: This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was so deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. The second development is combin…
Submitted 24 October, 2017; originally announced October 2017.
Comments: 20 pages, 3 figures, prepared for the Proceedings of the Braverman Readings (Boston, 28-30 April 2017)
MSC Class: 68Q32 (Primary) 68T05; 62M20; 60G25; 62J07; 62G08; 62F15 (Secondary)
-
arXiv:1708.01902
Universally consistent predictive distributions
Abstract: This paper describes simple universally consistent procedures of probability forecasting that satisfy a natural property of small-sample validity, under the assumption that the observations are produced independently in the IID fashion.
Submitted 30 August, 2017; v1 submitted 6 August, 2017; originally announced August 2017.
Comments: 26 pages
MSC Class: 68Q32; 62G20 (Primary) 68M20; 68T05 (Secondary)
-
Criteria of efficiency for conformal prediction
Abstract: We study optimal conformity measures for various criteria of efficiency of classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in literature on conformal prediction are not probabilistic unless the problem of classification is binary. We consider both unconditi…
Submitted 14 September, 2016; v1 submitted 14 March, 2016; originally announced March 2016.
Comments: 31 pages
MSC Class: 68T05
ACM Class: I.2.6
-
arXiv:1603.04283
Universal probability-free prediction
Abstract: We construct universal prediction systems in the spirit of Popper's falsifiability and Kolmogorov complexity and randomness. These prediction systems do not depend on any statistical assumptions (but under the IID assumption they dominate, to within the usual accuracy, conformal prediction). Our constructions give rise to a theory of algorithmic complexity and randomness of time containing analogu…
Submitted 4 April, 2017; v1 submitted 14 March, 2016; originally announced March 2016.
Comments: 27 pages
MSC Class: 68T05
ACM Class: I.2.6
-
Large-scale probabilistic predictors with and without guarantees of validity
Abstract: This paper studies theoretically and empirically a method of turning machine-learning algorithms into probabilistic predictors that automatically enjoys a property of validity (perfect calibration) and is computationally efficient. The price to pay for perfect calibration is that these probabilistic predictors produce imprecise (in practice, almost precise for large data sets) probabilities. When…
Submitted 13 November, 2015; v1 submitted 1 November, 2015; originally announced November 2015.
Comments: 38 pages, 14 figures, to appear in Advances in Neural Information Processing Systems 28 (NIPS 2015). As compared with the previous version (v1), the MATLAB code (the 5 files with extension .m) and results of new empirical studies have been added
Report number: 13
MSC Class: 68T05
-
arXiv:1502.06254
The fundamental nature of the log loss function
Abstract: The standard loss functions used in the literature on probabilistic prediction are the log loss function, the Brier loss function, and the spherical loss function; however, any computable proper loss function can be used for comparison of prediction algorithms. This note shows that the log loss function is most selective in that any prediction algorithm that is optimal for a given data sequence (i…
Submitted 28 June, 2015; v1 submitted 22 February, 2015; originally announced February 2015.
Comments: 12 pages
MSC Class: 68T05; 68T37; 60G25; 62M20
-
Prediction with Advice of Unknown Number of Experts
Abstract: In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts but also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at a…
Submitted 9 August, 2014; originally announced August 2014.
Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)
Report number: UAI-P-2010-PG-117-125
-
arXiv:1406.5600
From conformal to probabilistic prediction
Abstract: This paper proposes a new method of probabilistic prediction, which is based on conformal prediction. The method is applied to the standard USPS data set and gives encouraging results.
Submitted 21 June, 2014; originally announced June 2014.
Comments: 12 pages, 2 tables
MSC Class: 68T10
-
Efficiency of conformalized ridge regression
Abstract: Conformal prediction is a method of producing prediction sets that can be applied on top of a wide range of prediction algorithms. The method has a guaranteed coverage probability under the standard IID assumption regardless of whether the assumptions (often considerably more restrictive) of the underlying algorithm are satisfied. However, for the method to be really useful it is desirable that in…
Submitted 8 April, 2014; originally announced April 2014.
Comments: 22 pages, 1 figure
-
Regression Conformal Prediction with Nearest Neighbours
Abstract: In this paper we apply Conformal Prediction (CP) to the k-Nearest Neighbours Regression (k-NNR) algorithm and propose ways of extending the typical nonconformity measure used for regression so far. Unlike traditional regression methods which produce point predictions, Conformal Predictors output predictive regions that satisfy a given confidence level. The regions produced by any Conformal Predict…
Submitted 16 January, 2014; originally announced January 2014.
Journal ref: Journal Of Artificial Intelligence Research, Volume 40, pages 815-840, 2011
-
Kolmogorov's strong law of large numbers in game-theoretic probability: Reality's side
Abstract: The game-theoretic version of Kolmogorov's strong law of large numbers says that Skeptic has a strategy forcing the statement of the law in a game of prediction involving Reality, Forecaster, and Skeptic. This note describes a simple matching strategy for Reality.
Submitted 20 March, 2013; originally announced April 2013.
Comments: 3 pages
MSC Class: 60F15
-
Learning by Transduction
Abstract: We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object/classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicabl…
Submitted 30 January, 2013; originally announced January 2013.
Comments: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998)
Report number: UAI-P-1998-PG-148-155
-
arXiv:1211.0025
Venn-Abers predictors
Abstract: This paper continues study, both theoretical and empirical, of the method of Venn prediction, concentrating on binary prediction problems. Venn predictors produce probability-type predictions for the labels of test objects which are guaranteed to be well calibrated under the standard assumption that the observations are generated independently from the same distribution. We give a simple formaliza…
Submitted 21 June, 2014; v1 submitted 31 October, 2012; originally announced November 2012.
Comments: 18 pages; to appear in the UAI 2014 Proceedings
Report number: OCM07
MSC Class: 68T05; 68T10
-
Conditional validity of inductive conformal predictors
Abstract: Conformal predictors are set predictors that are automatically valid in the sense of having coverage probability equal to or exceeding a given confidence level. Inductive conformal predictors are a computationally efficient version of conformal predictors satisfying the same property of validity. However, inductive conformal predictors have been only known to control unconditional coverage probabi…
Submitted 24 September, 2012; v1 submitted 12 September, 2012; originally announced September 2012.
Comments: 23 pages, 9 figures, 2 tables; to appear in the ACML 2012 Proceedings
Report number: OCMNS05
MSC Class: 68T05; 62G15
-
Cross-conformal predictors
Abstract: This note introduces the method of cross-conformal prediction, which is a hybrid of the methods of inductive conformal prediction and cross-validation, and studies its validity and predictive efficiency empirically.
Submitted 3 August, 2012; originally announced August 2012.
Comments: 10 pages, 2 figures, 1 table
MSC Class: 62G15
-
On-line Prediction with Kernels and the Complexity Approximation Principle
Abstract: The paper describes an application of Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an on-line algorithm which performs nearly as well as any oblivious kernel predictor. The paper contains the derivation of an estimate on the performance of this algorithm. The estimate is then used to deri…
Submitted 11 July, 2012; originally announced July 2012.
Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)
Report number: UAI-P-2004-PG-170-176
-
Plug-in martingales for testing exchangeability on-line
Abstract: A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the as…
Submitted 28 June, 2012; v1 submitted 15 April, 2012; originally announced April 2012.
Comments: 8 pages, 7 figures; ICML 2012 Conference Proceedings
Report number: On-line Compression Modelling Project (New Series), Working Paper 04
MSC Class: 62G10
ACM Class: I.2.6
-
arXiv:1006.0475
Prediction with Advice of Unknown Number of Experts
Abstract: In the framework of prediction with expert advice, we consider a recently introduced kind of regret bounds: the bounds that depend on the effective instead of nominal number of experts. In contrast to the NormalHedge bound, which mainly depends on the effective number of experts and also weakly depends on the nominal one, we obtain a bound that does not contain the nominal number of experts at all…
Submitted 2 June, 2010; originally announced June 2010.
Comments: 22 pages; draft version
-
arXiv:1003.2218
Supermartingales in Prediction with Expert Advice
Abstract: We apply the method of defensive forecasting, based on the use of game-theoretic supermartingales, to prediction with expert advice. In the traditional setting of a countable number of experts and a finite number of outcomes, the Defensive Forecasting Algorithm is very close to the well-known Aggregating Algorithm. Not only the performance guarantees but also the predictions are the same for these…
Submitted 10 March, 2010; originally announced March 2010.
Comments: 37 pages
-
arXiv:0910.4683
Competing with Gaussian linear experts
Abstract: We study the problem of online regression. We prove a theoretical bound on the square loss of Ridge Regression. We do not make any assumptions about input vectors or outcomes. We also show that Bayesian Ridge Regression can be thought of as an online algorithm competing with all the Gaussian linear experts.
Submitted 10 May, 2010; v1 submitted 24 October, 2009; originally announced October 2009.
-
arXiv:0904.1579
Online prediction of ovarian cancer
Abstract: In this paper we apply computer learning methods to diagnosing ovarian cancer using the level of the standard biomarker CA125 in conjunction with information provided by mass-spectrometry. We are working with a new data set collected over a period of 7 years. Using the level of CA125 and mass-spectrometry peaks, our algorithm gives probability predictions for the disease. To estimate classificat…
Submitted 9 April, 2009; originally announced April 2009.
Comments: 11 pages, 4 figures, uses llncs.cls
ACM Class: I.2.1
-
arXiv:0902.4127
Prediction with expert evaluators' advice
Abstract: We introduce a new protocol for prediction with expert advice in which each expert evaluates the learner's and his own performance using a loss function that may change over time and may be different from the loss functions used by the other experts. The learner's goal is to perform better or not much worse than each expert, as evaluated by that expert, for all experts simultaneously. If the los…
Submitted 23 March, 2009; v1 submitted 24 February, 2009; originally announced February 2009.
Comments: 18 pages
-
arXiv:0710.0485
Prediction with expert advice for the Brier game
Abstract: We show that the Brier game of prediction is mixable and find the optimal learning rate and substitution function for it. The resulting prediction algorithm is applied to predict results of football and tennis matches. The theoretical performance guarantee turns out to be rather tight on these data sets, especially in the case of the more extensive tennis data.
Submitted 27 June, 2008; v1 submitted 2 October, 2007; originally announced October 2007.
Comments: 34 pages, 22 figures, 2 tables. The conference version (8 pages) is published in the ICML 2008 Proceedings
Journal ref: Journal of Machine Learning Research 10 (2009), 2413 - 2440
-
arXiv:0708.2353
Continuous and randomized defensive forecasting: unified view
Abstract: Defensive forecasting is a method of transforming laws of probability (stated in game-theoretic terms as strategies for Sceptic) into forecasting algorithms. There are two known varieties of defensive forecasting: "continuous", in which Sceptic's moves are assumed to depend on the forecasts in a (semi)continuous manner and which produces deterministic forecasts, and "randomized", in which the de…
Submitted 23 August, 2007; v1 submitted 17 August, 2007; originally announced August 2007.
Comments: 10 pages. The new version: (1) relaxes the assumption that the outcome space is finite, and now it is only assumed to be compact; (2) shows that in the case where the outcome space is finite of cardinality C, the randomized forecasts can be chosen concentrated on a finite set of cardinality at most C
-
arXiv:0708.1503
Defensive forecasting for optimal prediction with expert advice
Abstract: The method of defensive forecasting is applied to the problem of prediction with expert advice for binary outcomes. It turns out that defensive forecasting is not only competitive with the Aggregating Algorithm but also handles the case of "second-guessing" experts, whose advice depends on the learner's prediction; this paper assumes that the dependence on the learner's prediction is continuous.
Submitted 10 August, 2007; originally announced August 2007.
Comments: 14 pages
-
arXiv:0706.3188
A tutorial on conformal prediction
Abstract: Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability $ε$, together with a method that makes a prediction $\hat{y}$ of a label $y$, it produces a set of labels, typically containing $\hat{y}$, that also contains $y$ with probability $1-ε$. Conformal prediction can be applied to any method for producing $\hat{y}$: a near…
Submitted 21 June, 2007; originally announced June 2007.
Comments: 58 pages, 9 figures
Journal ref: Journal of Machine Learning Research 9 (2008) 371-421. http://www.jmlr.org/papers/v9/shafer08a.html
-
arXiv:cs/0611011
Hedging predictions in machine learning
Abstract: Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This paper describes a new technique for "hedging" the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for…
Submitted 2 November, 2006; originally announced November 2006.
Comments: 24 pages; 9 figures; 2 tables; a version of this paper (with discussion and rejoinder) is to appear in "The Computer Journal"
Report number: On-line Compression Modelling Project (New Series), Working Paper 02
Journal ref: Computer Journal, 50:151-177, 2007
-
arXiv:cs/0609045
Metric entropy in competitive on-line prediction
Abstract: Competitive on-line prediction (also known as universal prediction of individual sequences) is a strand of learning theory avoiding making any stochastic assumptions about the way the observations are generated. The predictor's goal is to compete with a benchmark class of prediction rules, which is often a proper Banach function space. Metric entropy provides a unifying framework for competitive…
Submitted 9 September, 2006; originally announced September 2006.
Comments: 41 pages
-
arXiv:cs/0607136
Competing with Markov prediction strategies
Abstract: Assuming that the loss function is convex in the prediction, we construct a prediction strategy universal for the class of Markov prediction strategies, not necessarily continuous. Allowing randomization, we remove the requirement of convexity.
Submitted 28 July, 2006; originally announced July 2006.
Comments: 11 pages
-
arXiv:cs/0607134
Leading strategies in competitive on-line prediction
Abstract: We start from a simple asymptotic result for the problem of on-line regression with the quadratic loss function: the class of continuous limited-memory prediction strategies admits a "leading prediction strategy", which not only asymptotically performs at least as well as any continuous limited-memory strategy but also satisfies the property that the excess loss of any continuous limited-memory…
Submitted 27 July, 2006; originally announced July 2006.
Comments: 20 pages; a conference version is to appear in the ALT'2006 proceedings
-
arXiv:cs/0607067
Competing with stationary prediction strategies
Abstract: In this paper we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered st…
Submitted 13 July, 2006; originally announced July 2006.
Comments: 20 pages
-
arXiv:cs/0606093
Predictions as statements and decisions
Abstract: Prediction is a complex notion, and different predictors (such as people, computer programs, and probabilistic theories) can pursue very different goals. In this paper I will review some popular kinds of prediction and argue that the theory of competitive on-line learning can benefit from the kinds of prediction that are now foreign to it.
Submitted 22 June, 2006; originally announced June 2006.
Comments: 48 pages
-
arXiv:cs/0512059
Competing with wild prediction rules
Abstract: We consider the problem of on-line prediction competitive with a benchmark class of continuous but highly irregular prediction rules. It is known that if the benchmark class is a reproducing kernel Hilbert space, there exists a prediction algorithm whose average loss over the first N examples does not exceed the average loss of any prediction rule in the class plus a "regret term" of O(N^(-1/2))…
Submitted 25 January, 2006; v1 submitted 14 December, 2005; originally announced December 2005.
Comments: 28 pages, 3 figures
ACM Class: I.2.6
-
arXiv:cs/0511058
On-line regression competitive with reproducing kernel Hilbert spaces
Abstract: We consider the problem of on-line prediction of real-valued labels, assumed bounded in absolute value by a known constant, of new objects from known labeled objects. The prediction algorithm's performance is measured by the squared deviation of the predictions from the actual labels. No stochastic assumptions are made about the way the labels and objects are generated. Instead, we are given a b…
Submitted 24 January, 2006; v1 submitted 15 November, 2005; originally announced November 2005.
Comments: 37 pages, 1 figure
-
arXiv:cs/0506041
Competitive on-line learning with a convex loss function
Abstract: We consider the problem of sequential decision making under uncertainty in which the loss caused by a decision depends on the following binary observation. In competitive on-line learning, the goal is to design decision algorithms that are almost as good as the best decision rules in a wide benchmark class, without making any assumptions about the way the observations are generated. However, sta…
Submitted 2 September, 2005; v1 submitted 11 June, 2005; originally announced June 2005.
Comments: 26 pages
ACM Class: I.2.6; I.5.1
-
arXiv:cs/0506007
Defensive forecasting for linear protocols
Abstract: We consider a general class of forecasting protocols, called "linear protocols", and discuss several important special cases, including multi-class forecasting. Forecasting is formalized as a game between three players: Reality, whose role is to generate observations; Forecaster, whose goal is to predict the observations; and Skeptic, who tries to make money on any lack of agreement between Fore…
Submitted 24 September, 2005; v1 submitted 2 June, 2005; originally announced June 2005.
Comments: 16 pages
ACM Class: I.2.6; I.5.1
-
arXiv:cs/0506004
Non-asymptotic calibration and resolution
Abstract: We analyze a new algorithm for probability forecasting of binary observations on the basis of the available data, without making any assumptions about the way the observations are generated. The algorithm is shown to be well calibrated and to have good resolution for long enough sequences of observations and for a suitable choice of its parameter, a kernel on the Cartesian product of the forecas…
Submitted 1 July, 2006; v1 submitted 1 June, 2005; originally announced June 2005.
Comments: 20 pages
ACM Class: I.2.6; I.5.1