Search | arXiv e-print repository

Sequential Predictive Two-Sample and Independence Testing

Authors: Aleksandr Podkopaev, Aaditya Ramdas

Abstract: We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data, while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and th… ▽ More We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data, while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time. △ Less

Submitted 19 July, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

arXiv:2212.07383 [pdf, other]

Sequential Kernelized Independence Testing

Authors: Aleksandr Podkopaev, Patrick Blöbaum, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Abstract: Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), he… ▽ More Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data. △ Less

Submitted 19 July, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: To appear at ICML 2023

arXiv:2110.06177 [pdf, other]

Tracking the risk of a deployed model and detecting harmful distribution shifts

Authors: Aleksandr Podkopaev, Aaditya Ramdas

Abstract: When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model… ▽ More When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets. △ Less

Submitted 5 May, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper at ICLR 2022

arXiv:2103.03323 [pdf, other]

Distribution-free uncertainty quantification for classification under label shift

Authors: Aleksandr Podkopaev, Aaditya Ramdas

Abstract: Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues -- prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning -- since these possess distribution-free guarantees for i.i.d. data. Two common… ▽ More Trustworthy deployment of ML models requires a proper measure of uncertainty, especially in safety-critical applications. We focus on uncertainty quantification (UQ) for classification problems via two avenues -- prediction sets using conformal prediction and calibration of probabilistic predictors by post-hoc binning -- since these possess distribution-free guarantees for i.i.d. data. Two common ways of generalizing beyond the i.i.d. setting include handling covariate and label shift. Within the context of distribution-free UQ, the former has already received attention, but not the latter. It is known that label shift hurts prediction, and we first argue that it also hurts UQ, by showing degradation in coverage and calibration. Piggybacking on recent progress in addressing label shift (for better prediction), we examine the right way to achieve UQ by reweighting the aforementioned conformal and calibration procedures whenever some unlabeled data from the target distribution is available. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance. △ Less

Submitted 7 July, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

arXiv:2012.01067 [pdf, ps, other]

Making Weak Memory Models Fair

Authors: Ori Lahav, Egor Namakonov, Jonas Oberhauser, Anton Podkopaev, Viktor Vafeiadis

Abstract: Liveness properties, such as termination, of even the simplest shared-memory concurrent programs under sequential consistency typically require some fairness assumptions about the scheduler. Under weak memory models, we observe that the standard notions of thread fairness are insufficient, and an additional fairness property, which we call memory fairness, is needed. In this paper, we propose a un… ▽ More Liveness properties, such as termination, of even the simplest shared-memory concurrent programs under sequential consistency typically require some fairness assumptions about the scheduler. Under weak memory models, we observe that the standard notions of thread fairness are insufficient, and an additional fairness property, which we call memory fairness, is needed. In this paper, we propose a uniform definition for memory fairness that can be integrated into any declarative memory model enforcing acyclicity of the union of the program order and the reads-from relation. For the well-known models, SC, x86-TSO, RA, and StrongCOH, that have equivalent operational and declarative presentations, we show that our declarative memory fairness condition is equivalent to an intuitive model-specific operational notion of memory fairness, which requires the memory system to fairly execute its internal propagation steps. Our fairness condition preserves the correctness of local transformations and the compilation scheme from RC11 to x86-TSO, and also enables the first formal proofs of termination of mutual exclusion lock implementations under declarative weak memory models. △ Less

Submitted 9 September, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: 43 pages, 2 figures

ACM Class: D.3.1; F.3.2

arXiv:2006.10564 [pdf, other]

Distribution-free binary classification: prediction sets, confidence intervals and calibration

Authors: Chirag Gupta, Aleksandr Podkopaev, Aaditya Ramdas

Abstract: We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data. With a focus towards calibration, we establish a 'tripod' of theorems that connect these three notions for score-based classifiers. A direct implication is that… ▽ More We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting, that is without making any distributional assumptions on the data. With a focus towards calibration, we establish a 'tripod' of theorems that connect these three notions for score-based classifiers. A direct implication is that distribution-free calibration is only possible, even asymptotically, using a scoring function whose level sets partition the feature space into at most countably many sets. Parametric calibration schemes such as variants of Platt scaling do not satisfy this requirement, while nonparametric schemes based on binning do. To close the loop, we derive distribution-free confidence intervals for binned probabilities for both fixed-width and uniform-mass binning. As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration. We also derive extensions to settings with streaming data and covariate shift. △ Less

Submitted 16 February, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: 34 pages; significant updates from previous version (unambiguous notation, better exposition, and cleaner results); originally appeared as a spotlight at Neural Information Processing Systems (NeurIPS) '20

arXiv:2005.10554 [pdf, other]

doi 10.1145/3385412.3385973

Repairing and Mechanising the JavaScript Relaxed Memory Model

Authors: Conrad Watt, Christopher Pulte, Anton Podkopaev, Guillaume Barbier, Stephen Dolan, Shaked Flur, Jean Pichon-Pharabod, Shu-yu Guo

Abstract: Modern JavaScript includes the SharedArrayBuffer feature, which provides access to true shared memory concurrency. SharedArrayBuffers are simple linear buffers of bytes, and the JavaScript specification defines an axiomatic relaxed memory model to describe their behaviour. While this model is heavily based on the C/C++11 model, it diverges in some key areas. JavaScript chooses to give a well-defin… ▽ More Modern JavaScript includes the SharedArrayBuffer feature, which provides access to true shared memory concurrency. SharedArrayBuffers are simple linear buffers of bytes, and the JavaScript specification defines an axiomatic relaxed memory model to describe their behaviour. While this model is heavily based on the C/C++11 model, it diverges in some key areas. JavaScript chooses to give a well-defined semantics to data-races, unlike the "undefined behaviour" of C/C++11. Moreover, the JavaScript model is mixed-size. This means that its accesses are not to discrete locations, but to (possibly overlap**) ranges of bytes. We show that the model, in violation of the design intention, does not support a compilation scheme to ARMv8 which is used in practice. We propose a correction, which also incorporates a previously proposed fix for a failure of the model to provide Sequential Consistency of Data-Race-Free programs (SC-DRF), an important correctness condition. We use model checking, in Alloy, to generate small counter-examples for these deficiencies, and investigate our correction. To accomplish this, we also develop a mixed-size extension to the existing ARMv8 axiomatic model. Guided by our Alloy experimentation, we mechanise (in Coq) the JavaScript model (corrected and uncorrected), our ARMv8 model, and, for the corrected JavaScript model, a "model-internal" SC-DRF proof and a compilation scheme correctness proof to ARMv8. In addition, we investigate a non-mixed-size subset of the corrected JavaScript model, and give proofs of compilation correctness for this subset to x86-TSO, Power, RISC-V, ARMv7, and (again) ARMv8, via the Intermediate Memory Model (IMM). As a result of our work, the JavaScript standards body (ECMA TC39) will include fixes for both issues in an upcoming edition of the specification. △ Less

Submitted 21 May, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: 16 pages, 13 figiures

arXiv:1911.06567 [pdf, other]

Reconciling Event Structures with Modern Multiprocessors

Authors: Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis

Abstract: Weakestmo is a recently proposed memory consistency model that uses event structures to resolve the infamous "out-of-thin-air" problem. Although it has been shown to have important benefits over other memory models, its established compilation schemes are suboptimal in that they add more fences than necessary. In this paper, we prove the correctness in Coq of the intended compilation schemes for W… ▽ More Weakestmo is a recently proposed memory consistency model that uses event structures to resolve the infamous "out-of-thin-air" problem. Although it has been shown to have important benefits over other memory models, its established compilation schemes are suboptimal in that they add more fences than necessary. In this paper, we prove the correctness in Coq of the intended compilation schemes for Weakestmo to a range of hardware memory models (x86, POWER, ARMv7, ARMv8, RISC-V). Our proof is the first that establishes correctness of compilation of an event-structure-based model that forbids "thin-air" behaviors, as well as the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of hardware platforms. Our compilation proof goes via the recent Intermediate Memory Model (IMM), which we suitably extend with sequentially consistent accesses. △ Less

Submitted 28 May, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

arXiv:1807.07892 [pdf, ps, other]

doi 10.1145/3290382

Bridging the Gap between Programming Languages and Hardware Weak Memory Models

Authors: Anton Podkopaev, Ori Lahav, Viktor Vafeiadis

Abstract: We develop a new intermediate weak memory model, IMM, as a way of modularizing the proofs of correctness of compilation from concurrent programming languages with weak memory consistency semantics to mainstream multi-core architectures, such as POWER and ARM. We use IMM to prove the correctness of compilation from the promising semantics of Kang et al. to POWER (thereby correcting and improving th… ▽ More We develop a new intermediate weak memory model, IMM, as a way of modularizing the proofs of correctness of compilation from concurrent programming languages with weak memory consistency semantics to mainstream multi-core architectures, such as POWER and ARM. We use IMM to prove the correctness of compilation from the promising semantics of Kang et al. to POWER (thereby correcting and improving their result) and ARMv7, as well as to the recently revised ARMv8 model. Our results are mechanized in Coq, and to the best of our knowledge, these are the first machine-verified compilation correctness results for models that are weaker than x86-TSO. △ Less

Submitted 9 November, 2018; v1 submitted 20 July, 2018; originally announced July 2018.

arXiv:1606.01400 [pdf, other]

Operational Aspects of C/C++ Concurrency

Authors: Anton Podkopaev, Ilya Sergey, Aleksandar Nanevski

Abstract: In this work, we present a family of operational semantics that gradually approximates the realistic program behaviors in the C/C++11 memory model. Each semantics in our framework is built by elaborating and combining two simple ingredients: viewfronts and operation buffers. Viewfronts allow us to express the spatial aspect of thread interaction, i.e., which values a thread can read, while operati… ▽ More In this work, we present a family of operational semantics that gradually approximates the realistic program behaviors in the C/C++11 memory model. Each semantics in our framework is built by elaborating and combining two simple ingredients: viewfronts and operation buffers. Viewfronts allow us to express the spatial aspect of thread interaction, i.e., which values a thread can read, while operation buffers enable manipulation with the temporal execution aspect, i.e., determining the order in which the results of certain operations can be observed by concurrently running threads. Starting from a simple abstract state machine, through a series of gradual refinements of the abstract state, we capture such language aspects and synchronization primitives as release/acquire atomics, sequentially-consistent and non-atomic memory accesses, also providing a semantics for relaxed atomics, while avoiding the Out-of-Thin-Air problem. To the best of our knowledge, this is the first formal and executable operational semantics of C11 capable of expressing all essential concurrent aspects of the standard. We illustrate our approach via a number of characteristic examples, relating the observed behaviors to those of standard litmus test programs from the literature. We provide an executable implementation of the semantics in PLT Redex, along with a number of implemented litmus tests and examples, and showcase our prototype on a large case study: randomized testing and debugging of a realistic Read-Copy-Update data structure. △ Less

Submitted 9 July, 2016; v1 submitted 4 June, 2016; originally announced June 2016.

Showing 1–10 of 10 results for author: Podkopaev, A