-
Extreme change-point detection
Authors:
Kevin Bleakley
Abstract:
We examine rules for predicting whether a point in $\mathbb{R}$ generated from a 50-50 mixture of two different probability distributions came from one distribution or the other, given limited (or no) information on the two distributions, and, as clues, one point generated randomly from each of the two distributions. We prove that nearest-neighbor prediction does better than chance when we know the two distributions are Gaussian densities without knowing their parameter values. We conjecture that this result holds for general probability distributions and, furthermore, that the nearest-neighbor rule is optimal in this setting, i.e., no other rule can do better than it if we do not know the distributions or do not know their parameters, or both.
Submitted 8 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
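The nearest-neighbor rule described in the abstract above can be checked empirically with a short simulation. The following is a minimal sketch, not the paper's code: the function name, parameter values, and trial count are illustrative assumptions. It draws one clue from each of two Gaussian distributions, draws a test point from their 50-50 mixture, and predicts the distribution whose clue lies closer.

```python
import random


def nearest_neighbor_accuracy(mu1, sigma1, mu2, sigma2, trials=100_000, seed=0):
    """Monte Carlo estimate of how often the nearest-neighbor rule is right.

    Hypothetical helper for illustration; all parameters are assumptions.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # One clue drawn from each distribution.
        clue1 = rng.gauss(mu1, sigma1)
        clue2 = rng.gauss(mu2, sigma2)
        # Test point drawn from the 50-50 mixture.
        from_first = rng.random() < 0.5
        z = rng.gauss(mu1, sigma1) if from_first else rng.gauss(mu2, sigma2)
        # Predict the distribution whose clue is nearest to the test point.
        predict_first = abs(z - clue1) < abs(z - clue2)
        correct += predict_first == from_first
    return correct / trials


if __name__ == "__main__":
    print(nearest_neighbor_accuracy(0.0, 1.0, 3.0, 2.0))
```

The estimate is only a sanity check for one choice of parameters; it does not replace the proof for the Gaussian case or address the conjecture for general distributions.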
-
Increased adaptive immune responses and proper feedback regulation protect against clinical dengue
Authors:
Etienne Simon-Loriere,
Veasna Duong,
Ahmed Tawfik,
Sivlin Ung,
Sowath Ly,
Isabelle Casademont,
Matthieu Prot,
Noémie Courtejoie,
Kevin Bleakley,
Philippe Buchy,
Arnaud Tarantola,
Philippe Dussart,
Tineke Cantaert,
Anavaj Sakuntabhai
Abstract:
Dengue is the most prevalent arthropod-borne viral disease. Clinical symptoms of dengue virus (DENV) infection range from classical mild dengue fever to severe, life-threatening dengue shock syndrome. However, most DENV infections cause few or no symptoms. Asymptomatic DENV-infected patients provide a unique opportunity to decipher the host immune responses leading to virus elimination without negative impact on the individual's health. We used an integrated approach of transcriptional profiling and immunological analysis comparing a Cambodian population of strictly asymptomatic viremic individuals with clinical dengue patients. Whereas inflammatory pathways and innate immune responses were similar between asymptomatic individuals and clinical dengue patients, expression of proteins related to antigen presentation and subsequent T and B cell activation pathways was differentially regulated, independent of viral load or previous DENV infection. Feedback mechanisms controlled the immune response in asymptomatic viremic individuals as demonstrated by increased activation of T cell apoptosis-related pathways and Fc$γ$RIIB signaling associated with decreased anti-DENV specific antibody concentrations. Taken together, our data illustrate that symptom-free DENV infection in children is determined by increased activation of the adaptive immune compartment and proper control mechanisms leading to elimination of viral infection without excessive immune activation, having implications for novel vaccine development strategies.
Submitted 11 December, 2017;
originally announced December 2017.
-
The Statistical Performance of Collaborative Inference
Authors:
Gérard Biau,
Kevin Bleakley,
Benoit Cadre
Abstract:
The statistical analysis of massive and complex data sets will require the development of algorithms that depend on distributed computing and collaborative inference. Inspired by this, we propose a collaborative framework that aims to estimate the unknown mean $θ$ of a random variable $X$. In the model we present, a certain number of calculation units, distributed across a communication network represented by a graph, participate in the estimation of $θ$ by sequentially receiving independent data from $X$ while exchanging messages via a stochastic matrix $A$ defined over the graph. We give precise conditions on the matrix $A$ under which the statistical precision of the individual units is comparable to that of a (gold standard) virtual centralized estimate, even though each unit does not have access to all of the data. We show in particular the fundamental role played by both the non-trivial eigenvalues of $A$ and the Ramanujan class of expander graphs, which provide remarkable performance for moderate algorithmic cost.
Submitted 1 July, 2015;
originally announced July 2015.
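The collaborative scheme in the abstract above, where units receive fresh data and exchange messages through a stochastic matrix $A$, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the update rule (mix the neighbors' estimates through $A$, then fold in one new observation with running-mean weights), the lazy ring graph, the Gaussian data, and all function names are hypothetical choices made for the example.

```python
import random


def ring_matrix(n):
    """Doubly stochastic matrix of a lazy walk on a ring of n units.

    Assumed example graph: each unit keeps weight 1/2 for itself and
    gives weight 1/4 to each of its two neighbors.
    """
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] += 0.5
        a[i][(i - 1) % n] += 0.25
        a[i][(i + 1) % n] += 0.25
    return a


def collaborative_mean(a, theta, sigma, steps, seed=0):
    """Run the assumed update est <- (t * A est + x) / (t + 1) for each unit.

    Every unit sees only its own observations (Gaussian with mean theta,
    an assumption) plus the estimates mixed in through the matrix a.
    """
    rng = random.Random(seed)
    n = len(a)
    est = [0.0] * n
    for t in range(steps):
        # Message exchange: each unit averages estimates according to its row of a.
        mixed = [sum(a[i][j] * est[j] for j in range(n)) for i in range(n)]
        # Each unit receives one new independent observation.
        fresh = [rng.gauss(theta, sigma) for _ in range(n)]
        # Running-mean style combination of the mixed estimate and the new data.
        est = [(t * mixed[i] + fresh[i]) / (t + 1) for i in range(n)]
    return est


if __name__ == "__main__":
    print(collaborative_mean(ring_matrix(5), theta=2.0, sigma=1.0, steps=2000))
```

Replacing `ring_matrix` with the identity matrix removes all communication, which gives a simple baseline for comparing the spread of the individual estimates with and without message exchange.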
-
Long signal change-point detection
Authors:
Gérard Biau,
Kevin Bleakley,
David Mason
Abstract:
The detection of change-points in a spatially or time ordered data sequence is an important problem in many fields such as genetics and finance. We derive the asymptotic distribution of a statistic recently suggested for detecting change-points. Simulation of its estimated limit distribution leads to a new and computationally efficient change-point detection algorithm, which can be used on very long signals. We assess the algorithm via simulations and on previously benchmarked real-world data sets.
Submitted 30 September, 2015; v1 submitted 7 April, 2015;
originally announced April 2015.
-
The group fused Lasso for multiple change-point detection
Authors:
Kevin Bleakley,
Jean-Philippe Vert
Abstract:
We present the group fused Lasso for detection of multiple change-points shared by a set of co-occurring one-dimensional signals. Change-points are detected by approximating the original signals with a constraint on the multidimensional total variation, leading to piecewise-constant approximations. Fast algorithms are proposed to solve the resulting optimization problems, either exactly or approximately. Conditions are given for consistency of both algorithms as the number of signals increases, and empirical evidence is provided to support the results on simulated and array comparative genomic hybridization data.
Submitted 21 June, 2011;
originally announced June 2011.
-
Joint segmentation of many aCGH profiles using fast group LARS
Authors:
Kevin Bleakley,
Jean-Philippe Vert
Abstract:
Array-Based Comparative Genomic Hybridization (aCGH) is a method used to search for genomic regions with copy number variations. For a given aCGH profile, one challenge is to accurately segment it into regions of constant copy number. Subjects sharing the same disease status, for example a type of cancer, often have aCGH profiles with similar copy number variations, due to duplications and deletions relevant to that particular disease. We introduce a constrained optimization algorithm that jointly segments aCGH profiles of many subjects. It simultaneously penalizes the amount of freedom the set of profiles has to jump from one level of constant copy number to another, at genomic locations known as breakpoints. We show that breakpoints shared by many different profiles tend to be found first by the algorithm, even in the presence of significant amounts of noise. The algorithm can be formulated as a group LARS problem. We propose an extremely fast way to find the solution path, i.e., a sequence of shared breakpoints in order of importance. At no extra cost, the algorithm smooths all of the aCGH profiles into piecewise-constant regions of equal copy number, giving low-dimensional versions of the original data. These can be shown for all profiles on a single graph, allowing for intuitive visual interpretation. Simulations and an implementation of the algorithm on bladder cancer aCGH profiles are provided.
Submitted 7 October, 2009;
originally announced October 2009.
-
Nonparametric sequential prediction of time series
Authors:
Gérard Biau,
Kevin Bleakley,
László Györfi,
György Ottucsák
Abstract:
Time series prediction covers a vast field of everyday statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible and faster than ARMA methods, and generally outperform them in terms of normalized cumulative prediction error.
Submitted 1 January, 2008;
originally announced January 2008.