Search | arXiv e-print repository

arXiv:2406.19723 [pdf, other]

LIPO+: Frugal Global Optimization for Lipschitz Functions

Authors: Gaëtan Serré, Perceval Beja-Battais, Sophia Chirrane, Argyris Kalogeratos, Nicolas Vayatis

Abstract: In this paper, we propose simple yet effective empirical improvements to the algorithms of the LIPO family, introduced in [Malherbe2017], that we call LIPO+ and AdaLIPO+. We compare our methods to the vanilla versions of the algorithms over standard benchmark functions and show that they converge significantly faster. Finally, we show that the LIPO family is very prone to the curse of dimensionality and tends quickly to Pure Random Search when the dimension increases. We give a proof for this, which is also formalized in the Lean mathematical language. Source code and a demo are provided online.

Submitted 28 June, 2024; originally announced June 2024.
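
For orientation, the kind of acceptance rule used by the LIPO family can be sketched as follows. This is a minimal illustration, assuming the rule is the one commonly attributed to [Malherbe2017]: a uniformly drawn candidate is evaluated only if the Lipschitz upper bound built from past evaluations can still exceed the current best value. It does not reproduce the LIPO+ or AdaLIPO+ improvements from the paper. The function name `lipo_maximize`, the draw cap `max_draws`, the toy objective and the constant `k` are all hypothetical choices made for this sketch.

```python
import math
import random

def lipo_maximize(f, bounds, k, n_evals, max_draws=20000, seed=0):
    """LIPO-style sketch (assumed rule, not the paper's code): evaluate a uniform
    candidate only if its Lipschitz upper bound can still beat the best value."""
    rng = random.Random(seed)
    points, values = [], []
    for _ in range(max_draws):
        if len(points) >= n_evals:
            break
        # Draw a candidate uniformly in the box defined by `bounds`.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if points:
            # Upper bound on f(x) implied by k-Lipschitz continuity.
            upper = min(v + k * math.dist(x, p) for p, v in zip(points, values))
            if upper < max(values):
                continue  # x cannot improve on the best value: skip the evaluation
        points.append(x)
        values.append(f(x))
    best = max(range(len(values)), key=values.__getitem__)
    return points[best], values[best], len(values)

def objective(x):
    # Toy 1-D objective with maximum 0 at x = 0.3; its slope on [0, 1] is at most 1.4.
    return -(x[0] - 0.3) ** 2

best_x, best_value, n_used = lipo_maximize(objective, [(0.0, 1.0)], k=2.0, n_evals=30)
print(best_x, best_value, n_used)
```

When the bound rejects almost no candidates, every uniform draw is evaluated and the loop behaves like Pure Random Search, which is the regime the abstract says arises as the dimension increases.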

arXiv:2402.05715 [pdf, other]

Collaborative non-parametric two-sample testing

Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos

Abstract: This paper addresses the multiple two-sample test problem in a graph-structured setting, which is a common scenario in fields such as Spatial Statistics and Neuroscience. Each node $v$ in a fixed graph deals with a two-sample testing problem between two node-specific probability density functions (pdfs), $p_v$ and $q_v$. The goal is to identify nodes where the null hypothesis $p_v = q_v$ should be rejected, under the assumption that connected nodes would yield similar test outcomes. We propose the non-parametric collaborative two-sample testing (CTST) framework that efficiently leverages the graph structure and minimizes the assumptions over $p_v$ and $q_v$. Our methodology integrates elements from f-divergence estimation, Kernel Methods, and Multitask Learning. We use synthetic experiments and a real sensor network detecting seismic activity to demonstrate that CTST outperforms state-of-the-art non-parametric statistical tests that apply at each node independently and hence disregard the geometry of the problem.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04689 [pdf, other]

Stein Boltzmann Sampling: A Variational Approach for Global Optimization

Authors: Gaëtan Serré, Argyris Kalogeratos, Nicolas Vayatis

Abstract: In this paper, we present a flow-based method for global optimization of continuous Sobolev functions, called Stein Boltzmann Sampling (SBS). SBS initializes uniformly a number of particles representing candidate solutions, then uses the Stein Variational Gradient Descent (SVGD) algorithm to sequentially and deterministically move those particles in order to approximate a target distribution whose mass is concentrated around promising areas of the domain of the optimized function. The target is chosen to be a properly parametrized Boltzmann distribution. For the purpose of global optimization, we adapt the generic SVGD theoretical framework so that it can address more general target distributions over a compact subset of $\mathbb{R}^d$, and we prove SBS's asymptotic convergence. In addition to the main SBS algorithm, we present two variants: SBS-PF, which includes a particle filtering strategy, and SBS-HYBRID, which uses SBS or SBS-PF as a continuation after other particle- or distribution-based optimization methods. A detailed comparison with state-of-the-art methods on benchmark functions demonstrates that SBS and its variants are highly competitive, while the combination of the two variants provides the best trade-off between accuracy and computational cost.

Submitted 3 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2311.01900 [pdf, other]

Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization

Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos

Abstract: Quantifying the difference between two probability density functions, $p$ and $q$, using available data, is a fundamental problem in Statistics and Machine Learning. A usual approach for addressing this problem is the likelihood-ratio estimation (LRE) between $p$ and $q$, which, to the best of our knowledge, has been investigated mainly for the offline case. This paper contributes by introducing a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time. The non-parametric nature of our approach has the advantage of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the recent advances in Kernel Methods and functional minimization to develop an estimator that can be efficiently updated online. We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2309.16274 [pdf, other]

A framework for paired-sample hypothesis testing for high-dimensional data

Authors: Ioannis Bargiotas, Argyris Kalogeratos, Nicolas Vayatis

Abstract: The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimating each feature's contribution to the final result.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). 6 pages, 3 figures

MSC Class: 62; 68

ACM Class: G.3; I.5; J.3
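
As background for the abstract above, the classic univariate Hodges-Lehmann estimator (the pseudomedian) is the median of all pairwise Walsh averages. The sketch below shows only this textbook one-dimensional version, not the paper's extension to aggregating hyperplane rules; the function name `hodges_lehmann` and the sample values are illustrative assumptions.

```python
import statistics

def hodges_lehmann(values):
    """Classic one-sample Hodges-Lehmann estimator: median of the Walsh averages
    (x_i + x_j) / 2 taken over all pairs with i <= j."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)

clean = hodges_lehmann([1.0, 2.0, 3.0])
with_outlier = hodges_lehmann([1.0, 2.0, 3.0, 100.0])
print(clean, with_outlier)  # prints "2.0 2.75"
```

The single outlier moves the estimate from 2.0 to 2.75, whereas the arithmetic mean of the second sample is 26.5, which illustrates why a pseudomedian is a robust way to aggregate.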

arXiv:2309.15704 [pdf, other]

Maximum Weight Entropy

Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a practical solution to the lack of prediction diversity observed recently for standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et al., 2021). Considering that this issue is mainly related to a lack of weight diversity, we claim that standard methods sample in "over-restricted" regions of the weight space due to the use of "over-regularization" processes, such as weight decay and zero-mean centered Gaussian priors. We propose to solve the problem by adopting the maximum entropy principle for the weight distribution, with the underlying idea to maximize the weight diversity. Under this paradigm, the epistemic uncertainty is described by the weight distribution of maximal entropy that produces neural networks "consistent" with the training observations. Considering stochastic neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy. We develop a novel weight parameterization for the stochastic model, based on the singular value decomposition of the neural network's hidden representations, which enables a large increase of the weight entropy for a small empirical risk penalization. We provide both theoretical and numerical results to assess the efficiency of the approach. In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 60 pages, 9 figures, 6 tables

arXiv:2304.04042 [pdf, other]

Deep Anti-Regularized Ensembles provide reliable out-of-distribution uncertainty quantification

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: We consider the problem of uncertainty quantification in high dimensional regression and classification, for which deep ensembles have proven to be promising methods. Recent observations have shown that deep ensembles often return overconfident estimates outside the training domain, which is a major limitation because shifted distributions are often encountered in real-life scenarios. The principal challenge for this problem is to solve the trade-off between increasing the diversity of the ensemble outputs and making accurate in-distribution predictions. In this work, we show that an ensemble of networks with large weights fitting the training data is likely to meet these two objectives. We derive a simple and practical approach to produce such ensembles, based on an original anti-regularization term penalizing small weights and a control process of the weight increase which maintains the in-distribution loss under an acceptable threshold. The developed approach requires neither out-of-distribution training data nor any trade-off hyper-parameter calibration. We derive a theoretical framework for this approach and show that the proposed optimization can be seen as a "water-filling" problem. Several experiments in both regression and classification settings highlight that Deep Anti-Regularized Ensembles (DARE) significantly improve uncertainty quantification outside the training domain in comparison to recent deep ensembles and out-of-distribution detection methods. All the conducted experiments are reproducible and the source code is available at https://github.com/antoinedemathelin/DARE.

Submitted 8 April, 2023; originally announced April 2023.

Comments: 26 pages, 9 figures

arXiv:2302.03592 [pdf, other]

A Bipartite Ranking Approach to the Two-Sample Problem

Authors: Stephan Clémençon, Myrto Limnios, Nicolas Vayatis

Abstract: The two-sample problem, which consists in testing whether independent samples on $\mathbb{R}^d$ are drawn from the same (unknown) distribution, finds applications in many areas. Its study in high-dimension is the subject of much attention, especially because the information acquisition processes at work in the Big Data era often involve various sources, poorly controlled, leading to datasets possibly exhibiting a strong sampling bias. While classic methods relying on the computation of a discrepancy measure between the empirical distributions face the curse of dimensionality, we develop an alternative approach based on statistical learning and extending rank tests, capable of detecting small departures from the null assumption in the univariate case when appropriately designed. Overcoming the lack of natural order on $\mathbb{R}^d$ when $d\geq 2$, it is implemented in two steps. Assigning to each of the samples a label (positive vs. negative) and dividing them into two parts, a preorder on $\mathbb{R}^d$ defined by a real-valued scoring function is learned by means of a bipartite ranking algorithm applied to the first part and a rank test is applied next to the scores of the remaining observations to detect possible differences in distribution. Because it learns how to project the data onto the real line nearly like (any monotone transform of) the likelihood ratio between the original multivariate distributions would do, the approach is not much affected by the dimensionality, ignoring ranking model bias issues, and preserves the advantages of univariate rank tests. Nonasymptotic error bounds are proved based on recent concentration results for two-sample linear rank-processes and an experimental study shows that the approach promoted surpasses alternative methods standing as natural competitors.

Submitted 8 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

arXiv:2301.03011 [pdf, other]

Online Centralized Non-parametric Change-point Detection via Graph-based Likelihood-ratio Estimation

Authors: Alejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis

Abstract: Consider each node of a graph to be generating a data stream that is synchronized and observed at near real-time. At a change-point $\tau$, a change occurs at a subset of nodes $C$, which affects the probability distribution of their associated node streams. In this paper, we propose a novel kernel-based method to both detect $\tau$ and localize $C$, based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distributions of the node streams. Our main working hypothesis is the smoothness of the likelihood-ratio estimates over the graph, i.e., connected nodes are expected to have similar likelihood-ratios. The quality of the proposed method is demonstrated in extensive experiments on synthetic scenarios.

Submitted 12 January, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

arXiv:2209.04215 [pdf, other]

Fast and Accurate Importance Weighting for Correcting Sample Bias

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: Bias in datasets can be very detrimental for appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of the main drawbacks of this method is the computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets, under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 16 pages, 3 figures

arXiv:2205.14461 [pdf, other]

Collaborative likelihood-ratio estimation over graphs

Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos

Abstract: Assuming we have iid observations from two unknown probability density functions (pdfs), $p$ and $q$, the likelihood-ratio estimation (LRE) is an elegant approach to compare the two pdfs only by relying on the available data. In this paper, we introduce the first, to the best of our knowledge, graph-based extension of this problem, which reads as follows: Suppose each node $v$ of a fixed graph has access to observations coming from two unknown node-specific pdfs, $p_v$ and $q_v$, and the goal is to estimate for each node the likelihood-ratio between both pdfs by also taking into account the information provided by the graph structure. The node-level estimation tasks are supposed to exhibit similarities conveyed by the graph, which suggests that the nodes could collaborate to solve them more efficiently. We develop this idea in a concrete non-parametric method that we call Graph-based Relative Unconstrained Least-squares Importance Fitting (GRULSIF). We derive convergence rates for our collaborative approach that highlight the role played by variables such as the number of available observations per node, the size of the graph, and how accurately the graph structure encodes the similarity between tasks. These theoretical results make explicit the situations where collaborative estimation effectively leads to an improvement in performance compared to solving each problem independently. Finally, in a series of experiments, we illustrate how GRULSIF infers the likelihood-ratios at the nodes of the graph more accurately compared to state-of-the-art LRE methods, which would operate independently at each node, and we also verify that the behavior of GRULSIF is aligned with our previous theoretical analysis.

Submitted 31 January, 2024; v1 submitted 28 May, 2022; originally announced May 2022.

arXiv:2110.10518 [pdf, other]

Online non-parametric change-point detection for heterogeneous data streams observed over graph nodes

Authors: Alejandro de la Concha, Argyris Kalogeratos, Nicolas Vayatis

Abstract: Consider a heterogeneous data stream being generated by the nodes of a graph. The data stream is in essence composed of multiple streams, possibly of a different nature depending on the node. At a given moment $\tau$, a change-point occurs for a subset of nodes $C$, signifying the change in the probability distribution of their associated streams. In this paper we propose an online non-parametric method to infer $\tau$ based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distribution associated with the data stream of each node. We propose a kernel-based method, under the hypothesis that connected nodes of the graph are expected to have similar likelihood-ratio estimates when there is no change-point. We demonstrate the quality of our method on synthetic experiments and real-world applications.

Submitted 20 October, 2021; originally announced October 2021.

Comments: 11 pages

arXiv:2109.01450 [pdf, other]

Epidemic Models for COVID-19 during the First Wave from February to May 2020: a Methodological Review

Authors: Marie Garin, Myrto Limnios, Alice Nicolaï, Ioannis Bargiotas, Olivier Boulant, Stephen Chick, Amir Dib, Theodoros Evgeniou, Mathilde Fekom, Argyris Kalogeratos, Christophe Labourdette, Anton Ovchinnikov, Raphaël Porcher, Camille Pouchol, Nicolas Vayatis

Abstract: We review epidemiological models for the propagation of the COVID-19 pandemic during the early months of the outbreak: from February to May 2020. The aim is to propose a methodological review that highlights the following characteristics: (i) the epidemic propagation models, (ii) the modeling of intervention strategies, (iii) the models and estimation procedures of the epidemic parameters and (iv) the characteristics of the data used. We finally selected 80 articles from open access databases based on criteria such as the theoretical background, the reproducibility, the incorporation of intervention strategies, etc. The selection mainly resulted in phenomenological, compartmental and individual-level models. A digital companion including an online sheet, a Kibana interface and a markdown document is proposed. Finally, this work provides an opportunity to witness how the scientific community reacted to this unique situation.

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2107.03049 [pdf, other]

ADAPT : Awesome Domain Adaptation Python Toolbox

Authors: Antoine de Mathelin, Mounir Atiq, Guillaume Richard, Alejandro de la Concha, Mouad Yachouti, François Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: In this paper, we introduce the ADAPT library, an open source Python API providing the implementation of the main transfer learning and domain adaptation methods. The library is designed with a user-friendly approach to facilitate access to domain adaptation for a wide public. ADAPT is compatible with scikit-learn and TensorFlow, and full documentation is available online at https://adapt-python.github.io/adapt/ with a substantial gallery of examples.

Submitted 1 February, 2023; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: 11 pages, 6 figures

arXiv:2104.02943 [pdf, other]

doi 10.1214/21-EJS1907

Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

Authors: Stéphan Clémençon, Myrto Limnios, Nicolas Vayatis

Abstract: The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be viewed as summaries of the ROC curve. In this paper, the fact that most of these empirical criteria can be expressed as two-sample linear rank statistics is highlighted and concentration inequalities for collections of such random variables, referred to as two-sample rank processes here, are proved, when indexed by VC classes of scoring functions. Based on these nonasymptotic bounds, the generalization capacity of empirical maximizers of a wide class of ranking performance criteria is next investigated from a theoretical perspective. It is also supported by empirical evidence through convincing numerical experiments.

Submitted 24 January, 2023; v1 submitted 7 April, 2021; originally announced April 2021.

Journal ref: Electronic Journal of Statistics, Shaker Heights, OH: Institute of Mathematical Statistics, 2021, 15 (2), pp. 4659-4717

arXiv:2103.03757 [pdf, other]

Discrepancy-Based Active Learning for Domain Adaptation

Authors: Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009) we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which are performing accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher average and localized discrepancy for general loss functions which satisfy a regularity condition. A practical K-medoids algorithm that can address the case of large data set is inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive against other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.

Submitted 14 September, 2022; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: 32 pages, 15 figures

arXiv:2007.02534 [pdf, other]

Tensor Convolutional Sparse Coding with Low-Rank activations, an application to EEG analysis

Authors: Pierre Humbert, Laurent Oudre, Nicolas Vayatis, Julien Audiffren

Abstract: Recently, there has been growing interest in the analysis of spectrograms of ElectroEncephaloGram (EEG), particularly to study the neural correlates of (un)-consciousness during General Anesthesia (GA). Indeed, it has been shown that order three tensors (channels x frequencies x times) are a natural and useful representation of these signals. However, this encoding entails significant difficulties, especially for convolutional sparse coding (CSC), as existing methods do not take advantage of the particularities of tensor representation, such as rank structures, and are vulnerable to the high level of noise and perturbations that are inherent to EEG during medical acts. To address this issue, in this paper we introduce a new CSC model, named Kruskal CSC (K-CSC), that uses the Kruskal decomposition of the activation tensors to leverage the intrinsic low rank nature of these representations in order to extract relevant and interpretable encodings. Our main contribution, TC-FISTA, uses multiple tools to efficiently solve the resulting optimization problem despite the increasing complexity induced by the tensor representation. We then evaluate TC-FISTA on both a synthetic dataset and real EEG recorded during GA. The results show that TC-FISTA is robust to noise and perturbations, resulting in accurate, sparse and interpretable encoding of the signals.

Submitted 10 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2006.16590 [pdf, other]

Robust Kernel Density Estimation with Median-of-Means principle

Authors: Pierre Humbert, Batiste Le Bars, Ludovic Minvielle, Nicolas Vayatis

Abstract: In this paper, we introduce a robust nonparametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). This estimator is shown to achieve robustness to any kind of anomalous data, even in the case of adversarial contamination. In particular, while previous works only prove consistency results under a known contamination model, this work provides finite-sample high-probability error-bounds without a priori knowledge on the outliers. Finally, when compared with other robust kernel estimators, we show that MoM-KDE achieves competitive results while having significantly lower computational complexity.

Submitted 30 June, 2020; originally announced June 2020.
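
The Median-of-Means idea named in the MoM-KDE abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a one-dimensional Gaussian kernel, a random partition into equal-sized blocks, and it omits any normalisation that would make the result integrate to one. The function names `gaussian_kde` and `mom_kde` and all parameters are hypothetical.

```python
import math
import random
import statistics


def gaussian_kde(block, x, bandwidth):
    """Plain Gaussian kernel density estimate of `block`, evaluated at x."""
    norm = len(block) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in block) / norm


def mom_kde(data, x, bandwidth, n_blocks, seed=0):
    """Median of block-wise KDE values at x (unnormalised sketch).

    Assumption: the data are shuffled and dealt into n_blocks disjoint
    blocks; the estimate at x is the median of the per-block KDEs.
    """
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(gaussian_kde(b, x, bandwidth) for b in blocks)
```

With more blocks than twice the number of outliers, a majority of blocks contain no outlier, so the median ignores the spurious density mass that a plain KDE would place around the contaminated points.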

arXiv:2006.10628 [pdf, other]

Offline detection of change-points in the mean for stationary graph signals

Authors: Alejandro de la Concha, Nicolas Vayatis, Argyris Kalogeratos

Abstract: This paper addresses the problem of segmenting a stream of graph signals: we aim to detect changes in the mean of a multivariate signal defined over the nodes of a known graph. We propose an offline method that relies on the concept of graph signal stationarity and allows the convenient translation of the problem from the original vertex domain to the spectral domain (Graph Fourier Transform), where it is much easier to solve. Although the obtained spectral representation is sparse in real applications, to the best of our knowledge this property has not been sufficiently exploited in the existing related literature. Our change-point detection method adopts a model selection approach that takes into account the sparsity of the spectral representation and determines automatically the number of change-points. Our detector comes with a proof of a non-asymptotic oracle inequality. Numerical experiments demonstrate the performance of the proposed method.

Submitted 29 February, 2024; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: 16 pages, 2 figures, 1 table, 1 annex. 9 pages of main text

ACM Class: I.2.6

arXiv:2006.08251 [pdf, other]

Adversarial Weighting for Domain Adaptation in Regression

Authors: Antoine de Mathelin, Guillaume Richard, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Abstract: We present a novel instance-based approach to handle regression tasks in the context of supervised domain adaptation under an assumption of covariate shift. The approach developed in this paper is based on the assumption that the task on the target domain can be efficiently learned by adequately reweighting the source instances during training phase. We introduce a novel formulation of the optimization objective for domain adaptation which relies on a discrepancy distance characterizing the difference between domains according to a specific task and a class of hypotheses. To solve this problem, we develop an adversarial network algorithm which learns both the source weighting scheme and the task in one feed-forward gradient descent. We provide numerical evidence of the relevance of the method on public data sets for regression domain adaptation through reproducible experiments.

Submitted 15 September, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 8 pages, 6 figures

arXiv:2006.07199 [pdf, other]

Dynamic Epidemic Control via Sequential Resource Allocation

Authors: Mathilde Fekom, Nicolas Vayatis, Argyris Kalogeratos

Abstract: In the Dynamic Resource Allocation (DRA) problem, an administrator has to allocate a limited amount of resources to the nodes of a network in order to reduce a diffusion process (DP) (e.g. an epidemic). In this paper we propose a multi-round dynamic control framework, which we realize through two derived models: the Restricted and the Sequential DRA (RDRA, SDRA), that allows for restricted information and access to the entire network, contrary to standard full-information and full-access DRA models. At each intervention round, the administrator has only access -- simultaneous for the former, sequential for the latter -- to a fraction of the network nodes. This sequential aspect in the decision process offers a completely new perspective to the dynamic DP control, making this work the first to cast the dynamic control problem as a series of sequential selection problems. Through in-depth SIS epidemic simulations we compare the performance of our multi-round approach with other resource allocation strategies and several sequential selection algorithms on both generated, and real-data networks. The results provide evidence about the efficiency and applicability of the proposed framework for real-life problems.

Submitted 11 June, 2020; originally announced June 2020.

Comments: arXiv admin note: text overlap with arXiv:1909.09678

arXiv:2002.05160 [pdf, other]

Optimal Multiple Stopping Rule for Warm-Starting Sequential Selection

Authors: Mathilde Fekom, Nicolas Vayatis, Argyris Kalogeratos

Abstract: In this paper we present the Warm-starting Dynamic Thresholding algorithm, developed using dynamic programming, for a variant of the standard online selection problem. The problem allows job positions to be either free or already occupied at the beginning of the process. Throughout the selection process, the decision maker interviews one after the other the new candidates and reveals a quality score for each of them. Based on that information, she can (re)assign each job at most once by taking immediate and irrevocable decisions. We relax the hard requirement of the class of dynamic programming algorithms to perfectly know the distribution from which the scores of candidates are drawn, by presenting extensions for the partial and no-information cases, in which the decision maker can learn the underlying score distribution sequentially while interviewing candidates.

Submitted 12 February, 2020; originally announced February 2020.

arXiv:1910.08512 [pdf, other]

Learning the piece-wise constant graph structure of a varying Ising model

Authors: Batiste Le Bars, Pierre Humbert, Argyris Kalogeratos, Nicolas Vayatis

Abstract: This work focuses on the estimation of multiple change-points in a time-varying Ising model that evolves piece-wise constantly. The aim is to identify both the moments at which significant changes occur in the Ising model, as well as the underlying graph structures. For this purpose, we propose to estimate the neighborhood of each node by maximizing a penalized version of its conditional log-likelihood. The objective of the penalization is twofold: it imposes sparsity in the learned graphs and, thanks to a fused-type penalty, it also enforces them to evolve piece-wise constantly. Using few assumptions, we provide two change-points consistency theorems. Those are the first in the context of unknown number of change-points detection in time-varying Ising model. Finally, experimental results on several synthetic datasets and a real-world dataset demonstrate the performance of our method.

Submitted 30 June, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 18 pages (9 pages for Appendix), 4 figures, 2 tables

arXiv:1909.09678 [pdf, other]

Sequential Dynamic Resource Allocation for Epidemic Control

Authors: Mathilde Fekom, Nicolas Vayatis, Argyris Kalogeratos

Abstract: Under the Dynamic Resource Allocation (DRA) model, an administrator has the mission to allocate dynamically a limited budget of resources to the nodes of a network in order to reduce a diffusion process (DP) (e.g. an epidemic). The standard DRA assumes that the administrator has constantly full information and instantaneous access to the entire network. Towards bringing such strategies closer to real-life constraints, we first present the Restricted DRA model extension where, at each intervention round, the access is restricted to only a fraction of the network nodes, called sample. Then, inspired by sequential selection problems such as the well-known Secretary Problem, we propose the Sequential DRA (SDRA) model. Our model introduces a sequential aspect in the decision process over the sample of each round, offering a completely new perspective to the dynamic DP control. Finally, we incorporate several sequential selection algorithms to SDRA control strategies and compare their performance in SIS epidemic simulations.

Submitted 20 September, 2019; originally announced September 2019.

Comments: 6 pages, 5 figures

MSC Class: 68W27; 65C50; 90B15

arXiv:1908.03367 [pdf, other]

Multivariate Convolutional Sparse Coding with Low Rank Tensor

Authors: Pierre Humbert, Julien Audiffren, Laurent Oudre, Nicolas Vayatis

Abstract: This paper introduces a new multivariate convolutional sparse coding based on tensor algebra with a general model enforcing both element-wise sparsity and low-rankness of the activations tensors. By using the CP decomposition, this model achieves a significantly more efficient encoding of the multivariate signal, particularly in the high order/dimension setting, resulting in better performance. We prove that our model is closely related to the Kruskal tensor regression problem, offering interesting theoretical guarantees to our setting. Furthermore, we provide an efficient optimization algorithm based on alternating optimization to solve this model. Finally, we evaluate our algorithm with a large range of experiments, highlighting its advantages and limitations.

Submitted 9 August, 2019; originally announced August 2019.

arXiv:1907.06614 [pdf, other]

Revealing posturographic features associated with the risk of falling in patients with Parkinsonian syndromes via machine learning

Authors: Ioannis Bargiotas, Argyris Kalogeratos, Myrto Limnios, Pierre-Paul Vidal, Damien Ricard, Nicolas Vayatis

Abstract: Falling in Parkinsonian syndromes (PS) is associated with postural instability and constitutes a common cause of disability among PS patients. Current posturographic practices record the body's center-of-pressure displacement (statokinesigram) while the patient stands on a force platform. Statokinesigrams, after appropriate signal processing, can offer numerous posturographic features, which however challenges the efforts for valid statistics via standard univariate approaches. In this work, we present the ts-AUC, a non-parametric multivariate two-sample test, which we employ to analyze statokinesigram differences among PS patients that are fallers (PSF) and non-fallers (PSNF). We included 123 PS patients who were classified into PSF or PSNF based on clinical assessment and underwent simple Romberg Test (eyes open/eyes closed). We analyzed posturographic features using both multiple testing with p-value adjustment and the ts-AUC. While the ts-AUC showed significant difference between groups (p-value = 0.01), multiple testing did not show any such difference. Interestingly, significant difference between the two groups was found only using the open-eyes protocol. PSF showed significantly increased antero-posterior movements as well as increased posturographic area, compared to PSNF. Our study demonstrates the superiority of the ts-AUC test compared to standard statistical tools in distinguishing PSF and PSNF in the multidimensional feature space.
This result highlights more generally the fact that machine learning-based statistical tests can be seen as a natural extension of classical statistical approaches and should be considered, especially when dealing with multifactorial assessments.

Submitted 15 July, 2019; originally announced July 2019.

Comments: 16 pages, 11 figures (plots, tables, algorithms)

MSC Class: 62H15

arXiv:1809.07299 [pdf, other]

The Warm-starting Sequential Selection Problem and its Multi-round Extension

Authors: Mathilde Fekom, Nicolas Vayatis, Argyris Kalogeratos

Abstract: In the Sequential Selection Problem (SSP), immediate and irrevocable decisions need to be made as candidates randomly arrive for a job interview. Standard SSP variants, such as the well-known secretary problem, begin with an empty selection set (cold-start) and perform the selection process once over a single candidate set (single-round). In this paper we address these two limitations. First, we introduce the novel Warm-starting SSP (WSSP) setting which considers at hand a reference set, a set of previously selected items of a given quality, and tries to update optimally that set by (re-)assigning each job at most once. We adopt a cutoff-based approach to optimize a rank-based objective function over the final assignment of the jobs. In our technical contribution, we provide analytical results regarding the proposed WSSP setting, we introduce the algorithm Cutoff-based Cost Minimization (CCM) (and the low failures-CCM, which is more robust to high rate of resignations) that adapts to changes in the quality of the reference set thanks to the translation method we propose. Finally, we implement and test CCM in a multi-round setting that is particularly interesting for real-world application scenarios.

Submitted 7 November, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

ACM Class: G.2; G.3
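
For readers unfamiliar with cutoff-based selection, the cold-start, single-round baseline that the abstract above contrasts with (the classic secretary problem) can be sketched as follows. This is only the textbook 1/e rule, not the paper's CCM algorithm; the function name `secretary_cutoff` is hypothetical.

```python
import math


def secretary_cutoff(scores):
    """Index chosen by the classic 1/e cutoff rule.

    The first n/e candidates are observed but never selected; afterwards
    the first candidate better than all of them is taken immediately.
    """
    n = len(scores)
    cutoff = int(n / math.e)
    threshold = max(scores[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if scores[i] > threshold:
            return i
    return n - 1  # no later candidate beat the threshold: take the last one
```

A warm-start variant would presumably replace the empty starting state with a reference set of already selected items, which is the setting the abstract describes.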

arXiv:1801.00826 [pdf, other]

ruptures: change point detection in Python

Authors: Charles Truong, Laurent Oudre, Nicolas Vayatis

Abstract: ruptures is a Python library for offline change point detection. This package provides methods for the analysis and segmentation of non-stationary signals. Implemented algorithms include exact and approximate detection for various parametric and non-parametric models. ruptures focuses on ease of use by providing a well-documented and consistent interface. In addition, thanks to its modular structure, different algorithms and models can be connected and extended within this package.

Submitted 2 January, 2018; originally announced January 2018.

arXiv:1801.00718 [pdf, other]

doi 10.1016/j.sigpro.2019.107299

Selective review of offline change point detection methods

Authors: Charles Truong, Laurent Oudre, Nicolas Vayatis

Abstract: This article presents a selective survey of algorithms for the offline detection of multiple change points in multivariate time series. A general yet structuring methodological strategy is adopted to organize this vast body of work. More precisely, detection algorithms considered in this review are characterized by three elements: a cost function, a search method and a constraint on the number of changes. Each of those elements is described, reviewed and discussed separately. Implementations of the main algorithms described in this article are provided within a Python package called ruptures.

Submitted 26 March, 2020; v1 submitted 2 January, 2018; originally announced January 2018.

Journal ref: Signal Processing, 167:107299, 2020

arXiv:1709.05231 [pdf, ps, other]

A Spectral Method for Activity Shaping in Continuous-Time Information Cascades

Authors: Kevin Scaman, Argyris Kalogeratos, Luca Corinzia, Nicolas Vayatis

Abstract: Information Cascades Model captures dynamical properties of user activity in a social network. In this work, we develop a novel framework for activity shaping under the Continuous-Time Information Cascades Model which allows the administrator for local control actions by allocating targeted resources that can alter the spread of the process. Our framework employs the optimization of the spectral radius of the Hazard matrix, a quantity that has been shown to drive the maximum influence in a network, while enjoying a simple convex relaxation when used to minimize the influence of the cascade. In addition, use-cases such as quarantine and node immunization are discussed to highlight the generality of the proposed activity shaping framework. Finally, we present the NetShape influence minimization method which is compared favorably to baseline and state-of-the-art approaches through simulations on real social networks.

Submitted 15 September, 2017; originally announced September 2017.

MSC Class: 93E20; 91D30 ACM Class: I.2.6

arXiv:1705.10087 [pdf, other]

DICOD: Distributed Convolutional Sparse Coding

Authors: Thomas Moreau, Laurent Oudre, Nicolas Vayatis

Abstract: In this paper, we introduce DICOD, a convolutional sparse coding algorithm which builds shift invariant representations for long signals. This algorithm is designed to run in a distributed setting, with local message passing, making it communication efficient. It is based on coordinate descent and uses locally greedy updates which accelerate the resolution compared to greedy coordinate selection. We prove the convergence of this algorithm and highlight its computational speed-up which is super-linear in the number of cores used. We also provide empirical evidence for the acceleration properties of our algorithm compared to state-of-the-art methods.

Submitted 13 May, 2018; v1 submitted 29 May, 2017; originally announced May 2017.

arXiv:1703.02628 [pdf, other]

Global optimization of Lipschitz functions

Authors: Cédric Malherbe, Nicolas Vayatis

Abstract: The goal of the paper is to design sequential strategies which lead to efficient optimization of an unknown function under the only assumption that it has a finite Lipschitz constant. We first identify sufficient conditions for the consistency of generic sequential algorithms and formulate the expected minimax rate for their performance. We introduce and analyze a first algorithm called LIPO which assumes the Lipschitz constant to be known. Consistency and minimax rates for LIPO are proved, as well as fast rates under an additional Hölder-like condition. An adaptive version of LIPO is also introduced for the more realistic setup where the Lipschitz constant is unknown and has to be estimated along with the optimization. Similar theoretical guarantees are shown to hold for the adaptive LIPO algorithm and a numerical assessment is provided at the end of the paper to illustrate the potential of this strategy with respect to state-of-the-art methods over typical benchmark problems for global optimization.

Submitted 15 June, 2017; v1 submitted 7 March, 2017; originally announced March 2017.
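
The abstract above does not spell out LIPO's decision rule, so the following is only a sketch of the idea as commonly presented: a uniformly drawn candidate is evaluated only if the Lipschitz upper bound built from past evaluations says it could still beat the best value found. The maximisation convention, the box-shaped domain, the Euclidean norm, and the name `lipo_maximize` with its parameters are assumptions made for illustration.

```python
import math
import random


def lipo_maximize(f, bounds, k, n_evals, seed=0):
    """Maximise f over a box with at most n_evals evaluations of f.

    bounds: list of (low, high) pairs, one per dimension.
    k: Lipschitz constant of f, assumed to be known.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    points = [draw()]
    values = [f(points[0])]
    while len(points) < n_evals:
        x = draw()
        # Tightest upper bound on f(x) implied by the Lipschitz condition.
        upper = min(v + k * math.dist(x, p) for p, v in zip(points, values))
        if upper >= max(values):  # x is still a potential maximiser
            points.append(x)
            values.append(f(x))
    best = max(range(len(values)), key=values.__getitem__)
    return points[best], values[best]
```

Rejected candidates cost no function evaluation, which is where the savings over Pure Random Search would come from when evaluations are expensive.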

arXiv:1603.07970 [pdf, other]

Spectral Bounds in Random Graphs Applied to Spreading Phenomena and Percolation

Authors: Rémi Lemonnier, Kevin Scaman, Nicolas Vayatis

Abstract: In this paper, we derive nonasymptotic theoretical bounds for the influence in random graphs that depend on the spectral radius of a particular matrix, called the Hazard matrix. We also show that these results are generic and valid for a large class of random graphs displaying correlation at a local scale, called the LPC random graphs. In particular, they lead to tight and novel bounds in percolation, epidemiology and information cascades. The main result of the paper states that the influence in the sub-critical regime for LPC random graphs is at most of the order of $O(\sqrt{n})$ where $n$ is the size of the network, and of $O(n^{2/3})$ in the critical regime, where the epidemic thresholds are driven by the size of the spectral radius of the Hazard matrix with respect to 1. As a corollary, it is also shown that such bounds hold for the size of the giant component in inhomogeneous percolation, the SIR model in epidemiology, as well as for the long-term influence of a node in the Independent Cascade Model.

Submitted 25 March, 2016; originally announced March 2016.

Comments: 32 pages, 1 figure

arXiv:1603.04381 [pdf, other]

A ranking approach to global optimization

Authors: Cédric Malherbe, Nicolas Vayatis

Abstract: We consider the problem of maximizing an unknown function over a compact and convex set using as few observations as possible. We observe that the optimization of the function essentially relies on learning the induced bipartite ranking rule of f. Based on this idea, we relate global optimization to bipartite ranking which allows to address problems with high dimensional input space, as well as cases of functions with weak regularity properties. The paper introduces novel meta-algorithms for global optimization which rely on the choice of any bipartite ranking method. Theoretical properties are provided as well as convergence guarantees and equivalences between various optimization methods are obtained as a by-product. Eventually, numerical evidence is given to show that the main algorithm of the paper which adapts empirically to the underlying ranking structure essentially outperforms existing state-of-the-art global optimization algorithms in typical benchmarks.

Submitted 7 March, 2017; v1 submitted 14 March, 2016; originally announced March 2016.

arXiv:1602.04976 [pdf, ps, other]

Stochastic Process Bandits: Upper Confidence Bounds Algorithms via Generic Chaining

Authors: Emile Contal, Nicolas Vayatis

Abstract: The paper considers the problem of global optimization in the setup of stochastic process bandits. We introduce a UCB algorithm which builds a cascade of discretization trees based on generic chaining in order to render possible its operability over a continuous domain. The theoretical framework applies to functions under weak probabilistic smoothness assumptions and also extends significantly the spectrum of application of UCB strategies. Moreover generic regret bounds are derived which are then specialized to Gaussian processes indexed on infinite-dimensional spaces as well as to quadratic forms of Gaussian processes. Lower bounds are also proved in the case of Gaussian processes to assess the optimality of the proposed algorithm.

Submitted 16 February, 2016; originally announced February 2016.

Comments: preprint

arXiv:1510.05576 [pdf, ps, other]

Optimization for Gaussian Processes via Chaining

Authors: Emile Contal, Cédric Malherbe, Nicolas Vayatis

Abstract: In this paper, we consider the problem of stochastic optimization under a bandit feedback model. We generalize the GP-UCB algorithm [Srinivas et al., 2012] to arbitrary kernels and search spaces. To do so, we use a notion of localized chaining to control the supremum of a Gaussian process, and provide a novel optimization scheme based on the computation of covering numbers. The theoretical bounds we obtain on the cumulative regret are more generic and present the same convergence rates as the GP-UCB algorithm. Finally, the algorithm is shown to be empirically more efficient than its natural competitors on simple and complex input spaces.

Submitted 19 October, 2015; originally announced October 2015.

arXiv:1407.4760 [pdf, other]

What Makes a Good Plan? An Efficient Planning Approach to Control Diffusion Processes in Networks

Authors: Kevin Scaman, Argyris Kalogeratos, Nicolas Vayatis

Abstract: In this paper, we analyze the quality of a large class of simple dynamic resource allocation (DRA) strategies which we name priority planning. Their aim is to control an undesired diffusion process by distributing resources to the contagious nodes of the network according to a predefined priority-order. In our analysis, we reduce the DRA problem to the linear arrangement of the nodes of the network. Under this perspective, we shed light on the role of a fundamental characteristic of this arrangement, the maximum cutwidth, for assessing the quality of any priority planning strategy. Our theoretical analysis validates the role of the maximum cutwidth by deriving bounds for the extinction time of the diffusion process. Finally, using the results of our analysis, we propose a novel and efficient DRA strategy, called Maximum Cutwidth Minimization, that outperforms other competing strategies in our simulations.

Submitted 17 July, 2014; originally announced July 2014.

Comments: 18 pages, 3 figures

arXiv:1407.4744 [pdf, other]

Tight Bounds for Influence in Diffusion Networks and Application to Bond Percolation and Epidemiology

Authors: Remi Lemonnier, Kevin Scaman, Nicolas Vayatis

Abstract: In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM). We relate these bounds to the spectral radius of a particular matrix and show that the behavior is sub-critical when this spectral radius is lower than $1$. More specifically, we point out that, in general networks, the sub-critical regime behaves in $O(\sqrt{n})$ where $n$ is the size of the network, and that this upper bound is met for star-shaped networks. We apply our results to epidemiology and percolation on arbitrary networks, and derive a bound for the critical value beyond which a giant connected component arises. Finally, we show empirically the tightness of our bounds for a large family of networks.

Submitted 17 July, 2014; originally announced July 2014.

Comments: 20 pages, 4 figures

arXiv:1405.4175 [pdf, ps, other]

Nonparametric Markovian Learning of Triggering Kernels for Mutually Exciting and Mutually Inhibiting Multivariate Hawkes Processes

Authors: Remi Lemonnier, Nicolas Vayatis

Abstract: In this paper, we address the problem of fitting multivariate Hawkes processes to potentially large-scale data in a setting where series of events are not only mutually-exciting but can also exhibit inhibitive patterns. We focus on nonparametric learning and propose a novel algorithm called MEMIP (Markovian Estimation of Mutually Interacting Processes) that makes use of polynomial approximation theory and self-concordant analysis in order to learn both triggering kernels and base intensities of events. Moreover, considering that $N$ historical observations are available, the algorithm performs log-likelihood maximization in $O(N)$ operations, while the complexity of non-Markovian methods is in $O(N^{2})$. Numerical experiments on simulated data, as well as real-world data, show that our method enjoys improved prediction performance when compared to state-of-the-art methods like MMEL and exponential kernels.

Submitted 16 May, 2014; originally announced May 2014.

arXiv:1312.0020 [pdf, other]

Sloshing in the LNG shipping industry: risk modelling through multivariate heavy-tail analysis

Authors: Antoine Dematteo, Stéphan Clémençon, Nicolas Vayatis, Mathilde Mougeot

Abstract: In the liquefied natural gas (LNG) shipping industry, the phenomenon of sloshing can lead to the occurrence of very high pressures in the tanks of the vessel. The issue of modelling or estimating the probability of the simultaneous occurrence of such extremal pressures is now crucial from the risk assessment point of view. In this paper, heavy-tail modelling, widely used as a conservative approach to risk assessment and corresponding to a worst-case risk analysis, is applied to the study of sloshing. Multivariate heavy-tailed distributions are considered, with sloshing pressures investigated by means of small-scale replica tanks instrumented with $d > 1$ sensors. When attempting to fit such nonparametric statistical models, one naturally faces computational issues inherent in the phenomenon of dimensionality. The primary purpose of this article is to overcome this barrier by introducing a novel methodology. For $d$-dimensional heavy-tailed distributions, the structure of extremal dependence is entirely characterised by the angular measure, a positive measure on the intersection of a sphere with the positive orthant in $\mathbb{R}^d$. As $d$ increases, the mutual extremal dependence between variables becomes difficult to assess. Based on a spectral clustering approach, we show here how a low dimensional approximation to the angular measure may be found. The nonparametric method proposed for modelling sloshing has been successfully applied to pressure data. The parsimonious representation thus obtained proves to be very convenient for the simulation of multivariate heavy-tailed distributions, allowing for the implementation of Monte-Carlo simulation schemes in estimating the probability of failure. Besides confirming its performance on artificial data, the methodology has been implemented on a real data set specifically collected for risk assessment of sloshing in the LNG shipping industry.

Submitted 29 November, 2013; originally announced December 2013.

arXiv:1311.4825 [pdf, other]

Gaussian Process Optimization with Mutual Information

Authors: Emile Contal, Vianney Perchet, Nicolas Vayatis

Abstract: In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which significantly improves further these upper bounds for the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also the Expected Improvement heuristic.

Submitted 8 June, 2015; v1 submitted 19 November, 2013; originally announced November 2013.

Comments: Proceedings of The 31st International Conference on Machine Learning (ICML 2014)

arXiv:1305.7385 [pdf, other]

doi 10.1098/rspa.2014.0575

Can Small Islands Protect Nearby Coasts From Tsunamis? An Active Experimental Design Approach

Authors: Themistoklis S. Stefanakis, Emile Contal, Nicolas Vayatis, Frédéric Dias, Costas E. Synolakis

Abstract: Small islands in the vicinity of the mainland are believed to offer protection from wind and waves and thus coastal communities have been developed in these areas. However, what happens when it comes to tsunamis is not clear. Will these islands act as natural barriers? Recent post-tsunami survey data, supported by numerical simulations, reveal that the run-up on coastal areas behind small islands was significantly higher than on neighboring locations not affected by the presence of the island. To study the conditions of this run-up amplification, we solve numerically the nonlinear shallow water equations (NSWE). We use the simplified geometry of a conical island sitting on a flat bed in front of a uniform sloping beach. By doing so, the experimental setup is defined by five physical parameters, namely the island slope, the beach slope, the water depth, the distance between the island and the plane beach and the incoming wavelength, while the wave height was kept fixed. The objective is twofold: Find the maximum run-up amplification with the least number of simulations. To achieve this goal, we build an emulator based on Gaussian Processes to guide the selection of the query points in the parameter space.

Submitted 31 May, 2013; originally announced May 2013.

arXiv:1304.5350 [pdf, other]

doi 10.1007/978-3-642-40988-2_15

Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration

Authors: Emile Contal, David Buffoni, Alexandre Robicquet, Nicolas Vayatis

Abstract: In this paper, we consider the challenge of maximizing an unknown function $f$ for which evaluations are noisy and are acquired with high cost. An iterative procedure uses the previous measures to actively select the next estimation of $f$ which is predicted to be the most useful. We focus on the case where the function can be evaluated in parallel with batches of fixed size and analyze the benefit compared to the purely sequential procedure in terms of cumulative regret. We introduce the Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) which combines the UCB strategy and Pure Exploration in the same batch of evaluations along the parallel iterations. We prove theoretical upper bounds on the regret with batches of size $K$ for this procedure which show the improvement of the order of $\sqrt{K}$ for fixed iteration cost over purely sequential versions. Moreover, the multiplicative constants involved have the property of being dimension-free. We also confirm empirically the efficiency of GP-UCB-PE on real and synthetic problems compared to state-of-the-art competitors.

Submitted 2 September, 2013; v1 submitted 19 April, 2013; originally announced April 2013.

Journal ref: Proceedings of ECML 2013, pp.225-240

arXiv:1209.3230 [pdf, ps, other]

Link Prediction in Graphs with Autoregressive Features

Authors: Emile Richard, Stephane Gaiffas, Nicolas Vayatis

Abstract: In the paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward algorithm.

Submitted 14 September, 2012; originally announced September 2012.

Comments: NIPS 2012

arXiv:1206.6474 [pdf]

Estimation of Simultaneously Sparse and Low Rank Matrices

Authors: Emile Richard, Pierre-Andre Savalle, Nicolas Vayatis

Abstract: The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves $\ell_1$-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1205.1406 [pdf, ps, other]

Graph Prediction in a Low-Rank and Autoregressive Setting

Authors: Emile Richard, Pierre-Andre Savalle, Nicolas Vayatis

Abstract: We study the problem of prediction for evolving graph data. We formulate the problem as the minimization of a convex objective encouraging sparsity and low rank of the solution, which reflect natural graph properties. The convex formulation allows us to obtain oracle inequalities and efficient solvers. We provide empirical results for our algorithm and comparison with competing methods, and point out two open questions related to compressed sensing and algebra of low-rank and sparse matrices.

Submitted 9 May, 2012; v1 submitted 7 May, 2012; originally announced May 2012.

arXiv:1203.5438 [pdf, ps, other]

A Regularization Approach for Prediction of Edges and Node Features in Dynamic Graphs

Authors: Emile Richard, Andreas Argyriou, Theodoros Evgeniou, Nicolas Vayatis

Abstract: We consider the two problems of predicting links in a dynamic graph sequence and predicting functions defined at each node of the graph. In many applications, the solution of one problem is useful for solving the other. Indeed, if these functions reflect node features, then they are related through the graph structure. In this paper, we formulate a hybrid approach that simultaneously learns the structure of the graph and predicts the values of the node-related functions. Our approach is based on the optimization of a joint regularization objective. We empirically test the benefits of the proposed method with both synthetic and real data. The results indicate that joint regularization improves prediction performance over the graph evolution and the node features.

Submitted 24 March, 2012; originally announced March 2012.

arXiv:0708.0098 [pdf, ps, other]

doi 10.1214/009053606000001046

Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii

Authors: Stéphan Clémençon, Gábor Lugosi, Nicolas Vayatis

Abstract: Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii [arXiv:0708.0083]

Submitted 1 August, 2007; originally announced August 2007.

Comments: Published at http://dx.doi.org/10.1214/009053606000001046 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0195C

Journal ref: Annals of Statistics 2006, Vol. 34, No. 6, 2672-2676

arXiv:math/0611133 [pdf, ps, other]

Ranking the best instances

Authors: Stéphan Clémençon, Nicolas Vayatis

Abstract: We formulate the local ranking problem in the framework of bipartite ranking where the goal is to focus on the best instances. We propose a methodology based on the construction of real-valued scoring functions. We study empirical risk minimization of dedicated statistics which involve empirical quantiles of the scores. We first state the problem of finding the best instances which can be cast as a classification problem with mass constraint. Next, we develop special performance measures for the local ranking problem which extend the Area Under an ROC Curve (AUC/AROC) criterion and describe the optimal elements of these new criteria. We also highlight the fact that the goal of ranking the best instances cannot be achieved in a stage-wise manner where first, the best instances would be tentatively identified and then a standard AUC criterion could be applied. Finally, we state preliminary statistical results for the local ranking problem.

Submitted 14 February, 2007; v1 submitted 6 November, 2006; originally announced November 2006.

Comments: 29 pages

MSC Class: 68Q32; 60G99; 62G99; 62M99

arXiv:math/0603123 [pdf, ps, other]

Ranking and empirical minimization of U-statistics

Authors: Stéphan Clémençon, Gábor Lugosi, Nicolas Vayatis

Abstract: The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is "better," with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish in particular a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.

Submitted 5 March, 2006; originally announced March 2006.

Comments: 32 pages

MSC Class: 68Q32; 60G99; 62G99; 62M99

Showing 1–50 of 51 results for author: Vayatis, N