
Showing 1–50 of 54 results for author: Guedj, B

  1. arXiv:2402.09796  [pdf, ps, other]

    stat.ML cs.LG cs.RO

    Closed-form Filtering for Non-linear Systems

    Authors: Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi

    Abstract: Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations. The problem is well known to be intractable for most application domains, except in notable cases such as the tabular setting or for linear dynamical systems with Gaussian noise. In this work, we propose a new class of filters based on Gaussian PSD Models, which offe…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 38 pages

  2. arXiv:2402.08508  [pdf, other]

    stat.ML cs.LG

    A PAC-Bayesian Link Between Generalisation and Flat Minima

    Authors: Maxime Haddouche, Paul Viallard, Umut Simsekli, Benjamin Guedj

    Abstract: Modern machine learning usually involves predictors in the overparametrised setting (number of trained parameters greater than dataset size), and their training yields not only good performance on training data, but also good generalisation capacity. This phenomenon challenges many theoretical results, and remains an open problem. To reach a better understanding, we provide novel generalisation bo…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: We provide novel PAC-Bayesian generalisation bounds involving gradient norms and being interpretable under the lens of flat minima

  3. arXiv:2402.05101  [pdf, ps, other]

    stat.ML cs.LG

    Tighter Generalisation Bounds via Interpolation

    Authors: Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

    Abstract: This paper contains a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, Γ)$-divergence, and, in addition, presents PAC-Bayes generalisation bounds where we interpolate between a series of probability divergences (including but not limited to KL, Wasserstein, and total variation), making the best out of many worlds depending on the posterior distribution's properties. We expl…

    Submitted 7 February, 2024; originally announced February 2024.

  4. arXiv:2312.13259  [pdf, ps, other]

    stat.ML cs.LG

    A note on regularised NTK dynamics with an application to PAC-Bayesian training

    Authors: Eugenio Clerico, Benjamin Guedj

    Abstract: We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during the training in the infinite-width limit, alt…

    Submitted 20 December, 2023; originally announced December 2023.

  5. arXiv:2310.11203  [pdf, other]

    cs.LG stat.ML

    Federated Learning with Nonvacuous Generalisation Bounds

    Authors: Pierre Jobic, Maxime Haddouche, Benjamin Guedj

    Abstract: We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor but keeping secret its training dataset with respect to the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisatio…

    Submitted 17 October, 2023; originally announced October 2023.

  6. arXiv:2310.10534  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Comparing Comparators in Generalization Bounds

    Authors: Fredrik Hellström, Benjamin Guedj

    Abstract: We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that…

    Submitted 21 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024

  7. arXiv:2309.04381  [pdf, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

    Authors: Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj, Maxim Raginsky

    Abstract: A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neu…

    Submitted 27 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: 228 pages

  8. arXiv:2306.04375  [pdf, ps, other]

    stat.ML cs.LG

    Learning via Wasserstein-Based High Probability Generalisation Bounds

    Authors: Paul Viallard, Maxime Haddouche, Umut Şimşekli, Benjamin Guedj

    Abstract: Minimising upper bounds on the population risk or the generalisation gap has been widely used in structural risk minimisation (SRM) -- this is in particular at the core of PAC-Bayesian learning. Despite its successes and unfailing surge of interest in recent years, a limitation of the PAC-Bayesian framework is that most bounds involve a Kullback-Leibler (KL) divergence term (or its variations), wh…

    Submitted 27 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  9. arXiv:2304.07048  [pdf, other]

    stat.ML cs.LG math.OC

    Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation

    Authors: Maxime Haddouche, Benjamin Guedj

    Abstract: PAC-Bayes learning is an established framework to both assess the generalisation ability of learning algorithms, and design new learning algorithms by exploiting generalisation bounds as training objectives. Most of the existing bounds involve a Kullback-Leibler (KL) divergence, which fails to capture the geometric properties of the loss function which are often useful in optimisation. We a…

    Submitted 30 May, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

  10. arXiv:2301.07530  [pdf, other]

    cs.LG math.OC stat.ML

    Optimistically Tempered Online Learning

    Authors: Maxime Haddouche, Olivier Wintenberger, Benjamin Guedj

    Abstract: Optimistic Online Learning algorithms have been developed to exploit expert advice, assumed optimistically to be always useful. However, it is legitimate to question the relevance of such advice w.r.t. the learning information provided by gradient-based online algorithms. In this work, we challenge the confidence assumption on the expert and develop the optimistically tempered (OT)…

    Submitted 14 February, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

  11. arXiv:2210.11289  [pdf, ps, other]

    cs.LG stat.ML

    Tighter PAC-Bayes Generalisation Bounds by Leveraging Example Difficulty

    Authors: Felix Biggs, Benjamin Guedj

    Abstract: We introduce a modified version of the excess risk, which can be used to obtain tighter, fast-rate PAC-Bayesian generalisation bounds. This modified excess risk leverages information about the relative hardness of data examples to reduce the variance of its empirical counterpart, tightening the bound. We combine this with a new bound for $[-1, 1]$-valued (and potentially non-independent) signed lo…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 22 pages

    Journal ref: AISTATS 2023

  12. arXiv:2210.00928  [pdf, ps, other]

    stat.ML cs.LG math.ST

    PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales

    Authors: Maxime Haddouche, Benjamin Guedj

    Abstract: While PAC-Bayes is now an established learning framework for light-tailed losses (e.g., subgaussian or subexponential), its extension to the case of heavy-tailed losses remains largely uncharted and has attracted a growing interest in recent years. We contribute PAC-Bayes generalisation bounds for heavy-tailed losses under the sole assumption of bounded variance of the loss function. Under…

    Submitted 24 April, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: New Section 3 on Online PAC-Bayes

  13. arXiv:2209.02525  [pdf, other]

    stat.ML cs.LG

    Generalisation under gradient descent via deterministic PAC-Bayes

    Authors: Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet

    Abstract: We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution…

    Submitted 4 April, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

  14. arXiv:2206.09194  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Efficient Aggregated Kernel Tests using Incomplete $U$-statistics

    Authors: Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton

    Abstract: We propose a series of computationally efficient nonparametric tests for the two-sample, independence, and goodness-of-fit problems, using the Maximum Mean Discrepancy (MMD), Hilbert-Schmidt Independence Criterion (HSIC), and Kernel Stein Discrepancy (KSD), respectively. Our test statistics are incomplete $U$-statistics, with a computational cost that interpolates between linear time in the number…

    Submitted 26 January, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 34 pages, 5 figures

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  15. arXiv:2206.04607  [pdf, other]

    cs.LG math.ST stat.ML

    On Margins and Generalisation for Voting Classifiers

    Authors: Felix Biggs, Valentina Zantedeschi, Benjamin Guedj

    Abstract: We study the generalisation properties of majority voting on finite ensembles of classifiers, proving margin-based generalisation bounds via the PAC-Bayes theory. These provide state-of-the-art guarantees on a number of classification tasks. Our central results leverage the Dirichlet posteriors studied recently by Zantedeschi et al. [2021] for training voting classifiers; in contrast to that work…

    Submitted 20 October, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 20 pages, 8 figures

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  16. Opening up echo chambers via optimal content recommendation

    Authors: Antoine Vendeville, Anastasios Giovanidis, Effrosyni Papanastasiou, Benjamin Guedj

    Abstract: Online social platforms have become central in the political debate. In this context, the existence of echo chambers is a problem of primary relevance. These clusters of like-minded individuals tend to reinforce prior beliefs, elicit animosity towards others and aggravate the spread of misinformation. We study this phenomenon on a Twitter dataset related to the 2017 French presidential elections a…

    Submitted 8 June, 2022; originally announced June 2022.

  17. arXiv:2206.00024  [pdf, other]

    cs.LG math.ST stat.ML

    Online PAC-Bayes Learning

    Authors: Maxime Haddouche, Benjamin Guedj

    Abstract: Most PAC-Bayesian bounds hold in the batch learning setting where data is collected at once, prior to inference or prediction. This somewhat departs from many contemporary learning problems where data streams are collected and the algorithms must dynamically adjust. We prove new PAC-Bayesian bounds in this online learning framework, leveraging an updated definition of regret, and we revisit classi…

    Submitted 13 October, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: 21 pages

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  18. arXiv:2204.12024  [pdf, other]

    cs.CL

    Reprint: a randomized extrapolation based on principal components for data augmentation

    Authors: Jiale Wei, Qiyuan Chen, Pai Peng, Benjamin Guedj, Le Li

    Abstract: Data scarcity and data imbalance have attracted a lot of attention in many fields. Data augmentation, explored as an effective approach to tackle them, can improve the robustness and efficiency of classification models by generating new samples. This paper presents REPRINT, a simple and effective hidden-space data augmentation method for imbalanced data classification. Given hidden-space represent…

    Submitted 25 April, 2022; originally announced April 2022.

  19. arXiv:2203.02002  [pdf, other]

    cs.SI physics.soc-ph

    Discord in the voter model for complex networks

    Authors: Antoine Vendeville, Shi Zhou, Benjamin Guedj

    Abstract: Online social networks have become primary means of communication. As they often exhibit undesirable effects such as hostility, polarisation or echo chambers, it is crucial to develop analytical tools that help us better understand them. In this paper, we are interested in the evolution of discord in social networks. Formally, we introduce a method to calculate the probability of discord between a…

    Submitted 21 February, 2024; v1 submitted 3 March, 2022; originally announced March 2022.

    Journal ref: Phys. Rev. E, 109(2), 024312 (2024)

  20. arXiv:2202.11455  [pdf, other]

    cs.LG cs.CV math.ST stat.ML

    On PAC-Bayesian reconstruction guarantees for VAEs

    Authors: Badr-Eddine Chérief-Abdellatif, Yuyang Shi, Arnaud Doucet, Benjamin Guedj

    Abstract: Despite its wide use and empirical successes, the theoretical understanding and study of the behaviour and performance of the variational autoencoder (VAE) have only emerged in the past few years. We contribute to this recent line of work by analysing the VAE's reconstruction ability for unseen test data, leveraging arguments from the PAC-Bayes theory. We provide generalisation bounds on the theor…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 14 pages

    Journal ref: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022, Valencia, Spain. PMLR: Volume 151

  21. arXiv:2202.05614  [pdf, other]

    stat.ML cs.LG

    Measuring dissimilarity with diffeomorphism invariance

    Authors: Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi

    Abstract: Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms. We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data's internal structure to be invariant to diffeomorphisms. We prove that DID enjoys properties which make it relevant for theoretical study and practical use. By representing each dat…

    Submitted 7 March, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: A pre-print

  22. arXiv:2202.05568  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.PR math.ST

    On change of measure inequalities for $f$-divergences

    Authors: Antoine Picard-Weibel, Benjamin Guedj

    Abstract: We propose new change of measure inequalities based on $f$-divergences (of which the Kullback-Leibler divergence is a particular case). Our strategy relies on combining the Legendre transform of $f$-divergences and the Young-Fenchel inequality. By exploiting these new change of measure inequalities, we derive new PAC-Bayesian generalisation bounds with a complexity involving $f$-divergences, and h…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: 17 pages

  23. arXiv:2202.05560  [pdf, other]

    stat.ML cs.LG math.ST

    Controlling Multiple Errors Simultaneously with a PAC-Bayes Bound

    Authors: Reuben Adams, John Shawe-Taylor, Benjamin Guedj

    Abstract: Current PAC-Bayes generalisation bounds are restricted to scalar metrics of performance, such as the loss or error rate. However, one ideally wants more information-rich certificates that control the entire distribution of possible outcomes, such as the distribution of the test loss in regression, or the probabilities of different misclassifications. We provide the first PAC-Bayes bound capable o…

    Submitted 22 February, 2024; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: 31 pages

  24. arXiv:2202.01627  [pdf, ps, other]

    cs.LG stat.ML

    Non-Vacuous Generalisation Bounds for Shallow Neural Networks

    Authors: Felix Biggs, Benjamin Guedj

    Abstract: We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks wi…

    Submitted 15 June, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: 19 pages, 12 figures

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  25. arXiv:2202.00824  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    KSD Aggregated Goodness-of-fit Test

    Authors: Antonin Schrab, Benjamin Guedj, Arthur Gretton

    Abstract: We investigate properties of goodness-of-fit tests based on the Kernel Stein Discrepancy (KSD). We introduce a strategy to construct a test, called KSDAgg, which aggregates multiple tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels. We provide non-asympto…

    Submitted 20 December, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 27 pages, 3 figures, Appendices A.4 and I.4 updated

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  26. arXiv:2111.07737  [pdf, other]

    cs.LG cs.CV

    Progress in Self-Certified Neural Networks

    Authors: Maria Perez-Ortiz, Omar Rivasplata, Emilio Parrado-Hernandez, Benjamin Guedj, John Shawe-Taylor

    Abstract: A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality with a tight statistical certificate that is valid on unseen data. Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead not only to accurate predictors, but also to tight risk certificates, bearing promise towards achieving self-certi…

    Submitted 10 December, 2021; v1 submitted 15 November, 2021; originally announced November 2021.

    Journal ref: Published at NeurIPS 2021 workshop: Bayesian Deep Learning

  27. arXiv:2110.15073  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    MMD Aggregated Two-Sample Test

    Authors: Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, Arthur Gretton

    Abstract: We propose two novel nonparametric two-sample kernel tests based on the Maximum Mean Discrepancy (MMD). First, for a fixed kernel, we construct an MMD test using either permutations or a wild bootstrap, two popular numerical procedures to determine the test threshold. We prove that this test controls the probability of type I error non-asymptotically. Hence, it can be used reliably even in setting…

    Submitted 21 August, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: 81 pages

    Journal ref: Journal of Machine Learning Research 24(194), 1-81, 2023

  28. arXiv:2109.10304  [pdf, other]

    cs.LG cs.CV

    Learning PAC-Bayes Priors for Probabilistic Neural Networks

    Authors: Maria Perez-Ortiz, Omar Rivasplata, Benjamin Guedj, Matthew Gleeson, Jingyu Zhang, John Shawe-Taylor, Miroslaw Bober, Josef Kittler

    Abstract: Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data. This combination has been shown to lead not only to accurate classifiers, but also to remarkably tight risk certificates, bearing promise towards self-certified learning (i.e. use all the data to learn a predictor and certify its quality). In this work, we…

    Submitted 21 September, 2021; originally announced September 2021.

  29. arXiv:2107.03955  [pdf, other]

    cs.LG math.ST

    On Margins and Derandomisation in PAC-Bayes

    Authors: Felix Biggs, Benjamin Guedj

    Abstract: We give a general recipe for derandomising PAC-Bayesian bounds using margins, with the critical ingredient being that our randomised predictions concentrate around some value. The tools we develop straightforwardly lead to margin bounds for various classifiers, including linear prediction -- a class that includes boosting and the support vector machine -- single-hidden-layer neural networks with a…

    Submitted 23 February, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: 23 pages

    Journal ref: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022, Valencia, Spain. PMLR: Volume 151

  30. arXiv:2106.12535  [pdf, other]

    cs.LG stat.ME stat.ML

    Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

    Authors: Valentina Zantedeschi, Paul Viallard, Emilie Morvant, Rémi Emonet, Amaury Habrard, Pascal Germain, Benjamin Guedj

    Abstract: We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective.…

    Submitted 19 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  31. arXiv:2012.10369  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Upper and Lower Bounds on the Performance of Kernel PCA

    Authors: Maxime Haddouche, Benjamin Guedj, John Shawe-Taylor

    Abstract: Principal Component Analysis (PCA) is a popular method for dimension reduction and has attracted an unfailing interest for decades. More recently, kernel PCA (KPCA) has emerged as an extension of PCA but, despite its use in practice, a sound theoretical understanding of KPCA is missing. We contribute several lower and upper bounds on the efficiency of KPCA, involving the empirical eigenvalues of t…

    Submitted 23 January, 2023; v1 submitted 18 December, 2020; originally announced December 2020.

    Comments: 16 pages

  32. arXiv:2012.03780  [pdf, other]

    cs.LG math.ST stat.ML

    A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings

    Authors: Théophile Cantelobre, Benjamin Guedj, María Pérez-Ortiz, John Shawe-Taylor

    Abstract: Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity of produ…

    Submitted 21 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: 38 pages

  33. arXiv:2011.11820  [pdf, other]

    stat.AP cs.LG math.OC stat.ML

    An end-to-end data-driven optimisation framework for constrained trajectories

    Authors: Florent Dewez, Benjamin Guedj, Arthur Talpaert, Vincent Vandewalle

    Abstract: Many real-world problems require optimising trajectories under constraints. Classical approaches are based on optimal control methods but require an exact knowledge of the underlying dynamics, which could be challenging or even out of reach. In this paper, we leverage data-driven approaches to design a new end-to-end framework which is dynamics-free for optimised and realistic trajectories. We fi…

    Submitted 5 February, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 28 pages

  34. arXiv:2011.07866  [pdf, other]

    cs.LG stat.CO stat.ME stat.ML

    Cluster-Specific Predictions with Multi-Task Gaussian Processes

    Authors: Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey

    Abstract: A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variation…

    Submitted 30 November, 2022; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: 47 pages

    Journal ref: Journal of Machine Learning Research (2023)

  35. Forecasting elections results via the voter model with stubborn nodes

    Authors: Antoine Vendeville, Benjamin Guedj, Shi Zhou

    Abstract: In this paper we propose a novel method to forecast the result of elections using only official results of previous ones. It is based on the voter model with stubborn nodes and uses theoretical results developed in a previous work of ours. We look at popular vote shares for the Conservative and Labour parties in the UK and the Republican and Democrat parties in the US. We are able to perform time-…

    Submitted 12 October, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Journal ref: Applied Network Science 6, 1 (2021)

  36. arXiv:2007.10731  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    MAGMA: Inference and Prediction with Multi-Task Gaussian Processes

    Authors: Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey

    Abstract: A novel multi-task Gaussian process (GP) framework is proposed, by using a common mean process for sharing information across tasks. In particular, we investigate the problem of time series forecasting, with the objective to improve multiple-step-ahead predictions. The common mean process is defined as a GP for which the hyper-posterior distribution is tractable. Therefore an EM algorithm is deriv…

    Submitted 24 May, 2022; v1 submitted 21 July, 2020; originally announced July 2020.

    Journal ref: Machine Learning, 2022

  37. arXiv:2006.14763  [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Bound for the Conditional Value at Risk

    Authors: Zakaria Mhammedi, Benjamin Guedj, Robert C. Williamson

    Abstract: Conditional Value at Risk (CVaR) is a family of "coherent risk measures" which generalize the traditional mathematical expectation. Widely used in mathematical finance, it is garnering increasing interest in machine learning, e.g., as an alternate approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the C…

    Submitted 25 June, 2020; originally announced June 2020.

    Journal ref: NeurIPS 2020

  38. arXiv:2006.12228  [pdf, other]

    cs.LG stat.ML

    Differentiable PAC-Bayes Objectives with Partially Aggregated Neural Networks

    Authors: Felix Biggs, Benjamin Guedj

    Abstract: We make three related contributions motivated by the challenge of training stochastic neural networks, particularly in a PAC-Bayesian setting: (1) we show how averaging over an ensemble of stochastic neural networks enables a new class of partially-aggregated estimators; (2) we show that these lead to provably lower-variance gradient estimates for non-differentiable signed-output networks;…

    Submitted 22 June, 2020; originally announced June 2020.

    Journal ref: Entropy 2021

  39. arXiv:2006.07279  [pdf, other]

    stat.ML cs.LG math.ST

    PAC-Bayes unleashed: generalisation bounds with unbounded losses

    Authors: Maxime Haddouche, Benjamin Guedj, Omar Rivasplata, John Shawe-Taylor

    Abstract: We present new PAC-Bayesian generalisation bounds for learning problems with unbounded loss functions. This extends the relevance and applicability of the PAC-Bayes learning framework, where most of the existing literature focuses on supervised learning problems with a bounded loss function (typically assumed to take values in the interval [0;1]). In order to relax this assumption, we propose a ne…

    Submitted 30 September, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 24 pages

    Journal ref: Entropy 2021

  40. Towards control of opinion diversity by introducing zealots into a polarised social group

    Authors: Antoine Vendeville, Benjamin Guedj, Shi Zhou

    Abstract: We explore a method to influence or even control the diversity of opinions within a polarised social group. We leverage the voter model in which users hold binary opinions and repeatedly update their beliefs based on others they connect with. Stubborn agents who never change their minds ("zealots") are also disseminated through the network, which is modelled by a connected graph. Building on earli…

    Submitted 6 January, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 14 pages, 4 figures

    Journal ref: Proceedings of the Tenth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2021

  41. arXiv:2005.05286  [pdf, other]

    stat.AP cs.LG stat.ME stat.ML

    From industry-wide parameters to aircraft-centric on-flight inference: improving aeronautics performance prediction with machine learning

    Authors: Florent Dewez, Benjamin Guedj, Vincent Vandewalle

    Abstract: Aircraft performance models play a key role in airline operations, especially in planning a fuel-efficient flight. In practice, manufacturers provide guidelines which are slightly modified throughout the aircraft life cycle via the tuning of a single factor, enabling better fuel predictions. However, this has limitations; in particular, they do not reflect the evolution of each feature impacting the…

    Submitted 4 February, 2021; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: Published in Data-Centric Engineering

    Journal ref: Data-Centric Engineering 2020

  42. arXiv:1912.08311  [pdf, other]

    cs.LG stat.CO stat.ML

    Kernel-Based Ensemble Learning in Python

    Authors: Benjamin Guedj, Bhargav Srinivasa Desikan

    Abstract: We propose a new supervised learning algorithm, for classification and regression problems where two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combined estimators based on a notio…

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: 11 pages

    Journal ref: Information 2020, 11(2)

  43. arXiv:1910.04464  [pdf, ps, other]

    cs.LG math.ST stat.ML

    PAC-Bayesian Contrastive Unsupervised Representation Learning

    Authors: Kento Nozawa, Pascal Germain, Benjamin Guedj

    Abstract: Contrastive unsupervised representation learning (CURL) is the state-of-the-art technique to learn representations (as a set of features) from unlabelled data. While CURL has collected several empirical successes recently, theoretical understanding of its performance was still missing. In a recent work, Arora et al. (2019) provide the first generalisation bounds for CURL, relying on a Rademacher c…

    Submitted 17 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Published in the proceedings of the Conference on Uncertainty in Artificial Intelligence 2020 (UAI)

    Journal ref: PMLR, volume 124 (UAI 2020), 2020

  44. arXiv:1910.04460  [pdf, other]

    cs.LG math.ST stat.ML

    Still no free lunches: the price to pay for tighter PAC-Bayes bounds

    Authors: Benjamin Guedj, Louis Pujol

    Abstract: "No free lunch" results state the impossibility of obtaining meaningful bounds on the error of a learning algorithm without prior assumptions and modelling. Some models are expensive (strong assumptions, such as subgaussian tails), others are cheap (simply finite variance). As it is well known, the more you pay, the more you get: in other words, the most expensive models yield the more interest…

    Submitted 10 October, 2019; originally announced October 2019.

    Journal ref: Entropy 2021

  45. arXiv:1909.06861  [pdf, other]

    cs.LG stat.ML

    Online k-means Clustering

    Authors: Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

    Abstract: We study the problem of online clustering where a clustering algorithm has to assign a new point that arrives to one of $k$ clusters. The specific formulation we use is the $k$-means objective: At each time step the algorithm has to maintain a set of $k$ candidate centers and the loss incurred is the squared distance between the new point and the closest center. The goal is to minimize regret with r…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: 11 pages, 1 figure

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 130:1126-1134, 2021

  46. arXiv:1905.13367  [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayes Un-Expected Bernstein Inequality

    Authors: Zakaria Mhammedi, Peter D. Grunwald, Benjamin Guedj

    Abstract: We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \KL/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm's randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset a…

    Submitted 3 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: 24 pages, 6 figures. To Appear in NeurIPS2019

    Journal ref: NeurIPS 2019

  47. Attributing and Referencing (Research) Software: Best Practices and Outlook from Inria

    Authors: Pierre Alliez, Roberto Di Cosmo, Benjamin Guedj, Alain Girault, Mohand-Said Hacid, Arnaud Legrand, Nicolas P. Rougier

    Abstract: Software is a fundamental pillar of modern scientific research, not only in computer science, but actually across all fields and disciplines. However, there is a lack of adequate means to cite and reference software, for many reasons. An obvious first reason is software authorship, which can range from a single developer to a whole team, and can even vary in time. The panorama is even more complex than…

    Submitted 25 November, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

    Journal ref: Computing in Science & Engineering 2020

  48. arXiv:1905.10259  [pdf, other]

    cs.LG stat.ML

    Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks

    Authors: Gaël Letarte, Pascal Germain, Benjamin Guedj, François Laviolette

    Abstract: We present a comprehensive study of multilayer neural networks with binary activation, relying on the PAC-Bayesian theory. Our contributions are twofold: (i) we develop an end-to-end framework to train a binary activated deep neural network, (ii) we provide nonvacuous PAC-Bayesian generalization bounds for binary activated deep neural networks. Our results are obtained by minimizing the expected l…

    Submitted 4 February, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

    Journal ref: NeurIPS 2019

  49. arXiv:1905.10201  [pdf, other]

    cs.LG stat.ML

    Model Validation Using Mutated Training Labels: An Exploratory Study

    Authors: Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor

    Abstract: We introduce an exploratory study on Mutation Validation (MV), a model validation method using mutated training labels for supervised learning. MV mutates training data labels, retrains the model against the mutated data, then uses the metamorphic relation that captures the consequent training performance changes to assess model fit. It does not use a validation set or test set. The intuition unde…

    Submitted 20 October, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

  50. arXiv:1904.00865  [pdf, other]

    stat.ML cs.CV cs.LG eess.IV

    Non-linear aggregation of filters to improve image denoising

    Authors: Benjamin Guedj, Juliette Rengot

    Abstract: We introduce a novel aggregation method to efficiently perform image denoising. Preliminary filters are aggregated in a non-linear fashion, using a new metric of pixel proximity based on how the pool of filters reaches a consensus. We provide a theoretical bound to support our aggregation scheme, illustrate its numerical performance, and show that the aggregate significantly outperforms each…

    Submitted 23 June, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: To appear at Computing Conference 2020

    Journal ref: Computing Conference 2020