Showing 1–8 of 8 results for author: Sportisse, A

  1. arXiv:2304.08054  [pdf, other]

    stat.ML cs.LG

    Fed-MIWAE: Federated Imputation of Incomplete Data via Deep Generative Models

    Authors: Irene Balelli, Aude Sportisse, Francesco Cremonesi, Pierre-Alexandre Mattei, Marco Lorenzi

    Abstract: Federated learning allows for the training of machine learning models on multiple decentralized local datasets without requiring explicit data exchange. However, data pre-processing, including strategies for handling missing data, remains a major bottleneck in real-world federated learning deployment, and is typically performed locally. This approach may be biased, since the subpopulations locally…

    Submitted 17 April, 2023; originally announced April 2023.

  2. arXiv:2302.07540  [pdf, other]

    stat.ML

    Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism

    Authors: Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei

    Abstract: Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we propose a novel approach to address this issue by…

    Submitted 15 February, 2023; originally announced February 2023.

  3. arXiv:2112.10425  [pdf, other]

    stat.ML cs.LG

    Model-based Clustering with Missing Not At Random Data

    Authors: Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

    Abstract: Model-based unsupervised learning, like any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (…

    Submitted 22 December, 2023; v1 submitted 20 December, 2021; originally announced December 2021.

  4. arXiv:2005.05628  [pdf, other]

    stat.AP stat.ME stat.ML

    Robust Lasso-Zero for sparse corruption and model selection with missing covariates

    Authors: Pascaline Descloux, Claire Boyer, Julie Josse, Aude Sportisse, Sylvain Sardy

    Abstract: We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology, initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values i…

    Submitted 23 March, 2022; v1 submitted 12 May, 2020; originally announced May 2020.

  5. arXiv:2002.09338  [pdf, other]

    math.ST

    Debiasing Stochastic Gradient Descent to handle missing values

    Authors: Julie Josse, Aude Sportisse, Claire Boyer, Aymeric Dieuleveut

    Abstract: The stochastic gradient algorithm is a key ingredient of many machine learning methods and is particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit of being free from the need for any data distribution modeling and of accounting for het…

    Submitted 8 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  6. arXiv:1908.04822  [pdf, other]

    stat.ME

    R-miss-tastic: a unified platform for missing values methods and workflows

    Authors: Imke Mayer, Aude Sportisse, Julie Josse, Nicholas Tierney, Nathalie Vialaneix

    Abstract: Missing values are unavoidable when working with data. Their occurrence is exacerbated as more data from different sources become available. However, most statistical models and visualization methods require complete data, and improper handling of missing data results in information loss or biased analyses. Since the seminal work of Rubin (1976), a burgeoning literature on missing values has arise…

    Submitted 17 June, 2024; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: 36 pages, 9 figures

  7. arXiv:1906.02493  [pdf, other]

    math.ST

    Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

    Authors: Aude Sportisse, Claire Boyer, Julie Josse

    Abstract: Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values. They are "not ignorable" in the sense that they often require defining a model for the missing data mechanism, which makes inference or imputation tasks more complex. Furthermore, this implies a strong a priori on the parametric form of the di…

    Submitted 10 June, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

  8. arXiv:1812.11409  [pdf, other]

    stat.ML cs.LG

    Imputation and low-rank estimation with Missing Not At Random data

    Authors: Aude Sportisse, Claire Boyer, Julie Josse

    Abstract: Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion based on low-rank assumptions is a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matri…

    Submitted 29 January, 2020; v1 submitted 29 December, 2018; originally announced December 2018.