
Showing 1–50 of 50 results for author: Josse, J

  1. arXiv:2405.15641  [pdf, other]

    stat.ME

    Predictive Uncertainty Quantification with Missing Covariates

    Authors: Margaux Zaffran, Julie Josse, Yaniv Romano, Aymeric Dieuleveut

    Abstract: Predictive uncertainty quantification is crucial in decision-making problems. We investigate how to adequately quantify predictive uncertainty with missing covariates. A bottleneck is that missing values induce heteroskedasticity on the response's predictive distribution given the observed covariates. Thus, we focus on building predictive sets for the response that are valid conditionally to the m…

    Submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2403.19196  [pdf, other]

    math.ST

    What Is a Good Imputation Under MAR Missingness?

    Authors: Jeffrey Näf, Erwan Scornet, Julie Josse

    Abstract: Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. The present paper attempts to take a step back and provide a more systematic analysis. Starting from an in-depth discussion of the Missing at Random (MAR) condition for nonparametric imputation, we first develop an id…

    Submitted 7 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  3. arXiv:2312.00448  [pdf, other]

    stat.CO

    AdaptiveConformal: An R Package for Adaptive Conformal Inference

    Authors: Herbert Susmann, Antoine Chambaz, Julie Josse

    Abstract: Conformal Inference (CI) is a popular approach for generating finite sample prediction intervals based on the output of any point prediction method when data are exchangeable. Adaptive Conformal Inference (ACI) algorithms extend CI to the case of sequentially observed data, such as time series, and exhibit strong theoretical guarantees without having to assume exchangeability of the observed data.…

    Submitted 1 December, 2023; originally announced December 2023.

  4. arXiv:2310.12115  [pdf, other]

    stat.ML cs.LG stat.ME

    MMD-based Variable Importance for Distributional Random Forest

    Authors: Clément Bénard, Jeffrey Näf, Julie Josse

    Abstract: Distributional Random Forest (DRF) is a flexible forest-based method to estimate the full conditional distribution of a multivariate output of interest given input variables. In this article, we introduce a variable importance algorithm for DRFs, based on the well-established drop and relearn principle and MMD distance. While traditional importance measures only detect variables with an influence…

    Submitted 14 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

  5. arXiv:2310.06969  [pdf, other]

    stat.ME cs.LG stat.ML

    Positivity-free Policy Learning with Observational Data

    Authors: Pan Zhao, Antoine Chambaz, Julie Josse, Shu Yang

    Abstract: Policy learning utilizing observational data is pivotal across various domains, with the objective of learning the optimal treatment assignment policy while adhering to specific constraints such as fairness, budget, and simplicity. This study introduces a novel positivity-free (stochastic) policy learning framework designed to address the challenges posed by the impracticality of the positivity as…

    Submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2308.03369  [pdf, ps, other]

    stat.ML

    Variable importance for causal forests: breaking down the heterogeneity of treatment effects

    Authors: Clément Bénard, Julie Josse

    Abstract: Causal random forests provide efficient estimates of heterogeneous treatment effects. However, forest algorithms are also well-known for their black-box nature, and therefore, do not characterize how input variables are involved in treatment effect heterogeneity, which is a strong practical limitation. In this article, we develop a new variable importance algorithm for causal forests, to quantify…

    Submitted 7 August, 2023; originally announced August 2023.

  7. arXiv:2306.02732  [pdf, other]

    stat.ML cs.LG

    Conformal Prediction with Missing Values

    Authors: Margaux Zaffran, Aymeric Dieuleveut, Julie Josse, Yaniv Romano

    Abstract: Conformal prediction is a theoretically grounded framework for constructing predictive intervals. We study conformal prediction with missing values in the covariates -- a setting that brings new challenges to uncertainty quantification. We first show that the marginal coverage guarantee of conformal prediction holds on imputed data for any missingness distribution and almost all imputation functio…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Code for our experiments can be found at https://github.com/mzaffran/ConformalPredictionMissingValues . To be published in the proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA

  8. arXiv:2303.16008  [pdf, other]

    stat.ME

    Risk ratio, odds ratio, risk difference... Which causal measure is easier to generalize?

    Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet

    Abstract: There are many measures to report so-called treatment or causal effects: absolute difference, ratio, odds ratio, number needed to treat, and so on. The choice of a measure, e.g., absolute versus relative, is often debated because it leads to different impressions of the benefit or risk of a treatment. Besides, different causal measures may lead to various treatment effect heterogeneity: some input va…

    Submitted 30 March, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

  9. arXiv:2301.05491  [pdf, other]

    stat.ME stat.ML

    Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data

    Authors: Pan Zhao, Julie Josse, Shu Yang

    Abstract: An individualized treatment regime (ITR) is a decision rule that assigns treatments based on patients' characteristics. The value function of an ITR is the expected outcome in a counterfactual world had this ITR been implemented. Recently, there has been increasing interest in combining heterogeneous data sources, such as leveraging the complementary features of randomized controlled trial (RCT) d…

    Submitted 13 January, 2023; originally announced January 2023.

  10. arXiv:2208.07614  [pdf, other]

    stat.ME

    Reweighting the RCT for generalization: finite sample error and variable selection

    Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet

    Abstract: Randomized Controlled Trials (RCTs) may suffer from limited scope. In particular, samples may be unrepresentative: some RCTs over- or under-sample individuals with certain characteristics compared to the target population, for which one wants conclusions on treatment effectiveness. Re-weighting trial individuals to match the target population can improve the treatment effect estimation. In this w…

    Submitted 13 March, 2024; v1 submitted 16 August, 2022; originally announced August 2022.

  11. arXiv:2202.10580  [pdf, other]

    cs.LG cs.AI

    Benchmarking missing-values approaches for predictive models on health databases

    Authors: Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline

    Abstract: BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative -- rather than generative -- modeling,…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: GigaScience, Oxford Univ Press, In press

  12. arXiv:2202.07282  [pdf, other]

    stat.ML cs.LG

    Adaptive Conformal Predictions for Time Series

    Authors: Margaux Zaffran, Aymeric Dieuleveut, Olivier Féron, Yannig Goude, Julie Josse

    Abstract: Uncertainty quantification of predictive models is crucial in decision-making problems. Conformal prediction is a general and theoretically sound answer. However, it requires exchangeable data, excluding time series. While recent works tackled this issue, we argue that Adaptive Conformal Inference (ACI, Gibbs and Candès, 2021), developed for distribution-shift time series, is a good procedure fo…

    Submitted 15 February, 2022; originally announced February 2022.

  13. arXiv:2112.10425  [pdf, other]

    stat.ML cs.LG

    Model-based Clustering with Missing Not At Random Data

    Authors: Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

    Abstract: Model-based unsupervised learning, as any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (…

    Submitted 22 December, 2023; v1 submitted 20 December, 2021; originally announced December 2021.

  14. arXiv:2106.00311  [pdf, other]

    stat.ML cs.AI cs.LG

    What's a good imputation to predict with missing values?

    Authors: Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

    Abstract: How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all mi…

    Submitted 30 November, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  15. Causal effect on a target population: a sensitivity analysis to handle missing covariates

    Authors: Bénédicte Colnet, Julie Josse, Erwan Scornet, Gaël Varoquaux

    Abstract: Randomized Controlled Trials (RCTs) are often considered the gold standard for estimating causal effects, but they may lack external validity when the population eligible for the RCT is substantially different from the target population. Having at hand a sample of the target population of interest allows us to generalize the causal effect. Identifying the treatment effect in the target population re…

    Submitted 10 January, 2023; v1 submitted 13 May, 2021; originally announced May 2021.

  16. arXiv:2104.12639  [pdf, other]

    stat.ME stat.AP

    Generalizing treatment effects with incomplete covariates: identifying assumptions and multiple imputation algorithms

    Authors: Imke Mayer, Julie Josse, Traumabase Group

    Abstract: We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity sampling weighting are not designed to handle missing values, which are however common in both data sources. In addition to coupling the assumptions for causal effect id…

    Submitted 24 February, 2023; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: preprint, 38 pages, 14 figures

    MSC Class: 62P10; 93C41. ACM Class: G.3

  17. arXiv:2011.08047  [pdf, other]

    stat.ME

    Causal inference methods for combining randomized trials and observational studies: a review

    Authors: Bénédicte Colnet, Imke Mayer, Guanhua Chen, Awa Dieng, Ruohong Li, Gaël Varoquaux, Jean-Philippe Vert, Julie Josse, Shu Yang

    Abstract: With increasing data availability, causal effects can be evaluated across different data sets, both randomized controlled trials (RCTs) and observational studies. RCTs isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects but they may suffer from unrepresentativeness, and thus lack external validity. On the other hand, large observational samples are often mo…

    Submitted 10 January, 2023; v1 submitted 16 November, 2020; originally announced November 2020.

  18. arXiv:2011.06501  [pdf, other]

    stat.CO stat.AP

    VARCLUST: clustering variables using dimensionality reduction

    Authors: Piotr Sobczyk, Stanislaw Wilczynski, Malgorzata Bogdan, Piotr Graczyk, Julie Josse, Fabien Panloup, Valérie Seegers, Mateusz Staniak

    Abstract: The VARCLUST algorithm is proposed for clustering variables under the assumption that variables in a given cluster are linear combinations of a small number of hidden latent variables, corrupted by random noise. The entire clustering task is viewed as the problem of selecting the statistical model, which is defined by the number of clusters, the partition of variables into these clusters and th…

    Submitted 18 December, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 24 pages, 34 figures

  19. arXiv:2007.01627  [pdf, other]

    cs.LG cs.AI stat.ML

    NeuMiss networks: differentiable programming for supervised learning with missing values

    Authors: Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

    Abstract: The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing pattern…

    Submitted 4 November, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems 33, Dec 2020, Vancouver, Canada

  20. arXiv:2005.05628  [pdf, other]

    stat.AP stat.ME stat.ML

    Robust Lasso-Zero for sparse corruption and model selection with missing covariates

    Authors: Pascaline Descloux, Claire Boyer, Julie Josse, Aude Sportisse, Sylvain Sardy

    Abstract: We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology, initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is showcased for variable selection with missing values i…

    Submitted 23 March, 2022; v1 submitted 12 May, 2020; originally announced May 2020.

  21. arXiv:2002.10837  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    MissDeepCausal: Causal Inference from Incomplete Data Using Deep Latent Variable Models

    Authors: Imke Mayer, Julie Josse, Félix Raimundo, Jean-Philippe Vert

    Abstract: Inferring causal effects of a treatment, intervention or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference seldom consider the possibility that covariates have missing values, which is ubiquitous in many real-world analyses. Missing data greatly complicate causal inference procedures as they require an adapted unconfoundedness hy…

    Submitted 25 February, 2020; originally announced February 2020.

  22. arXiv:2002.09338  [pdf, other]

    math.ST

    Debiasing Stochastic Gradient Descent to handle missing values

    Authors: Julie Josse, Aude Sportisse, Claire Boyer, Aymeric Dieuleveut

    Abstract: The stochastic gradient algorithm is a key ingredient of many machine learning methods, particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit of being free from the need for any data distribution modeling and of accounting for het…

    Submitted 8 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  23. arXiv:2002.03860  [pdf, other]

    stat.ML cs.LG

    Missing Data Imputation using Optimal Transport

    Authors: Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

    Abstract: Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize…

    Submitted 1 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  24. arXiv:2002.00658  [pdf, other]

    cs.LG cs.AI stat.ML

    Linear predictor on linearly-generated data with missing values: non consistency and solutions

    Authors: Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

    Abstract: We consider building predictors when the data have missing values. We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data and we show that, in the presence of missing values, the optimal predictor may not be linear. In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and t…

    Submitted 12 May, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of Machine Learning Research, PMLR, In press

  25. arXiv:1910.10624  [pdf, other]

    stat.ME

    Doubly robust treatment effect estimation with missing attributes

    Authors: Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager, Julie Josse

    Abstract: Missing attributes are ubiquitous in causal inference, as they are in most applied statistical work. In this paper, we consider various sets of assumptions under which causal inference is possible despite missing attributes and discuss corresponding approaches to average treatment effect estimation, including generalized propensity score methods and multiple imputation. Across an extensive simulat…

    Submitted 22 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    MSC Class: 93C41; 62G35; 62F35; 62P10

  26. arXiv:1909.06631  [pdf, other]

    stat.ME stat.AP stat.CO

    Adaptive Bayesian SLOPE -- High-dimensional Model Selection with Missing Values

    Authors: Wei Jiang, Malgorzata Bogdan, Julie Josse, Blazej Miasojedow, Veronika Rockova, TraumaBase Group

    Abstract: We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure -- adaptive Bayesian SLOPE -- which effectively combines the SLOPE method (sorted $l_1$ regularization) together with the Spike-and-Slab LASSO method. We position our approach within a Baye…

    Submitted 6 November, 2019; v1 submitted 14 September, 2019; originally announced September 2019.

    Comments: R package https://github.com/wjiang94/ABSLOPE

  27. arXiv:1908.04822  [pdf, other]

    stat.ME

    R-miss-tastic: a unified platform for missing values methods and workflows

    Authors: Imke Mayer, Aude Sportisse, Julie Josse, Nicholas Tierney, Nathalie Vialaneix

    Abstract: Missing values are unavoidable when working with data. Their occurrence is exacerbated as more data from different sources become available. However, most statistical models and visualization methods require complete data, and improper handling of missing data results in information loss or biased analyses. Since the seminal work of Rubin (1976), a burgeoning literature on missing values has arise…

    Submitted 17 June, 2024; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: 36 pages, 9 figures

  28. arXiv:1906.02493  [pdf, other]

    math.ST

    Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

    Authors: Aude Sportisse, Claire Boyer, Julie Josse

    Abstract: Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values. They are "not ignorable" in the sense that they often require defining a model for the missing data mechanism, which makes inference or imputation tasks more complex. Furthermore, this implies a strong a priori on the parametric form of the di…

    Submitted 10 June, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

  29. arXiv:1902.06931  [pdf, other]

    stat.ML cs.LG math.ST

    On the consistency of supervised learning with missing values

    Authors: Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

    Abstract: In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two appr…

    Submitted 21 March, 2024; v1 submitted 19 February, 2019; originally announced February 2019.

  30. arXiv:1812.11409  [pdf, other]

    stat.ML cs.LG

    Imputation and low-rank estimation with Missing Not At Random data

    Authors: Aude Sportisse, Claire Boyer, Julie Josse

    Abstract: Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion methods based on low-rank assumptions are a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values which are widely encountered in practice. This paper proposes matri…

    Submitted 29 January, 2020; v1 submitted 29 December, 2018; originally announced December 2018.

  31. arXiv:1812.08398  [pdf, other]

    stat.ML cs.LG

    Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames

    Authors: Geneviève Robin, Hoi-To Wai, Julie Josse, Olga Klopp, Éric Moulines

    Abstract: Many applications of machine learning involve the analysis of large data frames -- matrices collecting heterogeneous measurements (binary, numerical, counts, etc.) across samples -- with missing values. Low-rank models, as studied by Udell et al. [30], are popular in this framework for tasks such as visualization, clustering and missing value imputation. Yet, available methods with statistical guarantee…

    Submitted 20 December, 2018; originally announced December 2018.

  32. arXiv:1806.09734  [pdf, other]

    stat.ME

    Main effects and interactions in mixed and incomplete data frames

    Authors: Geneviève Robin, Olga Klopp, Julie Josse, Éric Moulines, Robert Tibshirani

    Abstract: A mixed data frame (MDF) is a table collecting categorical, numerical and count observations. The use of MDF is widespread in statistics and the applications are numerous from abundance data in ecology to recommender systems. In many cases, an MDF exhibits simultaneously main effects, such as row, column or group effects and interactions, for which a low-rank model has often been suggested. Althou…

    Submitted 26 March, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: 25 pages, 1 figure, 4 tables

  33. arXiv:1805.04602  [pdf, other]

    stat.ME

    Logistic Regression with Missing Covariates -- Parameter Estimation, Model Selection and Prediction within a Joint-Modeling Framework

    Authors: Wei Jiang, Julie Josse, Marc Lavielle, TraumaBase Group

    Abstract: Logistic regression is a common classification method in supervised learning. Surprisingly, there are very few solutions for performing logistic regression with missing values in the covariates. We suggest a complete approach based on a stochastic approximation version of the EM algorithm to do statistical inference with missing values including the estimation of the parameters and their variance,…

    Submitted 7 August, 2019; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: R package misaem https://CRAN.R-project.org/package=misaem, R implementations https://github.com/wjiang94/miSAEM_logReg

  34. arXiv:1804.11087  [pdf, other]

    stat.AP

    Imputation of mixed data with multilevel singular value decomposition

    Authors: François Husson, Julie Josse, Balasubramanian Narasimhan, Geneviève Robin

    Abstract: Statistical analysis of large data sets offers new opportunities to better understand many processes. Yet, data accumulation often implies relaxing acquisition procedures or compounding diverse sources. As a consequence, such data sets often contain mixed data, i.e. both quantitative and qualitative, and many missing values. Furthermore, aggregated data present a natural multilevel struct…

    Submitted 30 April, 2018; originally announced April 2018.

  35. arXiv:1707.09508  [pdf, ps, other]

    stat.ME

    Empirical Bayes approaches to PageRank type algorithms for rating scientific journals

    Authors: Jean-Louis Foulley, Gilles Celeux, Julie Josse

    Abstract: Following criticisms against the journal Impact Factor, new journal influence scores have been developed such as the Eigenfactor or the Prestige Scimago Journal Rank. They are based on PageRank type algorithms on the cross-citations transition matrix of the citing-cited network. The PageRank algorithm performs a smoothing of the transition matrix combining a random walk on the data network and a t…

    Submitted 23 January, 2018; v1 submitted 29 July, 2017; originally announced July 2017.

  36. arXiv:1705.03727  [pdf, ps, other]

    stat.ME

    Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig

    Authors: Gilles Celeux, Jack Jewson, Julie Josse, Jean-Michel Marin, Christian P. Robert

    Abstract: This note is a collection of several discussions of the paper "Beyond subjective and objective in statistics", read by A. Gelman and C. Hennig to the Royal Statistical Society on April 12, 2017, and to appear in the Journal of the Royal Statistical Society, Series A.

    Submitted 10 May, 2017; originally announced May 2017.

  37. arXiv:1703.02296  [pdf, other]

    stat.ME

    Low-rank model with covariates for count data analysis

    Authors: Geneviève Robin, Julie Josse, Eric Moulines, Sylvain Sardy

    Abstract: Count data are collected in many scientific and engineering tasks including image processing, single-cell RNA sequencing and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, in many instances, side information is also available, for example covariates about ecological sites or species. Low-r…

    Submitted 24 October, 2018; v1 submitted 7 March, 2017; originally announced March 2017.

  38. arXiv:1701.03513  [pdf, other]

    stat.ME

    Nonparametric imputation by data depth

    Authors: Pavlo Mozharovskyi, Julie Josse, Francois Husson

    Abstract: We present a single imputation method for missing values which borrows the idea of data depth -- a measure of centrality defined for an arbitrary point of a space with respect to a probability distribution or data cloud. This consists in iterative maximization of the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. For each singl…

    Submitted 6 August, 2018; v1 submitted 12 January, 2017; originally announced January 2017.

  39. arXiv:1606.05333  [pdf, other]

    stat.ME

    Bayesian dimensionality reduction with PCA using penalized semi-integrated likelihood

    Authors: Piotr Sobczyk, Malgorzata Bogdan, Julie Josse

    Abstract: We discuss the problem of estimating the number of principal components in Principal Components Analysis (PCA). Despite the importance of the problem and the multitude of solutions proposed in the literature, it comes as a surprise that there does not exist a coherent asymptotic framework which would justify different approaches depending on the actual size of the data set. In this paper we a…

    Submitted 5 July, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: 31 pages, 7 figures

    MSC Class: 62H25

  40. arXiv:1605.04212  [pdf, other]

    stat.ME

    Multiple Correspondence Analysis & the Multilogit Bilinear Model

    Authors: William Fithian, Julie Josse

    Abstract: Multiple Correspondence Analysis (MCA) is a dimension reduction method which plays a large role in the analysis of tables with categorical nominal variables such as survey data. Though it is usually motivated and derived using geometric considerations, in fact we prove that it amounts to a single proximal Newton step of a natural bilinear exponential family model for categorical data, the multinom…

    Submitted 13 May, 2016; originally announced May 2016.

  41. arXiv:1603.03174  [pdf, other]

    stat.ME

    Multinomial Multiple Correspondence Analysis

    Authors: Patrick J. F. Groenen, Julie Josse

    Abstract: Relations between categorical variables can be analyzed conveniently by multiple correspondence analysis (MCA). It is well suited to discover relations that may exist between categories of different variables. The graphical representation of MCA results in so-called biplots makes it easy to interpret the most important associations. However, a major drawback of MCA is that it does not have an und…

    Submitted 10 March, 2016; originally announced March 2016.

  42. arXiv:1602.01206  [pdf, other]

    stat.AP stat.ME

    denoiseR: A Package for Low Rank Matrix Estimation

    Authors: Julie Josse, Sylvain Sardy, Stefan Wager

    Abstract: We introduce denoiseR, an R package that provides a unified implementation of several state-of-the-art proposals for regularized low rank matrix estimation, along with automatic selection of the regularization parameters. We also extend these methods to allow for missing values. The regularization schemes discussed in this paper are built around singular-value shrinkage and bootstrap-based stabili…

    Submitted 8 August, 2018; v1 submitted 3 February, 2016; originally announced February 2016.

  43. arXiv:1505.08116  [pdf, other]

    stat.ME

    MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis

    Authors: Vincent Audigier, François Husson, Julie Josse

    Abstract: We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimatin…

    Submitted 29 May, 2015; originally announced May 2015.

  44. arXiv:1410.8275  [pdf, other]

    stat.ME cs.LG stat.ML

    Bootstrap-Based Regularization for Low-Rank Matrix Estimation

    Authors: Julie Josse, Stefan Wager

    Abstract: We develop a flexible framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple bootstrap algorithm. Effectively, our procedure seeks an autoencoding basis for the observed matrix that is stable with respect to the specified noise model; we call the resulting procedure a stable autoencoder. In the simplest case, with an isotropic noi…

    Submitted 28 June, 2016; v1 submitted 30 October, 2014; originally announced October 2014.

    Comments: To appear in the Journal of Machine Learning Research

  45. arXiv:1407.7614  [pdf, other]

    stat.ME

    Confidence Areas for Fixed-Effects PCA

    Authors: Julie Josse, Stefan Wager, François Husson

    Abstract: PCA is often used to visualize data when the rows and the columns are both of interest. In such a setting there is a lack of inferential methods on the PCA output. We study the asymptotic variance of a fixed-effects model for PCA, and propose several approaches to assessing the variability of PCA estimates: a method based on a parametric bootstrap, a new cell-wise jackknife, as well as a computati…

    Submitted 28 July, 2014; originally announced July 2014.

  46. arXiv:1401.5747  [pdf, ps, other]

    stat.ME

    Multiple imputation for continuous variables using a Bayesian principal component analysis

    Authors: Vincent Audigier, François Husson, Julie Josse

    Abstract: We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fu…

    Submitted 19 August, 2015; v1 submitted 22 January, 2014; originally announced January 2014.

    Comments: 16 pages

    MSC Class: 62H25; 62F10; 62F40; 62F15

  47. arXiv:1310.6602  [pdf, other]

    stat.ME

    Adaptive Shrinkage of singular values

    Authors: Julie Josse, Sylvain Sardy

    Abstract: To recover a low rank structure from a noisy matrix, truncated singular value decomposition has been extensively used and studied. Recent studies suggested that the signal can be better estimated by shrinking the singular values. We pursue this line of research and propose a new estimator offering a continuum of thresholding and shrinking functions. To avoid an unstable and costly cross-validation…

    Submitted 22 November, 2014; v1 submitted 24 October, 2013; originally announced October 2013.

  48. arXiv:1307.7383  [pdf, other]

    stat.ME

    Measures of dependence between random vectors and tests of independence. Literature review

    Authors: Julie Josse, Susan Holmes

    Abstract: Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients have been adopted by different research communities. Scientists use these coefficients to test whether two random vectors are linked. If they are, it is…

    Submitted 17 August, 2014; v1 submitted 28 July, 2013; originally announced July 2013.

    Comments: Incorporated new section on actual examples of data analyses

    MSC Class: 62H20

  49. arXiv:1301.4797  [pdf, other]

    stat.AP

    A principal components method to impute missing values for mixed data

    Authors: Vincent Audigier, François Husson, Julie Josse

    Abstract: We propose a new method to impute missing values in mixed datasets. It is based on a principal components method, the factorial analysis for mixed data, which balances the influence of all the variables that are continuous and categorical in the construction of the dimensions of variability. Because the imputation uses the principal axes and components, the prediction of the missing values are bas…

    Submitted 19 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

  50. arXiv:1301.4649  [pdf, other]

    stat.ME

    Regularised PCA to denoise and visualise data

    Authors: Marie Verbanck, Julie Josse, François Husson

    Abstract: Principal component analysis (PCA) is a well-established method commonly used to explore and visualise data. A classical PCA model is the fixed effect model where data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression…

    Submitted 9 May, 2013; v1 submitted 20 January, 2013; originally announced January 2013.