Showing 1–7 of 7 results for author: Morvan, M L

Searching in archive cs.
  1. arXiv:2210.16315  [pdf, other]

    cs.LG cs.AI stat.ML

    Beyond calibration: estimating the grouping loss of modern neural networks

    Authors: Alexandre Perez-Lebel, Marine Le Morvan, Gaël Varoquaux

    Abstract: The ability to ensure that a classifier gives reliable confidence scores is essential to ensure informed decision-making. To this end, recent work has focused on miscalibration, i.e., the over or under confidence of model scores. Yet calibration is not enough: even a perfectly calibrated classifier with the best possible accuracy can have confidence scores that are far from the true posterior prob…

    Submitted 27 April, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Journal ref: ICLR 2023 -- The Eleventh International Conference on Learning Representations, May 2023, Kigali, Rwanda

  2. arXiv:2202.10580  [pdf, other]

    cs.LG cs.AI

    Benchmarking missing-values approaches for predictive models on health databases

    Authors: Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline

    Abstract: BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative -- rather than generative -- modeling,…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: GigaScience, Oxford Univ Press, In press

  3. arXiv:2106.00311  [pdf, other]

    stat.ML cs.AI cs.LG

    What's a good imputation to predict with missing values?

    Authors: Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

    Abstract: How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all mi…

    Submitted 30 November, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  4. arXiv:2007.01627  [pdf, other]

    cs.LG cs.AI stat.ML

    NeuMiss networks: differentiable programming for supervised learning with missing values

    Authors: Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

    Abstract: The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the observed entries and the missingness indicator. As a result, the computational or sample complexities of consistent approaches depend on the number of missing pattern…

    Submitted 4 November, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems 33, Dec 2020, Vancouver, Canada

  5. arXiv:2002.00658  [pdf, other]

    cs.LG cs.AI stat.ML

    Linear predictor on linearly-generated data with missing values: non consistency and solutions

    Authors: Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

    Abstract: We consider building predictors when the data have missing values. We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data and we show that, in the presence of missing values, the optimal predictor may not be linear. In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and t…

    Submitted 12 May, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of Machine Learning Research, PMLR, In press

  6. arXiv:1802.05980  [pdf, other]

    q-bio.QM cs.LG stat.ML

    WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features…

    Submitted 16 February, 2018; originally announced February 2018.

  7. arXiv:1706.00244  [pdf, other]

    stat.ML cs.LG q-bio.QM

    Supervised Quantile Normalisation

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after normalisation, they follow the same target distribution for each sample. Choosing a "good" target distribution remains however largely empirical and heuristic, and is…

    Submitted 1 June, 2017; originally announced June 2017.