
Showing 1–9 of 9 results for author: Chidambaram, M

  1. arXiv:2406.04068  [pdf, other]

    cs.LG math.ST stat.ML

    Reassessing How to Compare and Improve the Calibration of Machine Learning Models

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine learning models has continued to spread to various domains. As a result, there are now a dizzying number of recent papers on measuring and improving the calibr…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages, 7 figures

  2. arXiv:2402.10046  [pdf, other]

    cs.LG math.PR

    How Flawed Is ECE? An Analysis via Logit Smoothing

    Authors: Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov

    Abstract: Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fu…

    Submitted 3 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 23 pages, 6 figures

    MSC Class: 68T37 (Primary) 62-08; 60E05 (Secondary)

  3. arXiv:2402.06855  [pdf, other]

    cs.LG cs.CV

    For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of s…

    Submitted 27 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 18 pages, 3 figures

  4. arXiv:2306.00740  [pdf, other]

    cs.LG stat.ML

    On the Limitations of Temperature Scaling for Distributions with Overlaps

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used b…

    Submitted 13 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 figures, published in ICLR 2024

  5. arXiv:2302.12715  [pdf, other]

    cs.LG cs.AI

    Hiding Data Helps: On the Benefits of Masking for Sparse Coding

    Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

    Abstract: Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the…

    Submitted 1 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 16 pages, 1 figure, ICML 2023

  6. arXiv:2210.13512  [pdf, other]

    cs.LG cs.AI cs.CV math.OC stat.ML

    Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

    Authors: Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

    Abstract: Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain so…

    Submitted 1 June, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 37 pages, 2 figures, ICML 2023

  7. arXiv:2110.07647  [pdf, other]

    cs.LG cs.AI

    Towards Understanding the Data Dependency of Mixup-style Training

    Authors: Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

    Abstract: In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these b…

    Submitted 19 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 26 pages, 14 figures, Accepted to ICLR 2022 (Spotlight)

  8. arXiv:1810.12836  [pdf, other]

    cs.CL

    Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model

    Authors: Muthuraman Chidambaram, Yinfei Yang, Daniel Cer, Steve Yuan, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

    Abstract: A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data. One potential method for overcoming this issue is learning cross-lingual text representations that can be used to transfer the performance from training on English tasks to non-English tasks, despite little to no task-specific non-English data. In this paper, we explore a natural setup for lea…

    Submitted 1 August, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

    Journal ref: In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

  9. arXiv:1702.06762  [pdf, other]

    cs.LG

    Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently

    Authors: Muthuraman Chidambaram, Yanjun Qi

    Abstract: The idea of style transfer has largely only been explored in image-based tasks, which we attribute in part to the specific nature of loss functions used for style transfer. We propose a general formulation of style transfer as an extension of generative adversarial networks, by using a discriminator to regularize a generator with an otherwise separate loss function. We apply our approach to the ta…

    Submitted 7 May, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

    Comments: style transfer, Generative Adversarial Networks