Showing 1–15 of 15 results for author: Puli, A

Searching in archive cs.
  1. arXiv:2401.08777  [pdf, other]

    hep-ex cs.LG hep-ph physics.data-an

    Robust Anomaly Detection for Particle Physics Using Multi-Background Representation Learning

    Authors: Abhijith Gandrakota, Lily Zhang, Aahlad Puli, Kyle Cranmer, Jennifer Ngadiuba, Rajesh Ranganath, Nhan Tran

    Abstract: Anomaly, or out-of-distribution, detection is a promising tool for aiding discoveries of new particles or processes in particle physics. In this work, we identify and address two overlooked opportunities to improve anomaly detection for high-energy physics. First, rather than train a generative model on the single most dominant background process, we build detection algorithms using representation…

    Submitted 16 January, 2024; originally announced January 2024.

    Report number: FERMILAB-PUB-23-675-CMS-CSAID

  2. arXiv:2308.12553  [pdf, other]

    cs.LG stat.ML

    Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy

    Authors: Aahlad Puli, Lily Zhang, Yoav Wald, Rajesh Ranganath

    Abstract: Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not p…

    Submitted 24 August, 2023; originally announced August 2023.

  3. arXiv:2308.04431  [pdf, other]

    cs.LG cs.CV

    When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations

    Authors: Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath

    Abstract: In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets in many cases can hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonst…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at MLHC 2023

  4. arXiv:2303.12888  [pdf, other]

    cs.LG cs.AI

    A dynamic risk score for early prediction of cardiogenic shock using machine learning

    Authors: Yuxuan Hu, Albert Lui, Mark Goldstein, Mukund Sudarshan, Andrea Tinsay, Cindy Tsui, Samuel Maidman, John Medamana, Neil Jethani, Aahlad Puli, Vuthy Nguy, Yindalon Aphinyanaphongs, Nicholas Kiefer, Nathaniel Smilowitz, James Horowitz, Tania Ahuja, Glenn I Fishman, Judith Hochman, Stuart Katz, Samuel Bernard, Rajesh Ranganath

    Abstract: Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US. The morbidity and mortality are highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock is critical. Prompt implementation of treatment measures can prevent the deleterious spiral of ischemia, low blood pressure, and reduced cardiac output due to…

    Submitted 28 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

  5. arXiv:2302.09344  [pdf, other]

    cs.LG cs.AI cs.CV

    Beyond Distribution Shift: Spurious Features Through the Lens of Training Dynamics

    Authors: Nihal Murali, Aahlad Puli, Ke Yu, Rajesh Ranganath, Kayhan Batmanghelich

    Abstract: Deep Neural Networks (DNNs) are prone to learning spurious features that correlate with the label during training but are irrelevant to the learning problem. This hurts model generalization and poses problems when deploying them in safety-critical applications. This paper aims to better understand the effects of spurious features through the lens of the learning dynamics of the internal neurons du…

    Submitted 14 October, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Main paper: 12 pages, 2 tables, and 10 figures. Supplementary: 10 pages and 9 figures. Accepted in TMLR23 (https://openreview.net/pdf?id=Tkvmt9nDmB)

  6. arXiv:2210.01302  [pdf, other]

    cs.LG cs.CV

    Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

    Authors: Aahlad Puli, Nitish Joshi, Yoav Wald, He He, Rajesh Ranganath

    Abstract: In prediction tasks, there exist features that are related to the label in the same way across different settings for that task; these are semantic features or semantics. Features with varying relationships to the label are nuisances. For example, in detecting cows from natural images, the shape of the head is semantic but because images of cows often have grass backgrounds but not always, the bac…

    Submitted 3 July, 2024; v1 submitted 3 October, 2022; originally announced October 2022.

  7. arXiv:2208.08579  [pdf, other]

    stat.ME cs.LG stat.ML

    DIET: Conditional independence testing with marginal dependence measures of residual information

    Authors: Mukund Sudarshan, Aahlad Manas Puli, Wesley Tansey, Rajesh Ranganath

    Abstract: Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which…

    Submitted 11 April, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

  8. arXiv:2205.02900  [pdf, other]

    cs.LG cs.AI cs.CY

    New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography

    Authors: Neil Jethani, Aahlad Puli, Hao Zhang, Leonid Garber, Lior Jankelson, Yindalon Aphinyanaphongs, Rajesh Ranganath

    Abstract: Undiagnosed diabetes is present in 21.4% of adults with diabetes. Diabetes can remain asymptomatic and undetected due to limitations in screening rates. To address this issue, questionnaires, such as the American Diabetes Association (ADA) Risk test, have been recommended for use by physicians and the public. Based on evidence that blood glucose concentration can affect cardiac electrophysiology,…

    Submitted 22 March, 2023; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: 21 pages, 8 figures

  9. arXiv:2112.00881  [pdf, other]

    cs.LG stat.ML

    Learning Invariant Representations with Missing Data

    Authors: Mark Goldstein, Jörn-Henrik Jacobsen, Olina Chau, Adriel Saporta, Aahlad Puli, Rajesh Ranganath, Andrew C. Miller

    Abstract: Spurious correlations allow flexible models to predict well during training but poorly on related test distributions. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such a…

    Submitted 8 June, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: CLeaR (Causal Learning and Reasoning) 2022

  10. arXiv:2111.08175  [pdf, other]

    cs.LG stat.ML

    Inverse-Weighted Survival Games

    Authors: Xintian Han, Mark Goldstein, Aahlad Puli, Thomas Wies, Adler J Perotte, Rajesh Ranganath

    Abstract: Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum lik…

    Submitted 31 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  11. arXiv:2107.00520  [pdf, other]

    cs.LG stat.ML

    Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations

    Authors: Aahlad Puli, Lily H. Zhang, Eric K. Oermann, Rajesh Ranganath

    Abstract: In many prediction problems, spurious correlations are induced by a changing relationship between the label and a nuisance variable that is also correlated with the covariates. For example, in classifying animals in natural images, the background, which is a nuisance, can predict the type of animal. This nuisance-label relationship does not always hold, and the performance of a model trained under…

    Submitted 12 February, 2023; v1 submitted 29 June, 2021; originally announced July 2021.

  12. arXiv:2102.08533  [pdf, other]

    stat.ME cs.LG stat.ML

    Causal Estimation with Functional Confounders

    Authors: Aahlad Puli, Adler J. Perotte, Rajesh Ranganath

    Abstract: Causal inference relies on two fundamental assumptions: ignorability and positivity. We study causal inference when the true confounder value can be expressed as a function of the observed data; we call this setting estimation with functional confounders (EFC). In this setting, ignorability is satisfied, however positivity is violated, and causal inference is impossible in general. We consider two…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: 17 pages, 7 figures, 2 tables

  13. arXiv:2101.05346  [pdf, other]

    cs.LG stat.ML

    X-CAL: Explicit Calibration for Survival Analysis

    Authors: Mark Goldstein, Xintian Han, Aahlad Puli, Adler J. Perotte, Rajesh Ranganath

    Abstract: Survival analysis models the distribution of time until an event of interest, such as discharge from the hospital or admission to the ICU. When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 20…

    Submitted 13 January, 2021; originally announced January 2021.

  14. arXiv:1907.03451  [pdf, other]

    cs.LG stat.ML

    General Control Functions for Causal Effect Estimation from Instrumental Variables

    Authors: Aahlad Manas Puli, Rajesh Ranganath

    Abstract: Causal effect estimation relies on separating the variation in the outcome into parts due to the treatment and due to the confounders. To achieve this separation, practitioners often use external sources of randomness that only influence the treatment called instrumental variables (IVs). We study variables constructed from treatment and IV that help estimate effects, called control functions. We c…

    Submitted 2 February, 2021; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: 24 pages, 6 figures

  15. arXiv:1810.11646  [pdf, other]

    stat.ML cs.LG

    Removing Hidden Confounding by Experimental Grounding

    Authors: Nathan Kallus, Aahlad Manas Puli, Uri Shalit

    Abstract: Observational data is increasingly used as a means for making individual-level causal predictions and intervention recommendations. The foremost challenge of causal inference from observational data is hidden confounding, whose presence cannot be tested in data and can invalidate any causal conclusion. Experimental data does not suffer from confounding but is usually limited in both scope and scal…

    Submitted 27 October, 2018; originally announced October 2018.