Showing 1–11 of 11 results for author: Vendrow, J

Searching in archive cs.
  1. arXiv:2403.00194 [pdf, other]

    cs.LG

    Ask Your Distribution Shift if Pre-Training is Right for You

    Authors: Benjamin Cohen-Wang, Joshua Vendrow, Aleksander Madry

    Abstract: Pre-training is a widely used approach to develop models that are robust to distribution shifts. However, in practice, its effectiveness varies: fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others (compared to training from scratch). In this work, we seek to characterize the failure modes that pre-training can and cannot address. In particular,…

    Submitted 29 February, 2024; originally announced March 2024.

  2. arXiv:2312.06205 [pdf, other]

    cs.CV cs.LG

    The Journey, Not the Destination: How Data Guides Diffusion Models

    Authors: Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, Aleksander Madry

    Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying specific training examples which caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diff…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 29 pages, 17 figures

  3. arXiv:2303.00058 [pdf, other]

    cs.LG stat.ML

    Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

    Authors: Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell

    Abstract: We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We deriv…

    Submitted 28 February, 2023; originally announced March 2023.

  4. arXiv:2302.07865 [pdf, other]

    cs.LG

    Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation

    Authors: Joshua Vendrow, Saachi Jain, Logan Engstrom, Aleksander Madry

    Abstract: Distribution shift is a major source of failure for machine learning models. However, evaluating model reliability under distribution shift can be challenging, especially since it may be difficult to acquire counterfactual examples that exhibit a specified shift. In this work, we introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, retur…

    Submitted 19 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

  5. arXiv:2209.02415 [pdf, other]

    cs.CV cs.AI

    Automatic Infectious Disease Classification Analysis with Concept Discovery

    Authors: Elena Sizikova, Joshua Vendrow, Xu Cao, Rachel Grotheer, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Thomas Merkh, R. W. M. A. Madushani, Kenny Moise, Annie Ulichney, Huy V. Vo, Chuntian Wang, Megan Coffee, Kathryn Leonard, Deanna Needell

    Abstract: Automatic infectious disease classification from images can facilitate needed medical diagnoses. Such an approach can identify diseases, like tuberculosis, which remain under-diagnosed due to resource constraints and also novel and emerging diseases, like monkeypox, which clinicians have little experience or acumen in diagnosing. Avoiding missed or delayed diagnoses would prevent further transmiss…

    Submitted 14 November, 2022; v1 submitted 28 August, 2022; originally announced September 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 13 pages

  6. arXiv:2109.14820 [pdf, other]

    cs.LG stat.ML

    A Generalized Hierarchical Nonnegative Tensor Decomposition

    Authors: Joshua Vendrow, Jamie Haddock, Deanna Needell

    Abstract: Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-m…

    Submitted 15 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: 6 pages, 2 figures, 3 tables

  7. arXiv:2104.14028 [pdf, other]

    cs.LG cs.CY

    Analysis of Legal Documents via Non-negative Matrix Factorization Methods

    Authors: Ryan Budahazy, Lu Cheng, Yihuan Huang, Andrew Johnson, Pengyu Li, Joshua Vendrow, Zhoutong Wu, Denali Molitor, Elizaveta Rebrova, Deanna Needell

    Abstract: The California Innocence Project (CIP), a clinical law school program aiming to free wrongfully convicted prisoners, evaluates thousands of mails containing new requests for assistance and corresponding case files. Processing and interpreting this large amount of information presents a significant challenge for CIP officials, which can be successfully aided by topic modeling techniques. In this pap…

    Submitted 6 November, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 16 pages, 4 figures

  8. arXiv:2102.06984 [pdf, other]

    cs.SI cs.LG math.OC physics.soc-ph stat.ML

    Learning low-rank latent mesoscale structures in networks

    Authors: Hanbaek Lyu, Yacoub H. Kureh, Joshua Vendrow, Mason A. Porter

    Abstract: It is common to use networks to encode the architecture of interactions between entities in complex systems in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structure…

    Submitted 13 July, 2023; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: 82 pages, 25 figures, 2 tables

  9. arXiv:2012.14048 [pdf, other]

    math.DS cs.LG nlin.AO

    Learning to predict synchronization of coupled oscillators on randomly generated graphs

    Authors: Hardeep Bassi, Richard Yim, Rohith Kodukula, Joshua Vendrow, Cherlin Zhu, Hanbaek Lyu

    Abstract: Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem…

    Submitted 23 August, 2022; v1 submitted 27 December, 2020; originally announced December 2020.

    Comments: 22 pages, 13 figures, 3 tables

  10. arXiv:2010.11365 [pdf, other]

    cs.LG

    On a Guided Nonnegative Matrix Factorization

    Authors: Joshua Vendrow, Jamie Haddock, Elizaveta Rebrova, Deanna Needell

    Abstract: Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that inc…

    Submitted 5 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: 6 pages, 6 tables

  11. arXiv:2009.09087 [pdf, other]

    cs.CY cs.LG stat.ML

    Feature Selection on Lyme Disease Patient Survey Data

    Authors: Joshua Vendrow, Jamie Haddock, Deanna Needell, Lorraine Johnson

    Abstract: Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease pa…

    Submitted 24 August, 2020; originally announced September 2020.

    Comments: 9 pages, 8 figures, 6 tables