Showing 1–6 of 6 results for author: Zhang, L H

  1. arXiv:2406.13660

    cs.CL cs.AI

    Towards Minimal Targeted Updates of Language Models with Targeted Negative Training

    Authors: Lily H. Zhang, Rajesh Ranganath, Arya Tafvizi

    Abstract: Generative models of language exhibit impressive capabilities but still place non-negligible probability mass over undesirable outputs. In this work, we address the task of updating a model to avoid unwanted outputs while minimally changing model behavior otherwise, a challenge we refer to as a minimal targeted update. We first formalize the notion of a minimal targeted update and propose a method…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Published in Transactions of Machine Learning Research

  2. arXiv:2405.19534

    cs.LG cs.AI cs.CL

    Preference Learning Algorithms Do Not Learn Preference Rankings

    Authors: Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

    Abstract: Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2302.04132

    cs.LG cs.AI

    Robustness to Spurious Correlations Improves Semantic Out-of-Distribution Detection

    Authors: Lily H. Zhang, Rajesh Ranganath

    Abstract: Methods which utilize the outputs or feature representations of predictive models have emerged as promising approaches for out-of-distribution (OOD) detection of image inputs. However, these methods struggle to detect OOD inputs that share nuisance values (e.g. background) with in-distribution inputs. The detection of shared-nuisance out-of-distribution (SN-OOD) inputs is particularly relevant in…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: AAAI 2023

  4. arXiv:2206.11925

    cs.LG

    Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets

    Authors: Lily H. Zhang, Veronica Tozzo, John M. Higgins, Rajesh Ranganath

    Abstract: Permutation invariant neural networks are a promising tool for making predictions from sets. However, we show that existing permutation invariant architectures, Deep Sets and Set Transformer, can suffer from vanishing or exploding gradients when they are deep. Additionally, layer norm, the normalization of choice in Set Transformer, can hurt performance by removing information useful for predictio…

    Submitted 13 July, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted at ICML 2022

  5. arXiv:2107.06908

    cs.LG

    Understanding Failures in Out-of-Distribution Detection with Deep Generative Models

    Authors: Lily H. Zhang, Mark Goldstein, Rajesh Ranganath

    Abstract: Deep generative models (DGMs) seem a natural fit for detecting out-of-distribution (OOD) inputs, but such models have been shown to assign higher probabilities or densities to OOD images than images from the training distribution. In this work, we explain why this behavior should be attributed to model misestimation. We first prove that no method can guarantee performance beyond random chance with…

    Submitted 16 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted at ICML 2021

  6. arXiv:2107.00520

    cs.LG stat.ML

    Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations

    Authors: Aahlad Puli, Lily H. Zhang, Eric K. Oermann, Rajesh Ranganath

    Abstract: In many prediction problems, spurious correlations are induced by a changing relationship between the label and a nuisance variable that is also correlated with the covariates. For example, in classifying animals in natural images, the background, which is a nuisance, can predict the type of animal. This nuisance-label relationship does not always hold, and the performance of a model trained under…

    Submitted 12 February, 2023; v1 submitted 29 June, 2021; originally announced July 2021.