-
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training
Authors:
Lily H. Zhang,
Rajesh Ranganath,
Arya Tafvizi
Abstract:
Generative models of language exhibit impressive capabilities but still place non-negligible probability mass over undesirable outputs. In this work, we address the task of updating a model to avoid unwanted outputs while minimally changing model behavior otherwise, a challenge we refer to as a minimal targeted update. We first formalize the notion of a minimal targeted update and propose a method to achieve such updates using negative examples from a model's generations. Our proposed Targeted Negative Training (TNT) results in updates that keep the new distribution close to the original, unlike existing losses for negative signal which push down probability but do not control what the updated distribution will be. In experiments, we demonstrate that TNT yields a better trade-off between reducing unwanted behavior and maintaining model generation behavior than baselines, paving the way towards a modeling paradigm based on iterative training updates that constrain models from generating undesirable outputs while preserving their impressive capabilities.
Submitted 19 June, 2024;
originally announced June 2024.
-
Preference Learning Algorithms Do Not Learn Preference Rankings
Authors:
Angelica Chen,
Sadhika Malladi,
Lily H. Zhang,
Xinyi Chen,
Qiuyi Zhang,
Rajesh Ranganath,
Kyunghyun Cho
Abstract:
Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via $\textit{ranking accuracy}$. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the $\textit{idealized ranking accuracy}$ that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant $\textit{alignment gap}$ -- $\textit{i.e.}$, a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.
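The ranking accuracy described in this abstract can be illustrated with a minimal sketch. It assumes ranking accuracy is the fraction of preference pairs in which the model assigns a strictly higher log-likelihood to the preferred output. The abstract does not say whether likelihoods are length-normalized or how ties are handled, so those choices, along with the function name and the toy numbers, are hypothetical.

```python
from typing import Sequence


def ranking_accuracy(
    logp_preferred: Sequence[float],
    logp_rejected: Sequence[float],
) -> float:
    """Fraction of pairs where the preferred output has the higher log-likelihood.

    Assumption: ties count as ranking errors (strict comparison).
    """
    if len(logp_preferred) != len(logp_rejected):
        raise ValueError("Each preferred output needs exactly one rejected counterpart.")
    if len(logp_preferred) == 0:
        raise ValueError("At least one preference pair is required.")
    correct = sum(1 for p, r in zip(logp_preferred, logp_rejected) if p > r)
    return correct / len(logp_preferred)


# Toy log-likelihoods for four preference pairs (illustrative values only).
chosen = [-12.3, -8.1, -20.4, -5.0]
rejected = [-13.0, -7.9, -25.1, -5.6]
print(ranking_accuracy(chosen, rejected))  # 3 of the 4 pairs are ranked correctly
```

In practice the log-likelihoods would come from scoring each response under the preference-tuned model; that scoring step is omitted here.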
Submitted 29 May, 2024;
originally announced May 2024.
-
Towards more scientific meta-analyses
Authors:
Lily H. Zhang,
Menelaos Konstantinidis,
Marie-Abèle Bind,
Donald B. Rubin
Abstract:
Meta-analysis can be a critical part of the research process, often serving as the primary analysis on which the practitioners, policymakers, and individuals base their decisions. However, current literature synthesis approaches to meta-analysis typically estimate a different quantity than what is implicitly intended; concretely, standard approaches estimate the average effect of a treatment for a population of imperfect studies, rather than the true scientific effect that would be measured in a population of hypothetical perfect studies. We advocate for an alternative method, called response-surface meta-analysis, which models the relationship between the quality of the study design as predictor variables and its reported estimated effect size as the outcome variable in order to estimate the effect size obtained by the hypothetical ideal study. The idea was first introduced by Rubin several decades ago, and here we provide a practical implementation. First, we reintroduce the idea of response-surface meta-analysis, highlighting its focus on a scientifically-motivated estimand while proposing a straightforward implementation. Then we compare the approach to traditional meta-analysis techniques used in practice. We then implement response-surface meta-analysis and contrast its results with existing literature-synthesis approaches on both simulated data and a real-world example published by the Cochrane Collaboration. We conclude by detailing the primary challenges in the implementation of response-surface meta-analysis and offer some suggestions to tackle these challenges.
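The response-surface idea in this abstract can be sketched in a few lines. The sketch assumes a single hypothetical quality score per study (1.0 meaning an ideal study), an unweighted least-squares line, and made-up effect sizes. The paper's actual implementation is not described here and may differ substantially.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need at least two (quality, effect) pairs of equal length.")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("Quality scores must vary across studies.")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ideal_study_effect(
    quality: Sequence[float],
    effect: Sequence[float],
    ideal_quality: float = 1.0,  # assumption: 1.0 encodes the hypothetical ideal study
) -> float:
    """Extrapolate the fitted response surface to the ideal study design."""
    intercept, slope = fit_line(quality, effect)
    return intercept + slope * ideal_quality


# Hypothetical studies: lower-quality designs report larger effects.
quality = [0.2, 0.4, 0.6, 0.8]
effect = [0.50, 0.40, 0.30, 0.20]

naive_average = sum(effect) / len(effect)        # average over imperfect studies
extrapolated = ideal_study_effect(quality, effect)  # estimate for the ideal study
print(round(naive_average, 3), round(extrapolated, 3))
```

With these toy numbers the plain average over the imperfect studies differs from the value extrapolated to the ideal design, which is the distinction between the two estimands that the abstract draws.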
Submitted 25 August, 2023;
originally announced August 2023.
-
Robustness to Spurious Correlations Improves Semantic Out-of-Distribution Detection
Authors:
Lily H. Zhang,
Rajesh Ranganath
Abstract:
Methods which utilize the outputs or feature representations of predictive models have emerged as promising approaches for out-of-distribution (OOD) detection of image inputs. However, these methods struggle to detect OOD inputs that share nuisance values (e.g. background) with in-distribution inputs. The detection of shared-nuisance out-of-distribution (SN-OOD) inputs is particularly relevant in real-world applications, as anomalies and in-distribution inputs tend to be captured in the same settings during deployment. In this work, we provide a possible explanation for SN-OOD detection failures and propose nuisance-aware OOD detection to address them. Nuisance-aware OOD detection substitutes a classifier trained via empirical risk minimization and cross-entropy loss with one that 1. is trained under a distribution where the nuisance-label relationship is broken and 2. yields representations that are independent of the nuisance under this distribution, both marginally and conditioned on the label. We can train a classifier to achieve these objectives using Nuisance-Randomized Distillation (NuRD), an algorithm developed for OOD generalization under spurious correlations. Output- and feature-based nuisance-aware OOD detection perform substantially better than their original counterparts, succeeding even when detection based on domain generalization algorithms fails to improve performance.
Submitted 8 February, 2023;
originally announced February 2023.
-
Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets
Authors:
Lily H. Zhang,
Veronica Tozzo,
John M. Higgins,
Rajesh Ranganath
Abstract:
Permutation invariant neural networks are a promising tool for making predictions from sets. However, we show that existing permutation invariant architectures, Deep Sets and Set Transformer, can suffer from vanishing or exploding gradients when they are deep. Additionally, layer norm, the normalization of choice in Set Transformer, can hurt performance by removing information useful for prediction. To address these issues, we introduce the clean path principle for equivariant residual connections and develop set norm, a normalization tailored for sets. With these, we build Deep Sets++ and Set Transformer++, models that reach high depths with comparable or better performance than their original counterparts on a diverse suite of tasks. We additionally introduce Flow-RBC, a new single-cell dataset and real-world application of permutation invariant prediction. We open-source our data and code here: https://github.com/rajesh-lab/deep_permutation_invariant.
Submitted 13 July, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Understanding Failures in Out-of-Distribution Detection with Deep Generative Models
Authors:
Lily H. Zhang,
Mark Goldstein,
Rajesh Ranganath
Abstract:
Deep generative models (DGMs) seem a natural fit for detecting out-of-distribution (OOD) inputs, but such models have been shown to assign higher probabilities or densities to OOD images than images from the training distribution. In this work, we explain why this behavior should be attributed to model misestimation. We first prove that no method can guarantee performance beyond random chance without assumptions on which out-distributions are relevant. We then interrogate the typical set hypothesis, the claim that relevant out-distributions can lie in high likelihood regions of the data distribution, and that OOD detection should be defined based on the data distribution's typical set. We highlight the consequences implied by assuming support overlap between in- and out-distributions, as well as the arbitrariness of the typical set for OOD detection. Our results suggest that estimation error is a more plausible explanation than the misalignment between likelihood-based OOD detection and out-distributions of interest, and we illustrate how even minimal estimation error can lead to OOD detection failures, yielding implications for future work in deep generative modeling and OOD detection.
Submitted 16 July, 2021; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations
Authors:
Aahlad Puli,
Lily H. Zhang,
Eric K. Oermann,
Rajesh Ranganath
Abstract:
In many prediction problems, spurious correlations are induced by a changing relationship between the label and a nuisance variable that is also correlated with the covariates. For example, in classifying animals in natural images, the background, which is a nuisance, can predict the type of animal. This nuisance-label relationship does not always hold, and the performance of a model trained under one such relationship may be poor on data with a different nuisance-label relationship. To build predictive models that perform well regardless of the nuisance-label relationship, we develop Nuisance-Randomized Distillation (NURD). We introduce the nuisance-randomized distribution, a distribution where the nuisance and the label are independent. Under this distribution, we define the set of representations such that conditioning on any member, the nuisance and the label remain independent. We prove that the representations in this set always perform better than chance, while representations outside of this set may not. NURD finds a representation from this set that is most informative of the label under the nuisance-randomized distribution, and we prove that this representation achieves the highest performance regardless of the nuisance-label relationship. We evaluate NURD on several tasks including chest X-ray classification where, using non-lung patches as the nuisance, NURD produces models that predict pneumonia under strong spurious correlations.
Submitted 12 February, 2023; v1 submitted 29 June, 2021;
originally announced July 2021.
-
Optimization of Design Parameters for SPPC Longitudinal Dynamics
Authors:
L. H. Zhang,
J. Y. Tang,
Y. Hong,
Y. K. Chen,
L. J. Wang
Abstract:
As the second stage of the CEPC-SPPC project, SPPC (Super Proton-Proton Collider) aims at exploring new physics beyond the Standard Model. The key design goal for the SPPC accelerator complex is to reach 75 TeV in center-of-mass energy with a circumference of 100 km for the collider and an injector chain of four accelerators in cascade to support the collider. As an important part of the SPPC conceptual study, the longitudinal beam dynamics was studied systematically, including the dynamics in the collider and its complex injector chain. First, the bunch filling scheme of the SPPC complex was designed on the basis of various constraints, such as the technical challenges of the kicker magnets and the limited extraction energy per injection. Next, the study of the longitudinal dynamics in the collider focused on the RF scheme needed to meet the requirements for luminosity and to mitigate relevant instabilities. A higher-harmonic RF system (800 MHz) was combined with the basic RF system (400 MHz) to form a dual-harmonic RF system, which was employed to mitigate collective instabilities and to increase the luminosity by producing shorter bunches. In addition, the longitudinal matching between the bunches in the different accelerator stages was studied, with special attention to the space charge effects and the beam loading effect in the two lower-energy rings (p-RCS and MSS), which resulted in an optimization of the RF schemes. A set of self-consistent beam and RF parameters for the SPPC complex was obtained. The collider and the three proton synchrotrons of the injector chain have unprecedented features; this study thus demonstrates what a future proton-proton collider complex may look like.
Submitted 1 February, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Interface and electronic characterization of thin epitaxial Co3O4 films
Authors:
C. A. F. Vaz,
H. -Q. Wang,
C. H. Ahn,
V. E. Henrich,
M. Z. Baykara,
T. C. Schwendemann,
N. Pilet,
B. J. Albers,
U. D. Schwarz,
L. H. Zhang,
Y. Zhu,
J. Wang,
E. I. Altman
Abstract:
The interface and electronic structure of thin (~20-74 nm) Co3O4(110) epitaxial films grown by oxygen-assisted molecular beam epitaxy on MgAl2O4(110) single crystal substrates have been investigated by means of real and reciprocal space techniques. As-grown film surfaces are found to be relatively disordered and exhibit an oblique low energy electron diffraction (LEED) pattern associated with the O-rich CoO2 bulk termination of the (110) surface. Interface and bulk film structure are found to improve significantly with post-growth annealing at 820 K in air and display sharp rectangular LEED patterns, suggesting a surface stoichiometry of the alternative Co2O2 bulk termination of the (110) surface. Non-contact atomic force microscopy demonstrates the presence of wide terraces separated by atomic steps in the annealed films that are not present in the as-grown structures; the step height of ~2.7 Å corresponds to two atomic layers and confirms a single termination for the annealed films, consistent with the LEED results. A model of the (1 × 1) surfaces that allows for compensation of the polar surfaces is presented.
Submitted 23 November, 2008;
originally announced November 2008.