Showing 1–6 of 6 results for author: Mallen, A

  1. arXiv:2402.04362

    cs.LG

    Neural Networks Learn Statistics of Increasing Complexity

    Authors: Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern

    Abstract: The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in train…

    Submitted 13 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  2. arXiv:2312.01037

    cs.LG cs.AI cs.CL

    Eliciting Latent Knowledge from Quirky Language Models

    Authors: Alex Mallen, Madeline Brumley, Julia Kharchenko, Nora Belrose

    Abstract: Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations that robustly track the true state of the world, especially in hard-to-verify cases where the model's output is untrusted. To further ELK research, we introduce 12 datasets and a corresponding suite of "quirky" language models (LMs) that are finetuned to make systematic errors when answering questions…

    Submitted 3 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Preprint

  3. arXiv:2310.01405

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Representation Engineering: A Top-Down Approach to AI Transparency

    Authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

    Abstract: In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive p…

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/andyzoujm/representation-engineering

  4. arXiv:2212.10511

    cs.CL cs.AI cs.LG

    When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

    Authors: Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi

    Abstract: Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 m…

    Submitted 2 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023; Code and data available at https://github.com/AlexTMallen/adaptive-retrieval

  5. arXiv:2209.08618

    cs.LG

    Koopman-theoretic Approach for Identification of Exogenous Anomalies in Nonstationary Time-series Data

    Authors: Alex Mallen, Christoph A. Keller, J. Nathan Kutz

    Abstract: In many scenarios, it is necessary to monitor a complex system via a time-series of observations and determine when anomalous exogenous events have occurred so that relevant actions can be taken. Determining whether current observations are abnormal is challenging. It requires learning an extrapolative probabilistic model of the dynamics from historical data, and using a limited number of current…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 10 pages, 8 figures

    ACM Class: I.6.0; I.5.0; J.2

  6. arXiv:2106.06033

    cs.LG

    Deep Probabilistic Koopman: Long-term time-series forecasting under periodic uncertainties

    Authors: Alex Mallen, Henning Lange, J. Nathan Kutz

    Abstract: Probabilistic forecasting of complex phenomena is paramount to various scientific disciplines and applications. Despite the generality and importance of the problem, general mathematical techniques that allow for stable long-term forecasts with calibrated uncertainty measures are lacking. For most time series models, the difficulty of obtaining accurate probabilistic future time step predictions i…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 16 pages, 10 figures, submitted to NeurIPS 2021