Search | arXiv e-print repository

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Authors: Cameron Allen, Aaron Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael L. Littman, George Konidaris

Abstract: Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, wit… ▽ More Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to--or knowledge of--an underlying, unobservable state space. Our metric, the $λ$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($λ$) with a different value of $λ$. Since TD($λ$=0) makes an implicit Markov assumption and TD($λ$=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $λ$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $λ$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $λ$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: GitHub URL: https://github.com/brownirl/lambda_discrepancy

arXiv:2401.05604 [pdf, other]

REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Authors: Andrew Gritsevskiy, Arjun Panickssery, Aaron Kirtland, Derik Kauffman, Hans Gundlach, Irina Gritsevskaya, Joe Cavanagh, Jonathan Chiang, Lydia La Roux, Michelle Hung

Abstract: We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with h… ▽ More We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that GPT-4o significantly outperforms all other models, followed by proprietary models outperforming all other evaluated models. However, even the best model has a final accuracy of only 42\%, which goes down to just 7\% on hard puzzles, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models. △ Less

Submitted 3 June, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 20 pages, 5 figures. For code, see http://github.com/cvndsh/rebus

arXiv:2306.09479 [pdf, other]

Inverse Scaling: When Bigger Isn't Better

Authors: Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zheng** Zhou, Najoung Kim , et al. (2 additional authors not shown)

Abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling… ▽ More Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models. △ Less

Submitted 12 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Published in TMLR (2023), 39 pages

Journal ref: Transactions on Machine Learning Research (TMLR), 10/2023, https://openreview.net/forum?id=DwgRm72GQF

arXiv:2209.05944 [pdf, other]

An Unstructured Mesh Approach to Nonlinear Noise Reduction for Coupled Systems

Authors: Aaron Kirtland, Jonah Botvinick-Greenhouse, Marianne DeBrito, Megan Osborne, Casey Johnson, Robert S. Martin, Samuel J. Araki, Daniel Q. Eckhardt

Abstract: To address noise inherent in electronic data acquisition systems and real world sources, Araki et al. [Physica D: Nonlinear Phenomena, 417 (2021) 132819] demonstrated a grid based nonlinear technique to remove noise from a chaotic signal, leveraging a clean high-fidelity signal from the same dynamical system and ensemble averaging in multidimensional phase space. This method achieved denoising of… ▽ More To address noise inherent in electronic data acquisition systems and real world sources, Araki et al. [Physica D: Nonlinear Phenomena, 417 (2021) 132819] demonstrated a grid based nonlinear technique to remove noise from a chaotic signal, leveraging a clean high-fidelity signal from the same dynamical system and ensemble averaging in multidimensional phase space. This method achieved denoising of a time-series data with 100% added noise but suffered in regions of low data density. To improve this grid-based method, here an unstructured mesh based on triangulations and Voronoi diagrams is used to accomplish the same task. The unstructured mesh more uniformly distributes data samples over mesh cells to improve the accuracy of the reconstructed signal. By empirically balancing bias and variance errors in selecting the number of unstructured cells as a function of the number of available samples, the method achieves asymptotic statistical convergence with known test data and reduces synthetic noise on experimental signals from Hall Effect Thrusters (HETs) with greater success than the original grid-based strategy. △ Less

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2008.04459 [pdf, other]

The Polynomial Learning With Errors Problem and the Smearing Condition

Authors: Liljana Babinkostova, Ariana Chin, Aaron Kirtland, Vladyslav Nazarchuk, Esther Plotnick

Abstract: As quantum computing advances rapidly, guaranteeing the security of cryptographic protocols resistant to quantum attacks is paramount. Some leading candidate cryptosystems use the Learning with Errors (LWE) problem, attractive for its simplicity and hardness guaranteed by reductions from hard computational lattice problems. Its algebraic variants, Ring-Learning with Errors (RLWE) and Polynomial Le… ▽ More As quantum computing advances rapidly, guaranteeing the security of cryptographic protocols resistant to quantum attacks is paramount. Some leading candidate cryptosystems use the Learning with Errors (LWE) problem, attractive for its simplicity and hardness guaranteed by reductions from hard computational lattice problems. Its algebraic variants, Ring-Learning with Errors (RLWE) and Polynomial Learning with Errors (PLWE), gain in efficiency over standard LWE, but their security remains to be thoroughly investigated. In this work, we consider the "smearing" condition, a condition for attacks on PLWE and RLWE introduced in [6]. We expand upon some questions about smearing posed by Elias et al. in [6] and show how smearing is related to the Coupon Collector's Problem Furthermore, we develop some practical algorithms for calculating probabilities related to smearing. Finally, we present a smearing-based attack on PLWE, and demonstrate its effectiveness. △ Less

Submitted 14 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.13189 [pdf, ps, other]

The Ring Learning With Errors Problem: Spectral Distortion

Authors: L. Babinkostova, Ariana Chin, Aaron Kirtland, Vladyslav Nazarchuk, Esther Plotnick

Abstract: We answer a question posed by Y. Elias et al. in [8] about possible spectral distortions of algebraic numbers. We provide a closed form for the spectral distortion of certain classes of cyclotomic polynomials. Moreover, we present a bound on the spectral distortion of cyclotomic polynomials. We answer a question posed by Y. Elias et al. in [8] about possible spectral distortions of algebraic numbers. We provide a closed form for the spectral distortion of certain classes of cyclotomic polynomials. Moreover, we present a bound on the spectral distortion of cyclotomic polynomials. △ Less

Submitted 28 July, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

Showing 1–6 of 6 results for author: Kirtland, A