-
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Authors:
Anton Xue,
Avishree Khare,
Rajeev Alur,
Surbhi Goel,
Eric Wong
Abstract:
We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically constructed models. Empirically, we find that attacks on our theoretical models mirror popular attacks on large language models. Our work suggests that studying smaller theoretical models can help understand the behavior of large language models in rule-based settings like logical reasoning and jailbreak attacks.
Submitted 21 June, 2024;
originally announced July 2024.
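As a rough illustration of the rule-following setting this abstract describes, inference in propositional Horn logic can be sketched with standard forward chaining. This is a textbook procedure, not the paper's transformer construction; the function name `forward_chain` and the example rules are assumptions made for illustration only.

```python
def forward_chain(facts, rules):
    """Return every proposition derivable from `facts` under Horn `rules`.

    Each rule is a pair (premises, conclusion), read as
    "if all premises hold, then the conclusion holds".
    """
    known = set(facts)
    changed = True
    while changed:  # repeat until no rule adds a new proposition
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule set: "if P and Q, then R" and "if R, then S".
rules = [(("P", "Q"), "R"), (("R",), "S")]
derived = forward_chain({"P", "Q"}, rules)
```

Starting from only `{"P"}`, neither rule fires, so nothing new is derived; a subverted model would be one whose output departs from this fixed point.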
-
A Passwordless MFA Utlizing Biometrics, Proximity and Contactless Communication
Authors:
Sneha Shukla,
Gaurav Varshney,
Shreya Singh,
Swati Goel
Abstract:
Despite being more secure and strongly promoted, two-factor (2FA) or multi-factor (MFA) schemes either fail to protect against recent phishing threats such as real-time MITM, controls/relay MITM, and malicious browser extension-based phishing attacks, or require users to purchase and carry additional hardware for account protection. Leveraging the unprecedented popularity of NFC- and BLE-enabled smartphones, we explore a new horizon for designing an MFA scheme. This paper introduces an advanced authentication method for user verification that utilizes the user's real-time facial biometric identity, which serves as an inherent factor, together with BLE-NFC-enabled mobile devices, which operate as an ownership factor. We have implemented a prototype authentication system on a BLE-NFC-enabled Android device, and initial threat modeling suggests that it is safe against known phishing attacks. The scheme has been compared with other popular schemes using the Bonneau et al. assessment framework in terms of usability, deployability, and security.
Submitted 13 June, 2024;
originally announced June 2024.
-
Tolerant Algorithms for Learning with Arbitrary Covariate Shift
Authors:
Surbhi Goel,
Abhishek Shetty,
Konstantinos Stavropoulos,
Arsen Vasilyan
Abstract:
We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [Goldwasser, A. Kalai, Y. Kalai, Montasser NeurIPS 2020], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [Klivans, Stavropoulos, Vasilyan COLT 2024], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift.
We address both of these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle an arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant testable learning [Rubinfeld and Vasilyan STOC 2023], and learning with nasty noise.
Submitted 4 June, 2024;
originally announced June 2024.
-
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Authors:
Mahdi Sabbaghi,
George Pappas,
Hamed Hassani,
Surbhi Goel
Abstract:
Despite the success of Transformers on language understanding, code generation, and logical reasoning, they still fail to generalize over length on basic arithmetic tasks such as addition and multiplication. A major reason behind this failure is the vast difference in structure between numbers and text; for example, numbers are typically parsed from right to left, and there is a correspondence between digits at the same position across different numbers. In contrast, for text, such symmetries are quite unnatural. In this work, we propose to encode these semantics explicitly into the model via modified number formatting and custom positional encodings. Empirically, our method allows a Transformer trained on numbers with at most 5 digits for addition and multiplication to generalize up to 50-digit numbers, without using additional data for longer sequences. We further demonstrate that traditional absolute positional encodings (APE) fail to generalize to longer sequences, even when trained with augmented data that captures task symmetries. To elucidate the importance of explicitly encoding structure, we prove that explicit incorporation of structure via positional encodings is necessary for out-of-distribution generalization. Finally, we pinpoint other challenges inherent to length generalization beyond capturing symmetries, in particular the complexity of the underlying task, and propose changes in the training distribution to address them.
Submitted 3 June, 2024;
originally announced June 2024.
-
Stochastic Bandits with ReLU Neural Networks
Authors:
Kan Xu,
Hamsa Bastani,
Surbhi Goel,
Osbert Bastani
Abstract:
We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
Submitted 12 May, 2024;
originally announced May 2024.
-
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
Authors:
Chiara Plizzari,
Shubham Goel,
Toby Perrett,
Jacob Chalk,
Angjoo Kanazawa,
Dima Damen
Abstract:
As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We introduce Lift, Match and Keep (LMK), a method which lifts partial 2D observations to 3D world coordinates, matches them over time using visual appearance, 3D location and interactions to form object tracks, and keeps these object tracks even when they go out-of-view of the camera - hence keeping in mind what is out of sight. We test LMK on 100 long videos from EPIC-KITCHENS. Our results demonstrate that spatial cognition is critical for correctly locating objects over short and long time scales. E.g., for one long egocentric video, we estimate the 3D location of 50 active objects. Of these, 60% can be correctly positioned in 3D after 2 minutes of leaving the camera view.
Submitted 7 April, 2024;
originally announced April 2024.
-
Analyzing LLM Usage in an Advanced Computing Class in India
Authors:
Chaitanya Arora,
Utkarsh Venaik,
Pavit Singh,
Sahil Goyal,
Jatin Tyagi,
Shyama Goel,
Ujjwal Singhal,
Dhruv Kumar
Abstract:
This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students use LLMs in various ways: generating code and debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis covers over 4,000 prompts from 411 students and interviews with 10 students. It shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.
Submitted 6 April, 2024;
originally announced April 2024.
-
The More You See in 2D, the More You Perceive in 3D
Authors:
Xinyang Han,
Zelin Gao,
Angjoo Kanazawa,
Shubham Goel,
Yossi Gandelsman
Abstract:
Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with the camera poses of the images via test-time fine-tuning. The adapted diffusion model and the obtained camera poses are then utilized as instance-specific priors for 3D reconstruction and novel view synthesis. We show that as the number of input images increases, the performance of our approach improves, bridging the gap between optimization-based prior-less 3D reconstruction methods and single-image-to-3D diffusion-based methods. We demonstrate our system on real images as well as standard synthetic benchmarks. Our ablation studies confirm that this adaption behavior is key for more accurate 3D understanding.
Submitted 4 April, 2024;
originally announced April 2024.
-
Auditing the Use of Language Models to Guide Hiring Decisions
Authors:
Johann D. Gaebler,
Sharad Goel,
Aziz Huq,
Prasanna Tambe
Abstract:
Regulatory efforts to protect against algorithmic bias have taken on increased urgency with rapid advances in large language models (LLMs), which are machine learning models that can achieve performance rivaling human experts on a wide array of tasks. A key theme of these initiatives is algorithmic "auditing," but current regulations -- as well as the scientific literature -- provide little guidance on how to conduct these assessments. Here we propose and investigate one approach for auditing algorithms: correspondence experiments, a widely applied tool for detecting bias in human judgements. In the employment context, correspondence experiments aim to measure the extent to which race and gender impact decisions by experimentally manipulating elements of submitted application materials that suggest an applicant's demographic traits, such as their listed name. We apply this method to audit candidate assessments produced by several state-of-the-art LLMs, using a novel corpus of applications to K-12 teaching positions in a large public school district. We find evidence of moderate race and gender disparities, a pattern largely robust to varying the types of application material input to the models, as well as the framing of the task to the LLMs. We conclude by discussing some important limitations of correspondence experiments for auditing algorithms.
Submitted 3 April, 2024;
originally announced April 2024.
-
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
Authors:
GuanWen Qiu,
Da Kuang,
Surbhi Goel
Abstract:
Existing research often posits spurious features as easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, studies mainly focus on end performance rather than the learning dynamics of feature learning. In this paper, we propose a theoretical framework and an associated synthetic dataset grounded in boolean function analysis. This setup allows for fine-grained control over the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features to study the dynamics of feature learning under spurious correlations. Our findings uncover several interesting phenomena: (1) stronger spurious correlations or simpler spurious features slow down the learning rate of the core features, (2) two distinct subnetworks are formed to learn core and spurious features separately, (3) learning phases of spurious and core features are not always separable, (4) spurious features are not forgotten even after core features are fully learned. We demonstrate that our findings justify the success of retraining the last layer to remove spurious correlation and also identify limitations of popular debiasing algorithms that exploit early learning of spurious features. We support our empirical findings with theoretical analyses for the case of learning XOR features with a one-hidden-layer ReLU network.
Submitted 16 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Authors:
Nathaniel Li,
Alexander Pan,
Anjali Gopal,
Summer Yue,
Daniel Berrios,
Alice Gatti,
Justin D. Li,
Ann-Kathrin Dombrowski,
Shashwat Goel,
Long Phan,
Gabriel Mukobi,
Nathan Helm-Burger,
Rassin Lababidi,
Lennart Justen,
Andrew B. Liu,
Michael Chen,
Isabelle Barrass,
Oliver Zhang,
Xiaoyuan Zhu,
Rishub Tamirisa,
Bhrugu Bharathi,
Adam Khoja,
Zhenqi Zhao,
Ariel Herbert-Voss,
Cort B. Breuer
, et al. (32 additional authors not shown)
Abstract:
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
Submitted 15 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Optimality of weighted contracts for multi-agent contract design with a budget
Authors:
Sumit Goel,
Wade Hann-Caruthers
Abstract:
We study a contract design problem between a principal and multiple agents. Each agent participates in an independent task with binary outcomes (success or failure), in which it may exert costly effort towards improving its probability of success, and the principal has a fixed budget which it can use to provide outcome-dependent rewards to the agents. Crucially, we assume the principal cares only about maximizing the agents' probabilities of success, not how much of the budget it expends. We first show that a contract is optimal for some objective if and only if it is a successful-get-everything contract. An immediate consequence of this result is that piece-rate contracts and bonus-pool contracts are never optimal in this setting. We then show that for any objective, there is an optimal priority-based weighted contract, which assigns positive weights and priority levels to the agents, and splits the budget among the highest-priority successful agents, with each such agent receiving a fraction of the budget proportional to her weight. This result provides a significant reduction in the dimensionality of the principal's optimal contract design problem and gives an interpretable and easily implementable optimal contract. Finally, we discuss an application of our results to the design of optimal contracts with two agents and quadratic costs. In this context, we find that the optimal contract assigns a higher weight to the agent whose success it values more, irrespective of the heterogeneity in the agents' cost parameters. This suggests that the structure of the optimal contract depends primarily on the bias in the principal's objective and is, to some extent, robust to the heterogeneity in the agents' cost functions.
Submitted 24 February, 2024;
originally announced February 2024.
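The priority-based weighted contract described in this abstract can be illustrated with a small payout routine. This is only a sketch of the verbal description: the function name `split_budget`, the convention that a larger number means higher priority, and the choice to pay nothing when no agent succeeds are all assumptions, not details taken from the paper.

```python
def split_budget(budget, weights, priorities, successes):
    """Split `budget` among the highest-priority successful agents,
    in proportion to their (positive) weights.

    Assumptions for illustration: a larger priority value means higher
    priority, and nothing is paid out if no agent succeeds.
    """
    payments = [0.0] * len(weights)
    winners = [i for i, ok in enumerate(successes) if ok]
    if not winners:
        return payments
    top = max(priorities[i] for i in winners)
    top_winners = [i for i in winners if priorities[i] == top]
    total_weight = sum(weights[i] for i in top_winners)
    for i in top_winners:
        payments[i] = budget * weights[i] / total_weight
    return payments


# Hypothetical instance: agents 0 and 1 share the top priority level.
all_succeed = split_budget(100, [1, 3, 2], [2, 2, 1], [True, True, True])
only_last = split_budget(100, [1, 3, 2], [2, 2, 1], [False, False, True])
```

When all three agents succeed, only the two top-priority agents are paid, in a 1:3 ratio; when only the low-priority agent succeeds, it receives the whole budget.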
-
Corrective Machine Unlearning
Authors:
Shashwat Goel,
Ameya Prabhu,
Philip Torr,
Ponnurangam Kumaraguru,
Amartya Sanyal
Abstract:
Machine Learning models increasingly face data integrity challenges due to the use of large-scale training datasets drawn from the internet. We study what model developers can do if they detect that some data was manipulated or incorrect. Such manipulated data can cause adverse effects like vulnerability to backdoored samples, systematic biases, and in general, reduced accuracy on certain input domains. Often, not all manipulated training samples are known, and only a small, representative subset of the affected data is flagged.
We formalize "Corrective Machine Unlearning" as the problem of mitigating the impact of data affected by unknown manipulations on a trained model, possibly knowing only a subset of impacted samples. We demonstrate that the problem of corrective unlearning has significantly different requirements from traditional privacy-oriented unlearning. We find most existing unlearning methods, including the gold-standard retraining-from-scratch, require most of the manipulated data to be identified for effective corrective unlearning. However, one approach, SSD, achieves limited success in unlearning adverse effects with just a small portion of the manipulated samples, showing the tractability of this setting. We hope our work spurs research towards developing better methods for corrective unlearning and offers practitioners a new strategy to handle data integrity challenges arising from web-scale training.
Submitted 21 February, 2024;
originally announced February 2024.
-
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Authors:
Benjamin L. Edelman,
Ezra Edelman,
Surbhi Goel,
Eran Malach,
Nikolaos Tsilivis
Abstract:
Large language models have the ability to generate text that mimics patterns in their inputs. We introduce a simple Markov Chain sequence modeling task in order to study how this in-context learning (ICL) capability emerges. In our setting, each example is sampled from a Markov chain drawn from a prior distribution over Markov chains. Transformers trained on this task form \emph{statistical induction heads} which compute accurate next-token probabilities given the bigram statistics of the context. During the course of training, models pass through multiple phases: after an initial stage in which predictions are uniform, they learn to sub-optimally predict using in-context single-token statistics (unigrams); then, there is a rapid phase transition to the correct in-context bigram solution. We conduct an empirical and theoretical investigation of this multi-phase process, showing how successful learning results from the interaction between the transformer's layers, and uncovering evidence that the presence of the simpler unigram solution may delay formation of the final bigram solution. We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to $n$-grams for $n > 2$.
Submitted 16 February, 2024;
originally announced February 2024.
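To make the "in-context bigram statistics" in this abstract concrete, the sketch below estimates next-token probabilities from the transitions observed in a single context sequence. It is a plain counting estimator, not the transformer studied in the paper; the function name and the add-one smoothing (the `smoothing` parameter) are assumptions chosen for illustration.

```python
from collections import Counter


def bigram_next_token_probs(context, num_states, smoothing=1.0):
    """Estimate P(next = j | last token of context) from in-context bigrams.

    Counts how often each state followed the current last token earlier in
    `context`, then applies additive smoothing (an illustrative assumption).
    """
    last = context[-1]
    counts = Counter()
    for prev, nxt in zip(context, context[1:]):
        if prev == last:
            counts[nxt] += 1
    total = sum(counts.values()) + smoothing * num_states
    return [(counts[j] + smoothing) / total for j in range(num_states)]


# In this context, state 0 is followed by 1 three times and by 0 once.
probs = bigram_next_token_probs([0, 1, 0, 1, 0, 0, 1, 0], num_states=2)
```

A unigram predictor would instead use overall token frequencies in the context, ignoring which token came last; the abstract describes models passing through that simpler solution before reaching the bigram one.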
-
Mechanised uniform interpolation for modal logics K, GL, and iSL
Authors:
Hugo Férée,
Iris van der Giessen,
Sam van Gool,
Ian Shillito
Abstract:
The uniform interpolation property in a given logic can be understood as the definability of propositional quantifiers. We mechanise the computation of these quantifiers and prove correctness in the Coq proof assistant for three modal logics, namely: (1) the modal logic K, for which a pen-and-paper proof exists; (2) Gödel-Löb logic GL, for which our formalisation clarifies an important point in an existing, but incomplete, sequent-style proof; and (3) intuitionistic strong Löb logic iSL, for which this is the first proof-theoretic construction of uniform interpolants. Our work also yields verified programs that allow one to compute the propositional quantifiers on any formula in this logic.
Submitted 29 April, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
On the moments of averages of Ramanujan sums
Authors:
Shivani Goel,
M. Ram Murty
Abstract:
Chan and Kumchev studied averages of the first and second moments of Ramanujan sums. In this article, we extend this investigation by estimating the higher moments of averages of Ramanujan sums using the Brèteche Tauberian theorem. We also give a result for the moments of averages of Cohen-Ramanujan sums.
Submitted 14 January, 2024;
originally announced January 2024.
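For readers unfamiliar with the objects in this abstract, the Ramanujan sum $c_q(n)$ is the sum of $e^{2\pi i a n/q}$ over $1 \le a \le q$ with $\gcd(a,q)=1$, and it satisfies the classical identity $c_q(n) = \sum_{d \mid \gcd(q,n)} \mu(q/d)\, d$. The sketch below evaluates that identity directly; the helper names `mobius` and `ramanujan_sum` are illustrative, and nothing here reproduces the paper's moment estimates.

```python
from math import gcd


def mobius(n):
    """Möbius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def ramanujan_sum(q, n):
    """c_q(n) via the identity: sum over d | gcd(q, n) of mu(q/d) * d."""
    g = gcd(q, n)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)


c_4_2 = ramanujan_sum(4, 2)
c_6_1 = ramanujan_sum(6, 1)
c_5_5 = ramanujan_sum(5, 5)
```

Two sanity checks follow from the identity: $c_q(1) = \mu(q)$, and $c_q(n) = \varphi(q)$ whenever $q \mid n$.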
-
Moments of Averages of Ramanujan Sums over Number Fields
Authors:
Sneha Chaubey,
Shivani Goel
Abstract:
Assuming the generalized Lindelöf hypothesis, we provide asymptotic formulas for the mean values of the first and second moments of Ramanujan sums over any number field. Additionally, unconditionally, we estimate the second moment of Ramanujan sums over cyclotomic number fields.
Submitted 10 January, 2024;
originally announced January 2024.
-
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds
Authors:
Shivam Goel,
Yichen Wei,
Panagiotis Lymperopoulos,
Klara Chura,
Matthias Scheutz,
Jivko Sinapov
Abstract:
As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the ``open-world''. To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinfor…
▽ More
As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the ``open-world''. To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates rapid creation and modification of task environments, including multi-agent scenarios, with multiple environment transformations, thus providing a dynamic testbed for researchers to develop open-world AI agents.
Submitted 7 January, 2024;
originally announced January 2024.
-
Open World Object Detection in the Era of Foundation Models
Authors:
Orr Zohar,
Alejandro Lozano,
Shelly Goel,
Serena Yeung,
Kuan-Chieh Wang
Abstract:
Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered due to the stringent benchmark and task definitions. These definitions effectively prohibit foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient in evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these models. Therefore, we introduce a new benchmark that includes five real-world application-driven datasets, including challenging domains such as aerial and surgical images, and establish baselines. We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects. FOMO has ~3x unknown object mAP compared to baselines on our benchmark. However, our results indicate a significant place for improvement - suggesting a great research opportunity in further scaling object detection methods to real-world domains. Our code and benchmark are available at https://orrzohar.github.io/projects/fomo/.
Submitted 9 December, 2023;
originally announced December 2023.
-
A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning
Authors:
Yash Shukla,
Bharat Kesari,
Shivam Goel,
Robert Wright,
Jivko Sinapov
Abstract:
Despite recent progress in Reinforcement Learning for robotics applications, many tasks remain prohibitively difficult to solve because of the expensive interaction cost. Transfer learning helps reduce the training time in the target domain by transferring knowledge learned in a source domain. Sim2Real transfer helps transfer knowledge from a simulated robotic domain to a physical target domain. Knowledge transfer reduces the time required to train a task in the physical world, where the cost of interactions is high. However, most existing approaches assume exact correspondence in the task structure and the physical properties of the two domains. This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning. We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain. We observe successful behavior policy transfer with limited target task interactions and in cases where the source and target task are semantically dissimilar.
Submitted 12 October, 2023;
originally announced October 2023.
-
Representation Engineering: A Top-Down Approach to AI Transparency
Authors:
Andy Zou,
Long Phan,
Sarah Chen,
James Campbell,
Phillip Guo,
Richard Ren,
Alexander Pan,
Xuwang Yin,
Mantas Mazeika,
Ann-Kathrin Dombrowski,
Shashwat Goel,
Nathaniel Li,
Michael J. Byun,
Zifan Wang,
Alex Mallen,
Steven Basart,
Sanmi Koyejo,
Dawn Song,
Matt Fredrikson,
J. Zico Kolter,
Dan Hendrycks
Abstract:
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective solutions for improving our understanding and control of large language models. We showcase how these methods can provide traction on a wide range of safety-relevant problems, including honesty, harmlessness, power-seeking, and more, demonstrating the promise of top-down transparency research. We hope that this work catalyzes further exploration of RepE and fosters advancements in the transparency and safety of AI systems.
Submitted 10 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Authors:
Benjamin L. Edelman,
Surbhi Goel,
Sham Kakade,
Eran Malach,
Cyril Zhang
Abstract:
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are known to modulate nuanced resource tradeoffs. This work investigates how these complexities necessarily arise for feature learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
Submitted 30 October, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Authors:
Neel Guha,
Julian Nyarko,
Daniel E. Ho,
Christopher Ré,
Adam Chilton,
Aditya Narayana,
Alex Chohlas-Wood,
Austin Peters,
Brandon Waldon,
Daniel N. Rockmore,
Diego Zambrano,
Dmitry Talisman,
Enam Hoque,
Faiz Surani,
Frank Fagan,
Galit Sarfaty,
Gregory M. Dickinson,
Haggai Porat,
Jason Hegland,
Jessica Wu,
Joe Nudell,
Joel Niklaus,
John Nay,
Jonathan H. Choi,
Kevin Tobia
, et al. (15 additional authors not shown)
Abstract:
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
Submitted 20 August, 2023;
originally announced August 2023.
-
The Disparate Impacts of College Admissions Policies on Asian American Applicants
Authors:
Joshua Grossman,
Sabina Tomkins,
Lindsay Page,
Sharad Goel
Abstract:
There is debate over whether Asian American students are admitted to selective colleges and universities at lower rates than white students with similar academic qualifications. However, there have been few empirical investigations of this issue, in large part due to a dearth of data. Here we present the results from analyzing 685,709 applications from Asian American and white students to a subset of selective U.S. institutions over five application cycles, beginning with the 2015-2016 cycle. The dataset does not include admissions decisions, and so we construct a proxy based in part on enrollment choices. Based on this proxy, we estimate the odds that Asian American applicants were admitted to at least one of the schools we consider were 28% lower than the odds for white students with similar test scores, grade-point averages, and extracurricular activities. The gap was particularly pronounced for students of South Asian descent (49% lower odds). We trace this pattern in part to two factors. First, many selective colleges openly give preference to the children of alumni, and we find that white applicants were substantially more likely to have such legacy status than Asian applicants, especially South Asian applicants. Second, after adjusting for observed student characteristics, the institutions we consider appear less likely to admit students from geographic regions with relatively high shares of applicants who are Asian. We hope these results inform ongoing discussions on the equity of college admissions policies.
Submitted 3 August, 2023;
originally announced August 2023.
-
Proportional Aggregation of Preferences for Sequential Decision Making
Authors:
Nikhil Chandak,
Shashwat Goel,
Dominik Peters
Abstract:
We study the problem of fair sequential decision making given voter preferences. In each round, a decision rule must choose a decision from a set of alternatives where each voter reports which of these alternatives they approve. Instead of going with the most popular choice in each round, we aim for proportional representation. We formalize this aim using axioms based on Proportional Justified Representation (PJR), which were proposed in the literature on multi-winner voting and were recently adapted to multi-issue decision making. The axioms require that every group of $α\%$ of the voters, if it agrees in every round (i.e., approves a common alternative), then those voters must approve at least $α\%$ of the decisions. A stronger version of the axioms requires that every group of $α\%$ of the voters that agrees in a $β$ fraction of rounds must approve $β\cdotα\%$ of the decisions. We show that three attractive voting rules satisfy axioms of this style. One of them (Sequential Phragmén) makes its decisions online, and the other two satisfy strengthened versions of the axioms but make decisions semi-online (Method of Equal Shares) or fully offline (Proportional Approval Voting). The first two are polynomial-time computable, and the latter is based on an NP-hard optimization, but it admits a polynomial-time local search algorithm that satisfies the same axiomatic properties. We present empirical results about the performance of these rules based on synthetic data and U.S. political elections. We also run experiments where votes are cast by preference models trained on user responses from the moral machine dataset about ethical dilemmas.
Submitted 26 June, 2023;
originally announced June 2023.
-
Adversarial Resilience in Sequential Prediction via Abstention
Authors:
Surbhi Goel,
Steve Hanneke,
Shay Moran,
Abhishek Shetty
Abstract:
We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice.
To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension (mirroring the stochastic setting) of the hypothesis class, as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension~1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.
Submitted 24 January, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Automated Reminders Reduce Incarceration for Missed Court Dates: Evidence from a Text Message Experiment
Authors:
Alex Chohlas-Wood,
Madison Coots,
Joe Nudell,
Julian Nyarko,
Emma Brunskill,
Todd Rogers,
Sharad Goel
Abstract:
Millions of Americans must attend mandatory court dates every year. To boost appearance rates, jurisdictions nationwide are increasingly turning to automated reminders, but previous research offers mixed evidence on their effectiveness. In partnership with the Santa Clara County Public Defender Office, we randomly assigned 5,709 public defender clients to either receive automated text message reminders (treatment) or not receive reminders (control). We found that reminders reduced warrants issued for missed court dates by approximately 20%, with 12.1% of clients in the control condition issued a warrant compared to 9.7% of clients in the treatment condition. We further found that incarceration from missed court dates dropped by a similar amount, from 6.2% in the control condition to 4.8% in the treatment condition. Our results provide evidence that automated reminders can help people avoid the negative consequences of missing court.
Submitted 22 March, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Reevaluating the Role of Race and Ethnicity in Diabetes Screening
Authors:
Madison Coots,
Soroush Saghafian,
David Kent,
Sharad Goel
Abstract:
There is active debate over whether to consider patient race and ethnicity when estimating disease risk. By accounting for race and ethnicity, it is possible to improve the accuracy of risk predictions, but there is concern that their use may encourage a racialized view of medicine. In diabetes risk models, despite substantial gains in statistical accuracy from using race and ethnicity, the gains in clinical utility are surprisingly modest. These modest clinical gains stem from two empirical patterns: first, the vast majority of individuals receive the same screening recommendation regardless of whether race or ethnicity are included in risk models; and second, for those who do receive different screening recommendations, the difference in utility between screening and not screening is relatively small. Our results are based on broad statistical principles, and so are likely to generalize to many other risk-based clinical decisions.
Submitted 16 June, 2023;
originally announced June 2023.
-
Exposing Attention Glitches with Flip-Flop Language Modeling
Authors:
Bingbin Liu,
Jordan T. Ash,
Surbhi Goel,
Akshay Krishnamurthy,
Cyril Zhang
Abstract:
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.
Submitted 30 October, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Humans in 4D: Reconstructing and Tracking Humans with Transformers
Authors:
Shubham Goel,
Georgios Pavlakos,
Jathushan Rajasegaran,
Angjoo Kanazawa,
Jitendra Malik
Abstract:
We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images. To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D. This enables us to deal with multiple people and maintain identities through occlusion events. Our complete approach, 4DHumans, achieves state-of-the-art results for tracking people from monocular video. Furthermore, we demonstrate the effectiveness of HMR 2.0 on the downstream task of action recognition, achieving significant improvements over previous pose-based action recognition approaches. Our code and models are available on the project website: https://shubham-goel.github.io/4dhumans/.
Submitted 31 August, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Risk Scores, Label Bias, and Everything but the Kitchen Sink
Authors:
Michael Zanger-Tishler,
Julian Nyarko,
Sharad Goel
Abstract:
In designing risk assessment algorithms, many scholars promote a "kitchen sink" approach, reasoning that more information yields more accurate predictions. We show, however, that this rationale often fails when algorithms are trained to predict a proxy of the true outcome, as is typically the case. With such "label bias", one should exclude a feature if its correlation with the proxy and its correlation with the true outcome have opposite signs, conditional on the other model features. This criterion is often satisfied when a feature is weakly correlated with the true outcome, and, additionally, that feature and the true outcome are both direct causes of the remaining features. For example, due to patterns of police deployment, criminal behavior and geography may be weakly correlated and direct causes of one's criminal record, suggesting one should exclude geography in criminal risk assessments trained to predict arrest as a proxy for behavior.
Submitted 21 May, 2023;
originally announced May 2023.
-
Pair correlation of real-valued vector sequences
Authors:
Sneha Chaubey,
Shivani Goel
Abstract:
In this article, we investigate the fine-scale statistics of real-valued arithmetic sequences. In particular, we focus on real-valued vector sequences and show the Poissonian behavior of the pair correlation function for certain classes of such sequences, thereby extending previous works of Boca et al. and the first author on local statistics of integer-valued and rational-valued vector sequences.
Submitted 17 May, 2023;
originally announced May 2023.
-
Optimal tie-breaking rules
Authors:
Sumit Goel,
Amit Goyal
Abstract:
We consider two-player contests with the possibility of ties and study the effect of different tie-breaking rules on effort. For ratio-form and difference-form contests that admit pure-strategy Nash equilibrium, we find that the effort of both players is monotone decreasing in the probability that ties are broken in favor of the stronger player. Thus, the effort-maximizing tie-breaking rule commits to breaking ties in favor of the weaker agent. With symmetric agents, we find that the equilibrium is generally symmetric and independent of the tie-breaking rule. We also study the design of random tie-breaking rules that are ex-ante fair and identify sufficient conditions under which breaking ties before the contest actually leads to greater expected effort than the more commonly observed practice of breaking ties after the contest.
Submitted 30 August, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Learning Narrow One-Hidden-Layer ReLU Networks
Authors:
Sitan Chen,
Zehao Dou,
Surbhi Goel,
Adam R Klivans,
Raghu Meka
Abstract:
We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned.
Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.
Submitted 20 April, 2023;
originally announced April 2023.
-
Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness
Authors:
Allison Koenecke,
Eric Giannella,
Robb Willer,
Sharad Goel
Abstract:
Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California's Supplementary Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of their multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to relatively higher costs of non-English language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation.
Submitted 17 April, 2023;
originally announced April 2023.
-
Unveiling the non-Abelian statistics of $D(S_3)$ anyons via photonic simulation
Authors:
Suraj Goel,
Matthew Reynolds,
Matthew Girling,
Will McCutcheon,
Saroch Leedumrongwatthanakun,
Vatshal Srivastav,
David Jennings,
Mehul Malik,
Jiannis K. Pachos
Abstract:
Simulators can realise novel phenomena by separating them from the complexities of a full physical implementation. Here we put forward a scheme that can simulate the exotic statistics of $D(S_3)$ non-Abelian anyons with minimal resources. The qudit lattice representation of this planar code supports local encoding of $D(S_3)$ anyons. As a proof-of-principle demonstration we employ a photonic simulator to encode a single qutrit and manipulate it to perform the fusion and braiding properties of non-Abelian $D(S_3)$ anyons. The photonic technology allows us to perform the required non-unitary operations with much higher fidelity than what can be achieved with current quantum computers. Our approach can be directly generalised to larger systems or to different anyonic models, thus enabling advances in the exploration of quantum error correction and fundamental physics alike.
Submitted 11 April, 2023;
originally announced April 2023.
-
Distance matrix of enhanced power graphs of finite groups
Authors:
Anita Arora,
Hiranya Kishore Dey,
Shivani Goel
Abstract:
The enhanced power graph of a group $G$ is the graph $\mathcal{G}_E(G)$ with vertex set $G$ and edge set $ \{(u,v): u, v \in \langle w \rangle,~\mbox{for some}~ w \in G\}$. In this paper, we compute the spectrum of the distance matrix of the enhanced power graph of non-abelian groups of order $pq$, dihedral groups, dicyclic groups, elementary abelian groups $\mathrm{El}(p^n)$ and the non-cyclic abelian groups $\mathrm{El}(p^n)\times\mathrm{El}(q^m)$ and $\mathrm{El}(p^n)\times \mathbb{Z}_m$, where $p$ and $q$ are distinct primes.
For the non-cyclic abelian group $\mathrm{El}(p^n)\times \mathrm{El}(q^m)$, we also compute the spectrum of the adjacency matrix of its enhanced power graph and the spectrum of the adjacency and the distance matrix of its power graph.
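The definition above can be made concrete with a minimal sketch, assuming the group is a finite abelian group written as a product of cyclic groups $\mathbb{Z}_{m_1}\times\cdots\times\mathbb{Z}_{m_k}$ with elements stored as tuples. The function names (`cyclic_subgroup`, `enhanced_power_graph`, `distance_matrix`) are hypothetical and are not taken from the paper. The sketch only builds the graph and its distance matrix; it does not reproduce the paper's spectral computations.

```python
from collections import deque
from itertools import product


def cyclic_subgroup(w, moduli):
    """Return the cyclic subgroup <w> of Z_{m1} x ... x Z_{mk} as a set of tuples."""
    identity = tuple(0 for _ in moduli)
    elems = {identity}
    x = w
    while x != identity:
        elems.add(x)
        # Add w again, componentwise and modulo each factor's order.
        x = tuple((a + b) % m for a, b, m in zip(x, w, moduli))
    return elems


def enhanced_power_graph(moduli):
    """Join u and v whenever both lie in <w> for some group element w."""
    elements = list(product(*(range(m) for m in moduli)))
    adj = {u: set() for u in elements}
    for w in elements:
        sub = cyclic_subgroup(w, moduli)
        for u in sub:
            adj[u] |= sub - {u}
    return elements, adj


def distance_matrix(elements, adj):
    """Shortest-path distances by breadth-first search from every vertex."""
    matrix = []
    for source in elements:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist[v] for v in elements])
    return matrix
```

For $\mathrm{El}(2^2)=\mathbb{Z}_2\times\mathbb{Z}_2$ (`moduli = (2, 2)`) the graph is a star centred at the identity, whereas for the cyclic group $\mathbb{Z}_2\times\mathbb{Z}_3$ every pair of elements lies in a common cyclic subgroup, so the graph is complete.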
Submitted 21 June, 2023; v1 submitted 9 April, 2023;
originally announced April 2023.
-
Referenceless characterisation of complex media using physics-informed neural networks
Authors:
Suraj Goel,
Claudio Conti,
Saroch Leedumrongwatthanakun,
Mehul Malik
Abstract:
In this work, we present a method to characterise the transmission matrices of complex scattering media using a physics-informed, multi-plane neural network (MPNN) without the requirement of a known optical reference field. We use this method to accurately measure the transmission matrix of a commercial multi-mode fiber without the problems of output-phase ambiguity and dark spots, leading to up to 58% improvement in focusing efficiency compared with phase-stepping holography. We demonstrate how our method is significantly more noise-robust than phase-stepping holography and show how it can be generalised to characterise a cascade of transmission matrices, allowing one to control the propagation of light between independent scattering media. This work presents an essential tool for accurate light control through complex media, with applications ranging from classical optical networks, biomedical imaging, to quantum information processing.
Submitted 26 September, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Low impact agency: review and discussion
Authors:
Danilo Naiff,
Shashwat Goel
Abstract:
Powerful artificial intelligence poses an existential threat if the AI decides to drastically change the world in pursuit of its goals. The hope of low-impact artificial intelligence is to incentivize the AI not to do so, precisely because such changes would have a large impact on the world. In this work, we first review the concept of low-impact agency and previous proposals to approach the problem, and then propose future research directions on the topic, with the goal of ensuring that low-impactedness is useful in making AI safe.
Submitted 6 March, 2023;
originally announced March 2023.
-
Designing Equitable Algorithms
Authors:
Alex Chohlas-Wood,
Madison Coots,
Sharad Goel,
Julian Nyarko
Abstract:
Predictive algorithms are now used to help distribute a large share of our society's resources and sanctions, such as healthcare, loans, criminal detentions, and tax audits. Under the right circumstances, these algorithms can improve the efficiency and equity of decision-making. At the same time, there is a danger that the algorithms themselves could entrench and exacerbate disparities, particularly along racial, ethnic, and gender lines. To help ensure their fairness, many researchers suggest that algorithms be subject to at least one of three constraints: (1) no use of legally protected features, such as race, ethnicity, and gender; (2) equal rates of "positive" decisions across groups; and (3) equal error rates across groups. Here we show that these constraints, while intuitively appealing, often worsen outcomes for individuals in marginalized groups, and can even leave all groups worse off. The inherent trade-off we identify between formal fairness constraints and welfare improvements -- particularly for the marginalized -- highlights the need for a more robust discussion on what it means for an algorithm to be "fair". We illustrate these ideas with examples from healthcare and the criminal-legal system, and make several proposals to help practitioners design more equitable algorithms.
Submitted 17 February, 2023;
originally announced February 2023.
-
Deciding Equations in the Time Warp Algebra
Authors:
Sam van Gool,
Adrien Guatto,
George Metcalfe,
Simon Santschi
Abstract:
Join-preserving maps on the discrete time scale $ω^+$, referred to as time warps, have been proposed as graded modalities that can be used to quantify the growth of information in the course of program execution. The set of time warps forms a simple distributive involutive residuated lattice -- called the time warp algebra -- that is equipped with residual operations relevant to potential applications. In this paper, we show that although the time warp algebra generates a variety that lacks the finite model property, it nevertheless has a decidable equational theory. We also describe an implementation of a procedure for deciding equations in this algebra, written in the OCaml programming language, that makes use of the Z3 theorem prover.
Submitted 25 January, 2024; v1 submitted 15 January, 2023;
originally announced February 2023.
-
Profinite lambda-terms and parametricity
Authors:
Sam van Gool,
Paul-André Melliès,
Vincent Moreau
Abstract:
Combining ideas coming from Stone duality and Reynolds parametricity, we formulate in a clean and principled way a notion of profinite lambda-term which, we show, generalizes at every type the traditional notion of profinite word coming from automata theory. We start by defining the Stone space of profinite lambda-terms as a projective limit of finite sets of usual lambda-terms, considered modulo a notion of equivalence based on the finite standard model. One main contribution of the paper is to establish that, somewhat surprisingly, the resulting notion of profinite lambda-term coming from Stone duality lives in perfect harmony with the principles of Reynolds parametricity. In addition, we show that the notion of profinite lambda-term is compositional by constructing a cartesian closed category of profinite lambda-terms, and we establish that the embedding from lambda-terms modulo beta-eta-conversion to profinite lambda-terms is faithful using Statman's finite completeness theorem. Finally, we prove that the traditional Church encoding of finite words into lambda-terms can be extended to profinite words, and leads to a homeomorphism between the space of profinite words and the space of profinite lambda-terms of the corresponding Church type.
Submitted 18 November, 2023; v1 submitted 29 January, 2023;
originally announced January 2023.
-
A Performance Verification Methodology for Resource Allocation Heuristics
Authors:
Saksham Goel,
Benjamin Mikek,
Jehad Aly,
Venkat Arun,
Ahmed Saeed,
Aditya Akella
Abstract:
Performance verification is a nascent but promising tool for understanding the performance and limitations of heuristics under realistic assumptions. Bespoke performance verification tools have already demonstrated their value in settings like congestion control and packet scheduling. In this paper, we aim to emphasize the broad applicability and utility of performance verification. To that end, we highlight the design principles of performance verification. Then, we leverage that understanding to develop a set of easy-to-follow guidelines that are applicable to a wide range of resource allocation heuristics. In particular, we introduce Virelay, a framework that enables heuristic designers to express the behavior of their algorithms and their assumptions about the system in an environment that resembles a discrete-event simulator. We demonstrate the utility and ease-of-use of Virelay by applying it to six diverse case studies. We produce bounds on the performance of classical algorithms, work stealing and SRPT scheduling, under practical assumptions. We demonstrate Virelay's expressiveness by capturing existing models for congestion control and packet scheduling, and we verify the observation that TCP unfairness can cause some ML training workloads to spontaneously converge to a state of high network utilization. Finally, we use Virelay to identify two bugs in the Linux CFS load balancer.
Submitted 28 February, 2024; v1 submitted 10 January, 2023;
originally announced January 2023.
-
A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations
Authors:
Sohan Rudra,
Saksham Goel,
Anirban Santara,
Claudio Gentile,
Laurent Perron,
Fei Xia,
Vikas Sindhwani,
Carolina Parada,
Gaurav Aggarwal
Abstract:
Object-goal navigation (Object-nav) entails searching for, recognizing, and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are restricted to static objects (e.g., television, fridge). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g., fruits, glasses, phones) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
Submitted 29 November, 2022;
originally announced November 2022.
-
Decomposing the Fundamentals of Creepy Stories
Authors:
Sakshi Goel,
Haripriya Dharmala,
Yuchen Zhang,
Keith Burghardt
Abstract:
Fear is a universal concept; people crave it in urban legends, scary movies, and modern stories. Open questions remain, however, about why these stories are scary and more generally what scares people. In this study, we explore these questions by analyzing tens of thousands of scary stories on forums (known as subreddits) on the social media website Reddit. We first explore how writing styles have evolved to keep these stories fresh before we analyze the stable core techniques writers use to make stories scary. We find that writers have changed the themes of their stories over the years from haunted houses to school-related themes, body horror, and diseases. Yet some features remain stable; words associated with pseudo-human nouns, such as clown or devil, are more common in scary stories than in baselines. In addition, we collect a range of datasets that annotate sentences containing fear. We use these data to develop a high-accuracy fear detection neural network model, which is used to quantify where people express fear in scary stories. We find that sentences describing fear, and words most often seen in scary stories, spike at particular points in a story, possibly as a way to keep the readers on the edge of their seats until the story's conclusion. These results provide a new understanding of how authors cater to their readers, and how fear may manifest in stories.
Submitted 10 November, 2022;
originally announced November 2022.
-
Transformers Learn Shortcuts to Automata
Authors:
Bingbin Liu,
Jordan T. Ash,
Surbhi Goel,
Akshay Krishnamurthy,
Cyril Zhang
Abstract:
Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are learned by these shallow and non-recurrent models? We find that a low-depth Transformer can represent the computations of any finite-state automaton (thus, any bounded-memory algorithm), by hierarchically reparameterizing its recurrent dynamics. Our theoretical results characterize shortcut solutions, whereby a Transformer with $o(T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$. We find that polynomial-sized $O(\log T)$-depth solutions always exist; furthermore, $O(1)$-depth simulators are surprisingly common, and can be understood using tools from Krohn-Rhodes theory and circuit complexity. Empirically, we perform synthetic experiments by training Transformers to simulate a wide variety of automata, and show that shortcut solutions can be learned via standard training. We further investigate the brittleness of these solutions and propose potential mitigations.
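To give some intuition for why logarithmic depth can suffice, the sketch below uses a generic parallel-prefix scan over composed transition maps. It is an illustration of the general idea under stated assumptions, not the paper's Transformer construction. Each input symbol is treated as a map from states to states; because composition of maps is associative, all prefix compositions can be formed in about $\log_2 T$ rounds of pairwise composition. The names `compose`, `sequential_states`, and `scan_states` are hypothetical.

```python
def compose(f, g):
    """Apply f first, then g; each map is a tuple sending state i to f[i]."""
    return tuple(g[s] for s in f)


def sequential_states(delta, word, start):
    """Baseline: run the automaton one symbol at a time (T sequential steps)."""
    states = []
    s = start
    for ch in word:
        s = delta[ch][s]
        states.append(s)
    return states


def scan_states(delta, word, start):
    """Compute every prefix state with a doubling scan; returns (states, rounds)."""
    maps = [delta[ch] for ch in word]
    step = 1
    rounds = 0
    while step < len(maps):
        # Within one round, every position could be updated independently.
        maps = [
            maps[i] if i < step else compose(maps[i - step], maps[i])
            for i in range(len(maps))
        ]
        step *= 2
        rounds += 1
    return [m[start] for m in maps], rounds


# Parity automaton: state 0 = even number of 1s seen so far, state 1 = odd.
PARITY = {"0": (0, 1), "1": (1, 0)}
```

On the parity automaton, `scan_states(PARITY, "1101", 0)` returns the same state sequence as `sequential_states(PARITY, "1101", 0)` while using two rounds for an input of length four.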
Submitted 2 May, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
On duality and model theory for polyadic spaces
Authors:
Sam van Gool,
Jérémie Marquès
Abstract:
This paper is a study of first-order coherent logic from the point of view of duality and categorical logic. We prove a duality theorem between coherent hyperdoctrines and open polyadic Priestley spaces, which we subsequently apply to prove completeness, omitting types, and Craig interpolation theorems for coherent or intuitionistic logic. Our approach emphasizes the role of interpolation and openness properties, and allows for a modular, syntax-free treatment of these model-theoretic results. As further applications of the same method, we prove completeness theorems for constant domain and Gödel-Dummett intuitionistic predicate logics.
Submitted 31 October, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms
Authors:
Surbhi Goel,
Sham Kakade,
Adam Tauman Kalai,
Cyril Zhang
Abstract:
Neural networks (NNs) struggle to efficiently solve certain problems, such as learning parities, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit a NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized program. For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight sharing between layers and convolutional weight sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more natural and powerful than either alone, particularly for concisely parameterizing discrete algorithms.
Submitted 15 January, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Authors:
Boaz Barak,
Benjamin L. Edelman,
Surbhi Goel,
Sham Kakade,
Eran Malach,
Cyril Zhang
Abstract:
There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some accounts of how these resources modulate statistical capacity, far less is known about their effect on the computational problem of model training. This work conducts such an exploration through the lens of learning a $k$-sparse parity of $n$ bits, a canonical discrete search problem which is statistically easy but computationally hard. Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately $n^{O(k)}$ iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD "stumbles in the dark" until it finds the hidden set of features (a natural algorithm which also runs in $n^{O(k)}$ time). Instead, we show that SGD gradually amplifies the sparse solution via a Fourier gap in the population gradient, making continual progress that is invisible to loss and error metrics.
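As a concrete reading of the problem statement, the sketch below defines a $k$-sparse parity label and a brute-force learner that tries every size-$k$ subset of the $n$ coordinates. This is a plain exhaustive search, shown only to illustrate why roughly $\binom{n}{k} = n^{O(k)}$ candidates arise; it is not the SGD procedure or the Langevin-like mechanism analysed in the paper, and the function names are hypothetical.

```python
from itertools import combinations, product


def parity(x, subset):
    """Label of x: XOR (sum mod 2) of the bits at the indices in subset."""
    return sum(x[i] for i in subset) % 2


def brute_force_sparse_parity(samples, n, k):
    """Return the first size-k subset consistent with all labelled samples, or None."""
    for subset in combinations(range(n), k):
        if all(parity(x, subset) == y for x, y in samples):
            return subset
    return None


def full_cube_samples(n, hidden):
    """Label every point of {0,1}^n with the parity over the hidden subset."""
    return [(x, parity(x, hidden)) for x in product((0, 1), repeat=n)]
```

With `n = 5`, `k = 2` and hidden set `(1, 3)`, calling `brute_force_sparse_parity(full_cube_samples(5, (1, 3)), 5, 2)` recovers `(1, 3)`, since no other pair of coordinates agrees with the labels on the whole cube.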
Submitted 15 January, 2023; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Causal Conceptions of Fairness and their Consequences
Authors:
Hamed Nilforoshan,
Johann Gaebler,
Ravi Shroff,
Sharad Goel
Abstract:
Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions \emph{almost always} -- in a measure theoretic sense -- result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.
Submitted 12 July, 2022;
originally announced July 2022.