-
Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior
Authors:
Nathan P. Lawrence,
Philip D. Loewen,
Shuyuan Wang,
Michael G. Forbes,
R. Bhushan Gopaluni
Abstract:
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Perhaps of independent interest, we formulate and analyze the stability of such data-driven models in the presence of noise. The Youla-Kucera approach requires a stable "parameter" for controller design. For the training of reinforcement learning agents, the set of all stable linear operators is given explicitly through a matrix factorization approach. Moreover, a nonlinear extension is given using a neural network to express a parameterized set of stable operators, which enables seamless integration with standard deep learning libraries. Finally, we show how these ideas can also be applied to tune fixed-structure controllers.
Submitted 21 March, 2024; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Reinforcement Learning with Partial Parametric Model Knowledge
Authors:
Shuyuan Wang,
Philip D. Loewen,
Nathan P. Lawrence,
Michael G. Forbes,
R. Bhushan Gopaluni
Abstract:
We adapt reinforcement learning (RL) methods for continuous control to bridge the gap between complete ignorance and perfect knowledge of the environment. Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes inspiration from both model-free RL and model-based control. It uses incomplete information from a partial model and retains RL's data-driven adaptation towards optimal performance. The linear quadratic regulator provides a case study; numerical experiments demonstrate the effectiveness and resulting benefits of the proposed method.
Submitted 25 April, 2023;
originally announced April 2023.
-
A modular framework for stabilizing deep reinforcement learning control
Authors:
Nathan P. Lawrence,
Philip D. Loewen,
Shuyuan Wang,
Michael G. Forbes,
R. Bhushan Gopaluni
Abstract:
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Using a neural network to express a parameterized set of nonlinear stable operators enables seamless integration with standard deep learning libraries. We demonstrate the approach on a realistic simulation of a two-tank system.
Submitted 6 April, 2023;
originally announced April 2023.
-
Meta-Reinforcement Learning for Adaptive Control of Second Order Systems
Authors:
Daniel G. McClement,
Nathan P. Lawrence,
Michael G. Forbes,
Philip D. Loewen,
Johan U. Backström,
R. Bhushan Gopaluni
Abstract:
Meta-learning is a branch of machine learning which aims to synthesize data from a distribution of related tasks to efficiently solve new ones. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta reinforcement learning (meta-RL) control strategy that takes advantage of known, offline information for training, such as a model structure. The meta-RL agent is trained over a distribution of model parameters, rather than a single model, enabling the agent to automatically adapt to changes in the process dynamics while maintaining performance. A key design element is the ability to leverage model-based information offline during training, while maintaining a model-free policy structure for interacting with new environments. Our previous work has demonstrated how this approach can be applied to the industrially-relevant problem of tuning proportional-integral controllers to control first order processes. In this work, we briefly reintroduce our methodology and demonstrate how it can be extended to proportional-integral-derivative controllers and second order systems.
Submitted 19 September, 2022;
originally announced September 2022.
-
Meta-Reinforcement Learning for the Tuning of PI Controllers: An Offline Approach
Authors:
Daniel G. McClement,
Nathan P. Lawrence,
Johan U. Backstrom,
Philip D. Loewen,
Michael G. Forbes,
R. Bhushan Gopaluni
Abstract:
Meta-learning is a branch of machine learning which trains neural network models to synthesize a wide variety of data in order to rapidly solve new problems. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta reinforcement learning (meta-RL) control strategy that can be used to tune proportional-integral controllers. Our meta-RL agent has a recurrent structure that accumulates "context" to learn a system's dynamics through a hidden state variable in closed-loop. This architecture enables the agent to automatically adapt to changes in the process dynamics. In tests reported here, the meta-RL agent was trained entirely offline on first order plus time delay systems, and produced excellent results on novel systems drawn from the same distribution of process dynamics used for training. A key design element is the ability to leverage model-based information offline during training in simulated environments while maintaining a model-free policy structure for interacting with novel processes where there is uncertainty regarding the true process dynamics. Meta-learning is a promising approach for constructing sample-efficient intelligent controllers.
Submitted 19 September, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Ideals, Determinants, and Straightening: Proving and Using Lower Bounds for Polynomial Ideals
Authors:
Robert Andrews,
Michael A. Forbes
Abstract:
We show that any nonzero polynomial in the ideal generated by the $r \times r$ minors of an $n \times n$ matrix $X$ can be used to efficiently approximate the determinant. For any nonzero polynomial $f$ in this ideal, we construct a small depth-three $f$-oracle circuit that approximates the determinant of size $Θ(r^{1/3})$ in the sense of border complexity. For many classes of algebraic circuits, this implies that every nonzero polynomial in the ideal generated by $r \times r$ minors is at least as hard to approximately compute as the determinant of size $Θ(r^{1/3})$. We also prove an analogous result for the Pfaffian of a $2n \times 2n$ skew-symmetric matrix and the ideal generated by Pfaffians of $2r \times 2r$ principal submatrices.
This answers a recent question of Grochow about complexity in polynomial ideals in the setting of border complexity. We give several applications of our result, two of which are highlighted below.
$\bullet$ We prove super-polynomial lower bounds for Ideal Proof System refutations computed by low-depth circuits. This extends the recent breakthrough low-depth circuit lower bounds of Limaye, Srinivasan, and Tavenas to the setting of proof complexity. For many natural circuit classes, we show that the approximative proof complexity of our hard instance is governed by the approximative circuit complexity of the determinant.
$\bullet$ We construct new hitting set generators for polynomial-size low-depth circuits. For any $\varepsilon > 0$, we construct generators with seed length $O(n^\varepsilon)$ that attain a near-optimal tradeoff between their seed length and degree, and are computable by low-depth circuits of near-linear size (with respect to the size of their output). This matches the seed length of the generators recently obtained by Limaye, Srinivasan, and Tavenas, but improves on the generator's degree and circuit complexity.
Submitted 27 October, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Deep Reinforcement Learning with Shallow Controllers: An Experimental Application to PID Tuning
Authors:
Nathan P. Lawrence,
Michael G. Forbes,
Philip D. Loewen,
Daniel G. McClement,
Johan U. Backstrom,
R. Bhushan Gopaluni
Abstract:
Deep reinforcement learning (RL) is an optimization-driven framework for producing control strategies for general dynamical systems without explicit reliance on process models. Good results have been reported in simulation. Here we demonstrate the challenges in implementing a state-of-the-art deep RL algorithm on a real physical system. Aspects include the interplay between software and existing hardware; experiment design and sample efficiency; training subject to input constraints; and interpretability of the algorithm and control law. At the core of our approach is the use of a PID controller as the trainable RL policy. In addition to its simplicity, this approach has several appealing features: no additional hardware needs to be added to the control system, since a PID controller can easily be implemented through a standard programmable logic controller; the control law can easily be initialized in a "safe" region of the parameter space; and the final product -- a well-tuned PID controller -- has a form that practitioners can reason about and deploy with confidence.
Submitted 13 November, 2021;
originally announced November 2021.
-
Can Machines Learn Morality? The Delphi Experiment
Authors:
Liwei Jiang,
Jena D. Hwang,
Chandra Bhagavatula,
Ronan Le Bras,
Jenny Liang,
Jesse Dodge,
Keisuke Sakaguchi,
Maxwell Forbes,
Jon Borchardt,
Saadia Gabriel,
Yulia Tsvetkov,
Oren Etzioni,
Maarten Sap,
Regina Rini,
Yejin Choi
Abstract:
As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions in humanity, let alone for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications, which poses a seemingly impossible challenge: teaching machines moral sense, while humanity continues to grapple with it.
To explore this challenge, we introduce Delphi, an experimental framework based on deep neural networks trained directly to reason about descriptive ethical judgments, e.g., "helping a friend" is generally good, while "helping a friend spread fake news" is not. Empirical results shed novel insights on the promises and limits of machine ethics; Delphi demonstrates strong generalization capabilities in the face of novel ethical situations, while off-the-shelf neural network models exhibit markedly poor judgment including unjust biases, confirming the need for explicitly teaching machines moral sense.
Yet, Delphi is not perfect, exhibiting susceptibility to pervasive biases and inconsistencies. Despite that, we demonstrate positive use cases of imperfect Delphi, including using it as a component model within other imperfect AI systems. Importantly, we interpret the operationalization of Delphi in light of prominent ethical theories, which leads us to important future research questions.
Submitted 12 July, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Authors:
Yao Dou,
Maxwell Forbes,
Rik Koncel-Kedziorski,
Noah A. Smith,
Yejin Choi
Abstract:
Modern neural language models can produce remarkably fluent and grammatical text. So much, in fact, that recent work by Clark et al. (2021) has reported that conventional crowdsourcing can no longer reliably distinguish between machine-authored (GPT-3) and human-authored writing. As errors in machine generations become ever subtler and harder to spot, it poses a new challenge to the research community for robust machine text evaluation. We propose a new framework called Scarecrow for scrutinizing machine text via crowd annotation. To support the broad range of real machine errors that can be identified by laypeople, the ten error categories of Scarecrow -- such as redundancy, commonsense errors, and incoherence -- are identified through several rounds of crowd annotation experiments without a predefined ontology. We then use Scarecrow to collect over 41k error spans in human-written and machine-generated paragraphs of English language news text. We isolate factors for detailed analysis, including parameter count, training data, and various decoding-time configurations. Our approach successfully quantifies measurable gaps between human authored text and generations from models of several sizes, including fourteen configurations of GPT-3. In addition, our analysis unveils new insights, with detailed rationales provided by laypeople, e.g., that the commonsense capabilities have been improving with larger models while math capabilities have not, and that the choices of simple decoding hyperparameters can make remarkable differences on the perceived quality of machine text. We release our training material, annotation toolkit and dataset at https://yao-dou.github.io/scarecrow/.
Submitted 7 March, 2022; v1 submitted 2 July, 2021;
originally announced July 2021.
-
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Authors:
Jack Hessel,
Ari Holtzman,
Maxwell Forbes,
Ronan Le Bras,
Yejin Choi
Abstract:
Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in contrast to the reference-free manner in which humans assess caption quality.
In this paper, we report the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image+caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references. Experiments spanning several corpora demonstrate that our new reference-free metric, CLIPScore, achieves the highest correlation with human judgements, outperforming existing reference-based metrics like CIDEr and SPICE. Information gain experiments demonstrate that CLIPScore, with its tight focus on image-text compatibility, is complementary to existing reference-based metrics that emphasize text-text similarities. Thus, we also present a reference-augmented version, RefCLIPScore, which achieves even higher correlation. Beyond literal description tasks, several case studies reveal domains where CLIPScore performs well (clip-art images, alt-text rating), but also where it is relatively weaker in comparison to reference-based metrics, e.g., news captions that require richer contextual knowledge.
Submitted 23 March, 2022; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Almost Surely Stable Deep Dynamics
Authors:
Nathan P. Lawrence,
Philip D. Loewen,
Michael G. Forbes,
Johan U. Backström,
R. Bhushan Gopaluni
Abstract:
We introduce a method for learning provably stable deep neural network based dynamic models from observed data. Specifically, we consider discrete-time stochastic dynamic models, as they are of particular interest in practical applications such as estimation and control. However, these aspects exacerbate the challenge of guaranteeing stability. Our method works by embedding a Lyapunov neural network into the dynamic model, thereby inherently satisfying the stability criterion. To this end, we propose two approaches and apply them in both the deterministic and stochastic settings: one exploits convexity of the Lyapunov function, while the other enforces stability through an implicit output layer. We demonstrate the utility of each approach through numerical examples.
Submitted 26 March, 2021;
originally announced March 2021.
-
A Meta-Reinforcement Learning Approach to Process Control
Authors:
Daniel G. McClement,
Nathan P. Lawrence,
Philip D. Loewen,
Michael G. Forbes,
Johan U. Backström,
R. Bhushan Gopaluni
Abstract:
Meta-learning is a branch of machine learning which aims to quickly adapt models, such as neural networks, to perform new tasks by learning an underlying structure across related tasks. In essence, models are being trained to learn new tasks effectively rather than master a single task. Meta-learning is appealing for process control applications because the perturbations to a process required to train an AI controller can be costly and unsafe. Additionally, the dynamics and control objectives are similar across many different processes, so it is feasible to create a generalizable controller through meta-learning capable of quickly adapting to different systems. In this work, we construct a deep reinforcement learning (DRL) based controller and meta-train the controller using a latent context variable through a separate embedding neural network. We test our meta-algorithm on its ability to adapt to new process dynamics as well as different control objectives on the same process. In both cases, our meta-learning algorithm adapts very quickly to new tasks, outperforming a regular DRL controller trained from scratch. Meta-learning appears to be a promising approach for constructing more intelligent and sample-efficient controllers.
Submitted 25 March, 2021;
originally announced March 2021.
-
MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations
Authors:
Yao Dou,
Maxwell Forbes,
Ari Holtzman,
Yejin Choi
Abstract:
We study conversational dialog in which there are many possible responses to a given history. We present the MultiTalk Dataset, a corpus of over 320,000 sentences of written conversational dialog that balances a high branching factor (10) with several conversation turns (6) through selective branch continuation. We make multiple contributions to study dialog generation in the highly branching setting. In order to evaluate a diverse set of generations, we propose a simple scoring algorithm, based on bipartite graph matching, to optimally incorporate a set of diverse references. We study multiple language generation tasks at different levels of predictive conversation depth, using textual attributes induced automatically from pretrained classifiers. Our culminating task is a challenging theory of mind problem, a controllable generation task which requires reasoning about the expected reaction of the listener.
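The idea of scoring a set of generations against a set of diverse references through bipartite matching can be illustrated with a small sketch. This is not the MultiTalk scoring algorithm itself: the function name `best_matching_score`, the similarity matrix, and the brute-force search over assignments are all assumptions made for illustration, and a practical implementation would likely use a polynomial-time assignment solver instead.

```python
from itertools import permutations


def best_matching_score(sim):
    """Average similarity under the best one-to-one generation/reference matching.

    sim[i][j] is an (assumed) similarity between generation i and reference j.
    Requires at least as many references as generations.
    """
    n_gen, n_ref = len(sim), len(sim[0])
    if n_gen > n_ref:
        raise ValueError("need at least as many references as generations")
    # Brute force: try every assignment of distinct references to generations.
    best_total = max(
        sum(sim[i][j] for i, j in enumerate(cols))
        for cols in permutations(range(n_ref), n_gen)
    )
    return best_total / n_gen


# Hypothetical similarities: a greedy match (0 -> 0, then 1 -> 1) totals 10,
# while the optimal matching (0 -> 1, 1 -> 0) totals 17.
example = [[9, 8],
           [9, 1]]
score = best_matching_score(example)
```

Because each reference can be matched at most once, a set of near-identical generations cannot all collect credit from the same reference, which is what rewards diversity in this kind of score.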
Submitted 1 February, 2021;
originally announced February 2021.
-
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
Authors:
Denis Emelin,
Ronan Le Bras,
Jena D. Hwang,
Maxwell Forbes,
Yejin Choi
Abstract:
In social settings, much of human behavior is governed by unspoken rules of conduct. For artificial systems to be fully integrated into social environments, adherence to such norms is a central prerequisite. We investigate whether contemporary NLG models can function as behavioral priors for systems deployed in social settings by generating action hypotheses that achieve predefined goals under moral constraints. Moreover, we examine if models can anticipate likely consequences of (im)moral actions, or explain why certain actions are preferable by generating relevant norms. For this purpose, we introduce 'Moral Stories', a crowd-sourced dataset of structured, branching narratives for the study of grounded, goal-oriented social reasoning. Finally, we propose decoding strategies that effectively combine multiple expert models to significantly improve the quality of generated actions, consequences, and norms compared to strong baselines, e.g. through abductive reasoning.
Submitted 31 December, 2020;
originally announced December 2020.
-
Edited Media Understanding: Reasoning About Implications of Manipulated Images
Authors:
Jeff Da,
Maxwell Forbes,
Rowan Zellers,
Anthony Zheng,
Jena D. Hwang,
Antoine Bosselut,
Yejin Choi
Abstract:
Multimodal disinformation, from 'deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless -- such as a filtered vacation photo. The difference between this example and harmful edits that spread disinformation is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems.
We present the task of Edited Media Understanding, requiring models to answer open-ended questions that capture the intent and implications of an image edit. We introduce a dataset for our task, EMU, with 48k question-answer pairs written in rich natural language. We evaluate a wide variety of vision-and-language models for our task, and introduce a new model PELICAN, which builds upon recent progress in pretrained multimodal representations. Our model obtains promising results on our dataset, with humans rating its answers as accurate 40.35% of the time. At the same time, there is still much work to be done -- humans prefer human-annotated captions 93.56% of the time -- and we provide analysis that highlights areas for further progress.
Submitted 8 December, 2020;
originally announced December 2020.
-
Social Chemistry 101: Learning to Reason about Social and Moral Norms
Authors:
Maxwell Forbes,
Jena D. Hwang,
Vered Shwartz,
Maarten Sap,
Yejin Choi
Abstract:
Social norms -- the unspoken commonsense rules about acceptable social behavior -- are crucial in understanding the underlying causes and intents of people's actions in narratives. For example, underlying an action such as "wanting to call cops on my neighbors" are social norms that inform our conduct, such as "It is expected that you report crimes."
We present Social Chemistry, a new conceptual formalism to study people's everyday social norms and moral judgments over a rich spectrum of real life situations described in natural language. We introduce Social-Chem-101, a large-scale corpus that catalogs 292k rules-of-thumb such as "it is rude to run a blender at 5am" as the basic conceptual units. Each rule-of-thumb is further broken down with 12 different dimensions of people's judgments, including social judgments of good and bad, moral foundations, expected cultural pressure, and assumed legality, which together amount to over 4.5 million annotations of categorical labels and free-text descriptions.
Comprehensive empirical results based on state-of-the-art neural models demonstrate that computational modeling of social norms is a promising research direction. Our model framework, Neural Norm Transformer, learns and generalizes Social-Chem-101 to successfully reason about previously unseen situations, generating relevant (and potentially novel) attribute-aware social rules-of-thumb.
Submitted 16 August, 2021; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Paragraph-level Commonsense Transformers with Recurrent Memory
Authors:
Saadia Gabriel,
Chandra Bhagavatula,
Vered Shwartz,
Ronan Le Bras,
Maxwell Forbes,
Yejin Choi
Abstract:
Human understanding of narrative texts requires making commonsense inferences beyond what is stated explicitly in the text. A recent model, COMET, can generate such implicit commonsense inferences along several dimensions such as pre- and post-conditions, motivations, and mental states of the participants. However, COMET was trained on commonsense inferences of short phrases, and is therefore discourse-agnostic. When presented with each sentence of a multi-sentence narrative, it might generate inferences that are inconsistent with the rest of the narrative.
We present the task of discourse-aware commonsense inference. Given a sentence within a narrative, the goal is to generate commonsense inferences along predefined dimensions, while maintaining coherence with the rest of the narrative. Such large-scale paragraph-level annotation is hard to get and costly, so we use available sentence-level annotations to efficiently and automatically construct a distantly supervised corpus.
Using this corpus, we train PARA-COMET, a discourse-aware model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives. PARA-COMET captures both semantic knowledge pertaining to prior world knowledge, and episodic knowledge involving how current events relate to prior and future events in a narrative. Our results show that PARA-COMET outperforms the sentence-level baselines, particularly in generating inferences that are both coherent and novel.
Submitted 2 February, 2021; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Optimal PID and Antiwindup Control Design as a Reinforcement Learning Problem
Authors:
Nathan P. Lawrence,
Gregory E. Stewart,
Philip D. Loewen,
Michael G. Forbes,
Johan U. Backstrom,
R. Bhushan Gopaluni
Abstract:
Deep reinforcement learning (DRL) has seen several successful applications to process control. Common methods rely on a deep neural network structure to model the controller or process. With increasingly complicated control structures, the closed-loop stability of such methods becomes less clear. In this work, we focus on the interpretability of DRL control methods. In particular, we view linear fixed-structure controllers as shallow neural networks embedded in the actor-critic framework. PID controllers guide our development due to their simplicity and acceptance in industrial practice. We then consider input saturation, leading to a simple nonlinear control structure. In order to effectively operate within the actuator limits we then incorporate a tuning parameter for anti-windup compensation. Finally, the simplicity of the controller allows for straightforward initialization. This makes our method inherently stabilizing, both during and after training, and amenable to known operational PID gains.
Submitted 9 May, 2020;
originally announced May 2020.
-
Reinforcement Learning based Design of Linear Fixed Structure Controllers
Authors:
Nathan P. Lawrence,
Gregory E. Stewart,
Philip D. Loewen,
Michael G. Forbes,
Johan U. Backstrom,
R. Bhushan Gopaluni
Abstract:
Reinforcement learning has been successfully applied to the problem of tuning PID controllers in several applications. The existing methods often utilize function approximation, such as neural networks, to update the controller parameters at each time-step of the underlying process. In this work, we present a simple finite-difference approach, based on random search, to tuning linear fixed-structure controllers. For clarity and simplicity, we focus on PID controllers. Our algorithm operates on the entire closed-loop step response of the system and iteratively improves the PID gains towards a desired closed-loop response. This allows for embedding stability requirements into the reward function without any modeling procedures.
Submitted 9 May, 2020;
originally announced May 2020.
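The abstract above describes a random-search, finite-difference scheme that scores the whole closed-loop step response against a desired response. A minimal sketch of that idea follows. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy first-order plant, the first-order reference trajectory, the sign-based update, the best-so-far safeguard, and the names `step_response`, `reward`, and `tune`.

```python
import math
import random

def step_response(gains, n_steps=200, dt=0.1):
    """Closed-loop unit-step response of a PID loop around a toy plant dy/dt = -y + u."""
    kp, ki, kd = gains
    y, integ, prev_err = 0.0, 0.0, 1.0
    out = []
    for _ in range(n_steps):
        err = 1.0 - y                      # setpoint is a unit step
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integ + kd * deriv
        prev_err = err
        y += dt * (-y + u)                 # forward-Euler step of the assumed plant
        out.append(y)
    return out

def reward(gains, tau=1.0, n_steps=200, dt=0.1):
    """Negative squared error between the response and a desired first-order response."""
    cost = 0.0
    for k, y in enumerate(step_response(gains, n_steps, dt)):
        ref = 1.0 - math.exp(-(k + 1) * dt / tau)
        cost += (y - ref) * (y - ref)
    # An unstable loop overflows to inf/nan; map that to a large penalty.
    return -cost if math.isfinite(cost) else -1e9

def tune(gains, iters=100, sigma=0.05, alpha=0.05, seed=0):
    """Two-sided finite-difference random search over the PID gains."""
    rng = random.Random(seed)
    gains = list(gains)
    best, best_r = list(gains), reward(gains)
    for _ in range(iters):
        delta = [rng.gauss(0.0, 1.0) for _ in gains]
        plus = [g + sigma * d for g, d in zip(gains, delta)]
        minus = [g - sigma * d for g, d in zip(gains, delta)]
        diff = reward(plus) - reward(minus)
        if diff == 0:
            continue
        direction = 1.0 if diff > 0 else -1.0   # step toward the better perturbation
        gains = [g + alpha * direction * d for g, d in zip(gains, delta)]
        r = reward(gains)
        if r > best_r:                          # safeguard: remember the best gains seen
            best, best_r = list(gains), r
    return best
```

Calling `tune([0.5, 0.1, 0.0])` returns gains whose reward is no worse than that of the starting gains, because the sketch keeps the best gains it has seen. An unstable response receives a large negative reward, so a stability requirement enters through the reward alone, with no plant model used by the tuner.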
-
Neural Naturalist: Generating Fine-Grained Image Comparisons
Authors:
Maxwell Forbes,
Christine Kaeser-Chen,
Piyush Sharma,
Serge Belongie
Abstract:
We introduce the new Birds-to-Words dataset of 41k sentences describing fine-grained differences between photographs of birds. The language collected is highly detailed, while remaining understandable to the everyday observer (e.g., "heart-shaped face," "squat body"). Paragraph-length descriptions naturally adapt to varying levels of taxonomic and visual distance---drawn from a novel stratified sampling approach---with the appropriate level of detail. We propose a new model called Neural Naturalist that uses a joint image encoding and comparative module to generate comparative language, and evaluate the results with humans who must use the descriptions to distinguish real images.
Our results indicate promising potential for neural models to explain differences in visual embedding space using natural language, as well as a concrete path for machine learning to aid citizen scientists in their effort to preserve biodiversity.
Submitted 13 November, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Do Neural Language Representations Learn Physical Commonsense?
Authors:
Maxwell Forbes,
Ari Holtzman,
Yejin Choi
Abstract:
Humans understand language based on the rich background knowledge about how the physical world works, which in turn allows us to reason about the physical world through language. In addition to the properties of objects (e.g., boats require fuel) and their affordances, i.e., the actions that are applicable to them (e.g., boats can be driven), we can also reason about if-then inferences between what properties of objects imply the kind of actions that are applicable to them (e.g., that if we can drive something then it likely requires fuel).
In this paper, we investigate the extent to which state-of-the-art neural language representations, trained on a vast amount of natural language text, demonstrate physical commonsense reasoning. While recent advancements of neural language models have demonstrated strong performance on various types of natural language inference tasks, our study based on a dataset of over 200k newly collected annotations suggests that neural language representations still only learn associations that are explicitly written down.
Submitted 7 August, 2019;
originally announced August 2019.
-
The Curious Case of Neural Text Degeneration
Authors:
Ari Holtzman,
Jan Buys,
Li Du,
Maxwell Forbes,
Yejin Choi
Abstract:
Despite considerable advancements with deep neural language models, the enigma of neural text degeneration persists when these models are tested as text generators. The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, using likelihood as a decoding objective leads to text that is bland and strangely repetitive.
In this paper, we reveal surprising distributional differences between human text and machine text. In addition, we find that decoding strategies alone can dramatically affect the quality of machine text, even when generated from exactly the same neural language model. Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
Submitted 14 February, 2020; v1 submitted 22 April, 2019;
originally announced April 2019.
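Nucleus Sampling, as described in the abstract above, truncates the low-probability tail and samples from what remains. A minimal sketch of one common reading: keep the smallest set of highest-probability tokens whose cumulative mass reaches a threshold, then sample within that set in proportion to the original probabilities. The function names, the `top_p` parameter, and the plain-list representation of the distribution are assumptions made for illustration.

```python
import random

def nucleus_indices(probs, top_p):
    """Return the indices of the smallest high-probability set with mass >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for i in order:
        nucleus.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return nucleus

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Draw one index from the nucleus, proportionally to its original probabilities."""
    nucleus = nucleus_indices(probs, top_p)
    total = sum(probs[i] for i in nucleus)
    r = rng.random() * total          # implicit renormalization over the nucleus
    acc = 0.0
    for i in nucleus:
        acc += probs[i]
        if r < acc:
            return i
    return nucleus[-1]                # guard against floating-point round-off
```

With `probs = [0.5, 0.3, 0.15, 0.05]` and `top_p = 0.75`, the nucleus is the first two tokens, so the two tail tokens are never sampled. The nucleus size varies with the shape of the distribution, which is the "dynamic" aspect the abstract refers to.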
-
Pseudorandom Generators for Read-Once Branching Programs, in any Order
Authors:
Michael A. Forbes,
Zander Kelley
Abstract:
A central question in derandomization is whether randomized logspace (RL) equals deterministic logspace (L). To show that RL=L, it suffices to construct explicit pseudorandom generators (PRGs) that fool polynomial-size read-once (oblivious) branching programs (roBPs). Starting with the work of Nisan, pseudorandom generators with seed-length $O(\log^2 n)$ were constructed. Unfortunately, improving on this seed-length in general has proven challenging and seems to require new ideas.
A recent line of inquiry has suggested focusing on a particular limitation of the existing PRGs, which is that they only fool roBPs when the variables are read in a particular known order, such as $x_1<\cdots<x_n$. In comparison, existentially one can obtain logarithmic seed-length for fooling the set of polynomial-size roBPs that read the variables under any fixed unknown permutation $x_{π(1)}<\cdots<x_{π(n)}$. While recent works have established novel PRGs in this setting for subclasses of roBPs, there were no known $n^{o(1)}$ seed-length explicit PRGs for general polynomial-size roBPs in this setting.
In this work, we follow the "bounded independence plus noise" paradigm of Haramaty, Lee and Viola, and give an improved analysis in the general roBP unknown-order setting. With this analysis we obtain an explicit PRG with seed-length $O(\log^3 n)$ for polynomial-size roBPs reading their bits in an unknown order. Plugging in a recent Fourier tail bound of Chattopadhyay, Hatami, Reingold, and Tal, we can obtain a $\widetilde{O}(\log^2 n)$ seed-length when the roBP is of constant width.
Submitted 19 August, 2018;
originally announced August 2018.
-
Balancing Shared Autonomy with Human-Robot Communication
Authors:
Rosario Scalise,
Yonatan Bisk,
Maxwell Forbes,
Daqing Yi,
Yejin Choi,
Siddhartha Srinivasa
Abstract:
Robotic agents that share autonomy with a human should leverage human domain knowledge and account for their preferences when completing a task. This extra knowledge can dramatically improve plan efficiency and user satisfaction, but these gains are lost if communicating with a robot is taxing and unnatural. In this paper, we show how viewing human-robot language through the lens of shared autonomy explains the efficiency versus cognitive load trade-offs humans make when deciding how cooperative and explicit to make their instructions.
Submitted 20 May, 2018;
originally announced May 2018.
-
Learning to Write with Cooperative Discriminators
Authors:
Ari Holtzman,
Jan Buys,
Maxwell Forbes,
Antoine Bosselut,
David Golub,
Yejin Choi
Abstract:
Recurrent Neural Networks (RNNs) are powerful autoregressive sequence models, but when used to generate natural language their output tends to be overly generic, repetitive, and self-contradictory. We postulate that the objective function optimized by RNN language models, which amounts to the overall perplexity of a text, is not expressive enough to capture the notion of communicative goals described by linguistic principles such as Grice's Maxims. We propose learning a mixture of multiple discriminative models that can be used to complement the RNN generator and guide the decoding process. Human evaluation demonstrates that text generated by our system is preferred over that of baselines by a large margin and significantly enhances the overall coherence, style, and information content of the generated text.
Submitted 15 May, 2018;
originally announced May 2018.
-
Spatial Isolation Implies Zero Knowledge Even in a Quantum World
Authors:
Alessandro Chiesa,
Michael A. Forbes,
Tom Gur,
Nicholas Spooner
Abstract:
Zero knowledge plays a central role in cryptography and complexity. The seminal work of Ben-Or et al. (STOC 1988) shows that zero knowledge can be achieved unconditionally for any language in NEXP, as long as one is willing to make a suitable physical assumption: if the provers are spatially isolated, then they can be assumed to be playing independent strategies. Quantum mechanics, however, tells us that this assumption is unrealistic, because spatially-isolated provers could share a quantum entangled state and realize a non-local correlated strategy. The MIP* model captures this setting. In this work we study the following question: does spatial isolation still suffice to unconditionally achieve zero knowledge even in the presence of quantum entanglement? We answer this question in the affirmative: we prove that every language in NEXP has a 2-prover zero knowledge interactive proof that is sound against entangled provers; that is, NEXP \subseteq ZK-MIP*. Our proof consists of constructing a zero knowledge interactive PCP with a strong algebraic structure, and then lifting it to the MIP* model. This lifting relies on a new framework that builds on recent advances in low-degree testing against entangled strategies, and clearly separates classical and quantum tools. Our main technical contribution consists of developing new algebraic techniques for obtaining unconditional zero knowledge; this includes a zero knowledge variant of the celebrated sumcheck protocol, a key building block in many probabilistic proof systems. A core component of our sumcheck protocol is a new algebraic commitment scheme, whose analysis relies on algebraic complexity theory.
Submitted 5 March, 2018;
originally announced March 2018.
-
A PSPACE Construction of a Hitting Set for the Closure of Small Algebraic Circuits
Authors:
Michael A. Forbes,
Amir Shpilka
Abstract:
In this paper we study the complexity of constructing a hitting set for the closure of VP, the class of polynomials that can be infinitesimally approximated by polynomials that are computed by polynomial sized algebraic circuits, over the real or complex numbers. Specifically, we show that there is a PSPACE algorithm that given n,s,r in unary outputs a set of n-tuples over the rationals of size poly(n,s,r), with poly(n,s,r) bit complexity, that hits all n-variate polynomials of degree-r that are the limit of size-s algebraic circuits. Previously it was known that a random set of this size is a hitting set, but a construction that is certified to work was only known in EXPSPACE (or EXPH assuming the generalized Riemann hypothesis). As a corollary we get that a host of other algebraic problems such as Noether Normalization Lemma, can also be solved in PSPACE deterministically, where earlier only randomized algorithms and EXPSPACE algorithms (or EXPH assuming the generalized Riemann hypothesis) were known.
The proof relies on the new notion of a robust hitting set which is a set of inputs such that any nonzero polynomial that can be computed by a polynomial size algebraic circuit, evaluates to a not too small value on at least one element of the set. Proving the existence of such a robust hitting set is the main technical difficulty in the proof.
Our proof uses anti-concentration results for polynomials, basic tools from algebraic geometry and the existential theory of the reals.
Submitted 28 December, 2017;
originally announced December 2017.
-
Verb Physics: Relative Physical Knowledge of Actions and Objects
Authors:
Maxwell Forbes,
Yejin Choi
Abstract:
Learning commonsense knowledge from natural language text is nontrivial due to reporting bias: people rarely state the obvious, e.g., "My house is bigger than me." However, while rarely stated explicitly, this trivial everyday knowledge does influence the way people talk about the world, which provides indirect clues to reason about the world. For example, a statement like, "Tyler entered his house" implies that his house is bigger than Tyler.
In this paper, we present an approach to infer relative physical knowledge of actions and objects along five dimensions (e.g., size, weight, and strength) from unstructured natural language text. We frame knowledge acquisition as joint inference over two closely related problems: learning (1) relative physical knowledge of object pairs and (2) physical implications of actions when applied to those object pairs. Empirical results demonstrate that it is possible to extract knowledge of actions and objects from language and that joint inference over different types of knowledge improves performance.
Submitted 12 July, 2017; v1 submitted 12 June, 2017;
originally announced June 2017.
-
A Zero Knowledge Sumcheck and its Applications
Authors:
Alessandro Chiesa,
Michael A. Forbes,
Nicholas Spooner
Abstract:
Many seminal results in Interactive Proofs (IPs) use algebraic techniques based on low-degree polynomials, the study of which is pervasive in theoretical computer science. Unfortunately, known methods for endowing such proofs with zero knowledge guarantees do not retain this rich algebraic structure.
In this work, we develop algebraic techniques for obtaining zero knowledge variants of proof protocols in a way that leverages and preserves their algebraic structure. Our constructions achieve unconditional (perfect) zero knowledge in the Interactive Probabilistically Checkable Proof (IPCP) model of Kalai and Raz [KR08] (the prover first sends a PCP oracle, then the prover and verifier engage in an Interactive Proof in which the verifier may query the PCP).
Our main result is a zero knowledge variant of the sumcheck protocol [LFKN92] in the IPCP model. The sumcheck protocol is a key building block in many IPs, including the protocol for polynomial-space computation due to Shamir [Sha92], and the protocol for parallel computation due to Goldwasser, Kalai, and Rothblum [GKR15]. A core component of our result is an algebraic commitment scheme, whose hiding property is guaranteed by algebraic query complexity lower bounds [AW09,JKRS09]. This commitment scheme can then be used to considerably strengthen our previous work [BCFGRS16] that gives a sumcheck protocol with much weaker zero knowledge guarantees, itself using algebraic techniques based on algorithms for polynomial identity testing [RS05,BW04].
We demonstrate the applicability of our techniques by deriving zero knowledge variants of well-known protocols based on algebraic techniques, including the protocols of Shamir and of Goldwasser, Kalai, and Rothblum, as well as the protocol of Babai, Fortnow, and Lund [BFL91].
Submitted 6 April, 2017;
originally announced April 2017.
-
Small hitting-sets for tiny arithmetic circuits or: How to turn bad designs into good
Authors:
Manindra Agrawal,
Michael Forbes,
Sumanta Ghosh,
Nitin Saxena
Abstract:
We show that if we can design poly($s$)-time hitting-sets for $Σ\wedge^aΣΠ^{O(\log s)}$ circuits of size $s$, where $a=ω(1)$ is arbitrarily small and the number of variables, or arity $n$, is $O(\log s)$, then we can derandomize blackbox PIT for general circuits in quasipolynomial time. This also establishes that either E$\not\subseteq$\#P/poly or that VP$\ne$VNP. In fact, we show that one only needs a poly($s$)-time hitting-set against individual-degree $a'=ω(1)$ polynomials that are computable by a size-$s$ arity-$(\log s)$ $ΣΠΣ$ circuit (note: $Π$ fanin may be $s$). Alternatively, we claim that, to understand VP one only needs to find hitting-sets, for depth-$3$, that have a small parameterized complexity. Another tiny family of interest is when we restrict the arity $n=ω(1)$ to be arbitrarily small. We show that if we can design poly($s,μ(n)$)-time hitting-sets for size-$s$ arity-$n$ $ΣΠΣ\wedge$ circuits (resp.~$Σ\wedge^aΣΠ$), where function $μ$ is arbitrary, then we can solve PIT for VP in quasipoly-time, and prove the corresponding lower bounds. Our methods are strong enough to prove a surprising {\em arity reduction} for PIT: to solve the general problem completely it suffices to find a blackbox PIT with time-complexity $sd2^{O(n)}$. We give several examples of ($\log s$)-variate circuits where a new measure (called cone-size) helps in devising poly-time hitting-sets, but the same question for their $s$-variate versions is open to date, e.g., for diagonal depth-$3$ circuits and, in general, models that have a {\em small} partial derivative space. We also introduce a new concept, called cone-closed basis isolation, and provide example models where it occurs, or can be achieved by a small shift.
Submitted 23 February, 2017;
originally announced February 2017.
-
Succinct Hitting Sets and Barriers to Proving Algebraic Circuits Lower Bounds
Authors:
Michael A. Forbes,
Amir Shpilka,
Ben Lee Volk
Abstract:
We formalize a framework of algebraically natural lower bounds for algebraic circuits. Just as with the natural proofs notion of Razborov and Rudich for boolean circuit lower bounds, our notion of algebraically natural lower bounds captures nearly all lower bound techniques known. However, unlike the boolean setting, there has been no concrete evidence demonstrating that this is a barrier to obtaining super-polynomial lower bounds for general algebraic circuits, as there is little understanding whether algebraic circuits are expressive enough to support "cryptography" secure against algebraic circuits.
Following a similar result of Williams in the boolean setting, we show that the existence of an algebraic natural proofs barrier is equivalent to the existence of succinct derandomization of the polynomial identity testing problem, that is, to whether the coefficient vectors of polylog(N)-degree polylog(N)-size circuits form a hitting set for the class of poly(N)-degree poly(N)-size circuits. Further, we give an explicit universal construction showing that if such a succinct hitting set exists, then our universal construction suffices.
Further, we assess the existing literature constructing hitting sets for restricted classes of algebraic circuits and observe that none of them are succinct as given. Yet, we show how to modify some of these constructions to obtain succinct hitting sets. This constitutes the first evidence supporting the existence of an algebraic natural proofs barrier.
Our framework is similar to the Geometric Complexity Theory (GCT) program of Mulmuley and Sohoni, except that here we emphasize constructiveness of the proofs while the GCT program emphasizes symmetry. Nevertheless, our succinct hitting sets have relevance to the GCT program as they imply lower bounds for the complexity of the defining equations of polynomials computed by small circuits.
Submitted 22 July, 2018; v1 submitted 19 January, 2017;
originally announced January 2017.
-
On Probabilistic Checking in Perfect Zero Knowledge
Authors:
Eli Ben-Sasson,
Alessandro Chiesa,
Michael A. Forbes,
Ariel Gabizon,
Michael Riabzev,
Nicholas Spooner
Abstract:
We present the first constructions of single-prover proof systems that achieve perfect zero knowledge (PZK) for languages beyond NP, under no intractability assumptions:
1. The complexity class #P has PZK proofs in the model of Interactive PCPs (IPCPs) [KR08], where the verifier first receives from the prover a PCP and then engages with the prover in an Interactive Proof (IP).
2. The complexity class NEXP has PZK proofs in the model of Interactive Oracle Proofs (IOPs) [BCS16,RRR16], where the verifier, in every round of interaction, receives a PCP from the prover.
Our constructions rely on succinct simulators that enable us to "simulate beyond NP", achieving exponential savings in efficiency over [BCGV16]. These simulators crucially rely on solving a problem that lies at the intersection of coding theory, linear algebra, and computational complexity, which we call the succinct constraint detection problem, and consists of detecting dual constraints with polynomial support size for codes of exponential block length. Our two results rely on solutions to this problem for fundamental classes of linear codes:
* An algorithm to detect constraints for Reed--Muller codes of exponential length.
* An algorithm to detect constraints for PCPs of Proximity of Reed--Solomon codes [BS08] of exponential degree.
The first algorithm exploits the Raz--Shpilka [RS05] deterministic polynomial identity testing algorithm, and shows, to our knowledge, a first connection of algebraic complexity theory with zero knowledge. Along the way, we give a perfect zero knowledge analogue of the celebrated sumcheck protocol [LFKN92], by leveraging both succinct constraint detection and low-degree testing. The second algorithm exploits the recursive structure of the PCPs of Proximity to show that small-support constraints are "locally" spanned by a small number of small-support constraints.
Submitted 12 October, 2016;
originally announced October 2016.
-
Proof Complexity Lower Bounds from Algebraic Circuit Complexity
Authors:
Michael A. Forbes,
Amir Shpilka,
Iddo Tzameret,
Avi Wigderson
Abstract:
We give upper and lower bounds on the power of subsystems of the Ideal Proof System (IPS), the algebraic proof system recently proposed by Grochow and Pitassi, where the circuits comprising the proof come from various restricted algebraic circuit classes. This mimics an established research direction in the boolean setting for subsystems of Extended Frege proofs, where proof-lines are circuits from restricted boolean circuit classes. Except one, all of the subsystems considered in this paper can simulate the well-studied Nullstellensatz proof system, and prior to this work there were no known lower bounds when measuring proof size by the algebraic complexity of the polynomials (except with respect to degree, or to sparsity).
We give two general methods of converting certain algebraic lower bounds into proof complexity ones. Our methods require stronger notions of lower bounds, which lower bound a polynomial as well as an entire family of polynomials it defines. Our techniques are reminiscent of existing methods for converting boolean circuit lower bounds into related proof complexity results, such as feasible interpolation. We obtain the relevant types of lower bounds for a variety of classes (sparse polynomials, depth-3 powering formulas, read-once oblivious algebraic branching programs, and multilinear formulas), and infer the relevant proof complexity results. We complement our lower bounds by giving short refutations of the previously-studied subset-sum axiom using IPS subsystems, allowing us to conclude strict separations between some of these subsystems.
Submitted 16 June, 2016;
originally announced June 2016.
-
Functional lower bounds for arithmetic circuits and connections to boolean circuit complexity
Authors:
Michael A. Forbes,
Mrinal Kumar,
Ramprasad Saptharishi
Abstract:
We say that a circuit $C$ over a field $F$ functionally computes an $n$-variate polynomial $P$ if for every $x \in \{0,1\}^n$ we have that $C(x) = P(x)$. This is in contrast to syntactically computing $P$, when $C \equiv P$ as formal polynomials. In this paper, we study the question of proving lower bounds for homogeneous depth-$3$ and depth-$4$ arithmetic circuits for functional computation. We prove the following results:
1. Exponential lower bounds for homogeneous depth-$3$ arithmetic circuits for a polynomial in $VNP$.
2. Exponential lower bounds for homogeneous depth-$4$ arithmetic circuits with bounded individual degree for a polynomial in $VNP$.
Our main motivation for this line of research comes from our observation that strong enough functional lower bounds for even very special depth-$4$ arithmetic circuits for the Permanent imply a separation between ${\#}P$ and $ACC$. Thus, improving the second result to get rid of the bounded individual degree condition could lead to substantial progress in boolean circuit complexity. Besides, it is known from a recent result of Kumar and Saptharishi [KS15] that over constant sized finite fields, strong enough average case functional lower bounds for homogeneous depth-$4$ circuits imply superpolynomial lower bounds for homogeneous depth-$5$ circuits.
Our proofs are based on a family of new complexity measures called shifted evaluation dimension, and might be of independent interest.
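The distinction between functional and syntactic computation can be illustrated with a small sketch. The dictionary representation of polynomials and the helper names `evaluate` and `functionally_equal` are assumptions made for this illustration and are not taken from the paper.

```python
from itertools import product

def evaluate(poly, point):
    """Evaluate a polynomial given as {exponent_tuple: coefficient} at a point."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exponents):
            term *= x ** e
        total += term
    return total

def functionally_equal(p, q, n):
    """True if p and q agree on every point of the boolean cube {0,1}^n."""
    return all(evaluate(p, pt) == evaluate(q, pt)
               for pt in product((0, 1), repeat=n))

# P = x1 * x2 and C = x1^2 * x2 differ as formal polynomials,
# yet they agree on {0,1}^2 because x^2 = x whenever x is 0 or 1.
P = {(1, 1): 1}
C = {(2, 1): 1}

syntactically_equal = (P == C)
functional_match = functionally_equal(P, C, 2)
```

Here `C` functionally computes `P` without syntactically computing it, which is why a functional lower bound is the stronger statement.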
Submitted 13 May, 2016;
originally announced May 2016.
-
Identity Testing and Lower Bounds for Read-$k$ Oblivious Algebraic Branching Programs
Authors:
Matthew Anderson,
Michael A. Forbes,
Ramprasad Saptharishi,
Amir Shpilka,
Ben Lee Volk
Abstract:
Read-$k$ oblivious algebraic branching programs are a natural generalization of the well-studied model of read-once oblivious algebraic branching programs (ROABPs). In this work, we give an exponential lower bound of $\exp(n/k^{O(k)})$ on the width of any read-$k$ oblivious ABP computing some explicit multilinear polynomial $f$ that is computed by a polynomial size depth-$3$ circuit. We also study the polynomial identity testing (PIT) problem for this model and obtain a white-box subexponential-time PIT algorithm. The algorithm runs in time $2^{\tilde{O}(n^{1-1/2^{k-1}})}$ and needs white box access only to know the order in which the variables appear in the ABP.
Submitted 23 November, 2015;
originally announced November 2015.
-
Dimension Expanders via Rank Condensers
Authors:
Michael A. Forbes,
Venkatesan Guruswami
Abstract:
An emerging theory of "linear-algebraic pseudorandomness" aims to understand the linear-algebraic analogs of fundamental Boolean pseudorandom objects where the rank of subspaces plays the role of the size of subsets. In this work, we study and highlight the interrelationships between several such algebraic objects such as subspace designs, dimension expanders, seeded rank condensers, two-source rank condensers, and rank-metric codes. In particular, with the recent construction of near-optimal subspace designs by Guruswami and Kopparty as a starting point, we construct good (seeded) rank condensers (both lossless and lossy versions), which are a small collection of linear maps $\mathbb{F}^n \to \mathbb{F}^t$ for $t \ll n$ such that for every subset of $\mathbb{F}^n$ of small rank, its rank is preserved (up to a constant factor in the lossy case) by at least one of the maps.
We then compose a tensoring operation with our lossy rank condenser to construct constant-degree dimension expanders over polynomially large fields. That is, we give $O(1)$ explicit linear maps $A_i:\mathbb{F}^n\to \mathbb{F}^n$ such that for any subspace $V \subseteq \mathbb{F}^n$ of dimension at most $n/2$, $\dim\bigl( \sum_i A_i(V)\bigr) \ge (1+\Omega(1)) \dim(V)$. Previous constructions of such constant-degree dimension expanders were based on Kazhdan's property $T$ (for the case when $\mathbb{F}$ has characteristic zero) or monotone expanders (for every field $\mathbb{F}$); in either case the construction was harder than that of usual vertex expanders. Our construction, on the other hand, is simpler.
Via an equivalence to linear rank-metric codes, we then construct optimal lossless two-source condensers. We then use our seeded rank condensers to obtain near-optimal lossy two-source condensers for constant rank sources.
Submitted 26 November, 2014;
originally announced November 2014.
-
Pseudorandomness for Multilinear Read-Once Algebraic Branching Programs, in any Order
Authors:
Michael A. Forbes,
Ramprasad Saptharishi,
Amir Shpilka
Abstract:
We give deterministic black-box polynomial identity testing algorithms for multilinear read-once oblivious algebraic branching programs (ROABPs), in n^(lg^2 n) time. Further, our algorithm is oblivious to the order of the variables. This is the first sub-exponential time algorithm for this model. Furthermore, our result has no known analogue in the model of read-once oblivious boolean branching programs with unknown order, as despite recent work there is no known pseudorandom generator for this model with sub-polynomial seed-length (for unbounded-width branching programs).
This result extends and generalizes the result of Forbes and Shpilka that obtained a n^(lg n)-time algorithm when given the order. We also extend and strengthen the work of Agrawal, Saha and Saxena that gave a black-box algorithm running in time exp((lg n)^d) for set-multilinear formulas of depth d. We note that the model of multilinear ROABPs contains the model of set-multilinear algebraic branching programs, which itself contains the model of set-multilinear formulas of arbitrary depth. We obtain our results by recasting, and improving upon, the ideas of Agrawal, Saha and Saxena. We phrase the ideas in terms of rank condensers and Wronskians, and show that our results improve upon the classical multivariate Wronskian, which may be of independent interest.
In addition, we give the first n^(lglg n) black-box polynomial identity testing algorithm for the so-called model of diagonal circuits. This model, introduced by Saxena, has recently found applications in the work of Mulmuley, as well as in the work of Gupta, Kamath, Kayal, and Saptharishi. Previous work had given n^(lg n)-time algorithms for this class. More generally, our result holds for any model computing polynomials whose partial derivatives (of all orders) span a low-dimensional linear space.
Submitted 22 September, 2013;
originally announced September 2013.
-
On the Locality of Codeword Symbols in Non-Linear Codes
Authors:
Michael Forbes,
Sergey Yekhanin
Abstract:
Consider a possibly non-linear (n,K,d)_q code. Coordinate i has locality r if its value is determined by some r other coordinates. A recent line of work obtained an optimal trade-off between information locality of codes and their redundancy. Further, for linear codes meeting this trade-off, structure theorems were derived. In this work we give a new proof of the locality / redundancy trade-off and generalize structure theorems to non-linear codes.
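The definition of locality can be checked by brute force on a toy code. The sketch below is illustrative only: the function name `has_locality` and the choice of the length-3 binary parity code are assumptions, not constructions from the paper.

```python
from itertools import combinations, product

def has_locality(code, i, r):
    """True if coordinate i of every codeword is determined by some
    fixed set of r other coordinates (checked exhaustively)."""
    n = len(code[0])
    others = [j for j in range(n) if j != i]
    for subset in combinations(others, r):
        seen = {}
        consistent = True
        for word in code:
            key = tuple(word[j] for j in subset)
            # The same values on `subset` must always force the same symbol at i.
            if seen.setdefault(key, word[i]) != word[i]:
                consistent = False
                break
        if consistent:
            return True
    return False

# Toy example: the binary parity code {(a, b, a XOR b)}.
code = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

all_have_locality_2 = all(has_locality(code, i, 2) for i in range(3))
any_has_locality_1 = any(has_locality(code, i, 1) for i in range(3))
```

In this toy code every coordinate is recoverable from the other two, but no single coordinate determines another, so each coordinate has locality 2 and not 1. The check makes no use of linearity, which matches the non-linear setting of the abstract.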
Submitted 15 March, 2013;
originally announced March 2013.
-
Explicit Noether Normalization for Simultaneous Conjugation via Polynomial Identity Testing
Authors:
Michael A. Forbes,
Amir Shpilka
Abstract:
Mulmuley recently gave an explicit version of Noether's Normalization lemma for the ring of invariants of matrices under simultaneous conjugation, under the conjecture that there are deterministic black-box algorithms for polynomial identity testing (PIT). He argued that this gives evidence that constructing such algorithms for PIT is beyond current techniques. In this work, we show this is not the case. That is, we improve Mulmuley's reduction and correspondingly weaken the conjecture regarding PIT needed to give explicit Noether Normalization. We then observe that the weaker conjecture has recently been nearly settled by the authors, who gave quasipolynomial size hitting sets for the class of read-once oblivious algebraic branching programs (ROABPs). This gives the desired explicit Noether Normalization unconditionally, up to quasipolynomial factors.
As a consequence of our proof we give a deterministic parallel polynomial-time algorithm for deciding if two matrix tuples have intersecting orbit closures, under simultaneous conjugation.
We also study the strength of conjectures that Mulmuley requires to obtain similar results as ours. We prove that his conjectures are stronger, in the sense that the computational model he needs PIT algorithms for is equivalent to the well-known algebraic branching program (ABP) model, which is provably stronger than the ROABP model.
Finally, we consider the depth-3 diagonal circuit model as defined by Saxena, as PIT algorithms for this model also have implications in Mulmuley's work. Previous work has given quasipolynomial size hitting sets for this model. In this work, we give a much simpler construction of such hitting sets, using techniques of Shpilka and Volkovich.
Submitted 8 March, 2013; v1 submitted 28 February, 2013;
originally announced March 2013.
-
Quasipolynomial-time Identity Testing of Non-Commutative and Read-Once Oblivious Algebraic Branching Programs
Authors:
Michael A. Forbes,
Amir Shpilka
Abstract:
We study the problem of obtaining deterministic black-box polynomial identity testing algorithms (PIT) for algebraic branching programs (ABPs) that are read-once and oblivious. This class has a deterministic white-box polynomial identity testing algorithm (due to Raz and Shpilka), but prior to this work there was no known such black-box algorithm.
The main result of this work gives the first quasi-polynomial sized hitting sets for size S circuits from this class, when the order of the variables is known. As our hitting set is of size exp(lg^2 S), this is analogous (in the terminology of boolean pseudorandomness) to a seed-length of lg^2 S, which is the seed length of the pseudorandom generators of Nisan and Impagliazzo-Nisan-Wigderson for read-once oblivious boolean branching programs.
Our results are stronger for branching programs of bounded width, where we give a hitting set of size exp(lg^2 S/lglg S), corresponding to a seed length of lg^2 S/lglg S. This is in stark contrast to the known results for read-once oblivious boolean branching programs of bounded width, where no pseudorandom generator (or hitting set) with seed length o(lg^2 S) is known.
In follow-up work, we strengthened a result of Mulmuley, and showed that derandomizing a particular case of the Noether Normalization Lemma is reducible to black-box PIT of read-once oblivious ABPs. Using the results of the present work, this gives a derandomization of Noether Normalization in that case, which Mulmuley conjectured would be difficult due to its relations to problems in algebraic geometry.
We also show that several other circuit classes can be black-box reduced to read-once oblivious ABPs, including set-multilinear ABPs (a generalization of depth-3 set-multilinear formulas), non-commutative ABPs (generalizing non-commutative formulas), and (semi-)diagonal depth-4 circuits (as introduced by Saxena).
Submitted 22 September, 2013; v1 submitted 11 September, 2012;
originally announced September 2012.
-
On Identity Testing of Tensors, Low-rank Recovery and Compressed Sensing
Authors:
Michael A. Forbes,
Amir Shpilka
Abstract:
We study the problem of obtaining efficient, deterministic, black-box polynomial identity testing algorithms for depth-3 set-multilinear circuits (over arbitrary fields). This class of circuits has an efficient, deterministic, white-box polynomial identity testing algorithm (due to Raz and Shpilka), but has no known such black-box algorithm. We recast this problem as a question of finding a low-dimensional subspace H, spanned by rank 1 tensors, such that any non-zero tensor in the dual space ker(H) has high rank. We obtain explicit constructions of essentially optimal-size hitting sets for tensors of degree 2 (matrices), and obtain quasi-polynomial sized hitting sets for arbitrary tensors (but this second hitting set is less explicit).
We also show connections to the task of performing low-rank recovery of matrices, which is studied in the field of compressed sensing. Low-rank recovery asks (say, over the reals) to recover a matrix M from few measurements, under the promise that M is rank <=r. We also give a formal connection between low-rank recovery and the task of sparse (vector) recovery: any sparse-recovery algorithm that exactly recovers vectors of length n and sparsity 2r, using m non-adaptive measurements, yields a low-rank recovery scheme for exactly recovering nxn matrices of rank <=r, making 2nm non-adaptive measurements. Furthermore, if the sparse-recovery algorithm runs in time τ, then the low-rank recovery algorithm runs in time O(rn^2+nτ). We obtain this reduction using linear-algebraic techniques, and not using convex optimization, which is more commonly seen in compressed sensing algorithms. By using a dual Reed-Solomon code, we are able to (deterministically) construct low-rank recovery schemes taking 4nr measurements over the reals, such that the measurements can be all rank-1 matrices, or all sparse matrices.
Submitted 2 November, 2011;
originally announced November 2011.
-
Improved Soundness for QMA with Multiple Provers
Authors:
Alessandro Chiesa,
Michael A. Forbes
Abstract:
We present three contributions to the understanding of QMA with multiple provers:
1) We give a tight soundness analysis of the protocol of [Blier and Tapp, ICQNM '09], yielding a soundness gap Omega(1/N^2). Our improvement is achieved without the use of an instance with a constant soundness gap (i.e., without using a PCP).
2) We give a tight soundness analysis of the protocol of [Chen and Drucker, arXiv '10], thereby improving their result from a monolithic protocol where Theta(sqrt(N)) provers are needed in order to have any soundness gap, to a protocol with a smooth trade-off between the number of provers k and a soundness gap Omega(k^2/N), as long as k>=Omega(log N). (And, when k=Theta(sqrt(N)), we recover the original parameters of Chen and Drucker.)
3) We make progress towards an open question of [Aaronson et al., ToC '09] about what kinds of NP-complete problems are amenable to sublinear multiple-prover QMA protocols, by observing that a large class of such examples can easily be derived from results already in the PCP literature - namely, at least the languages recognized by non-deterministic RAMs in quasilinear time.
Submitted 30 January, 2013; v1 submitted 10 August, 2011;
originally announced August 2011.
-
Square root Bound on the Least Power Non-residue using a Sylvester-Vandermonde Determinant
Authors:
Michael Forbes,
Neeraj Kayal,
Rajat Mittal,
Chandan Saha
Abstract:
We give a new elementary proof of the fact that the value of the least $k^{th}$ power non-residue in an arithmetic progression $\{bn+c\}_{n=0,1,\ldots}$, over a prime field $\mathbb{F}_p$, is bounded by $7/\sqrt{5} \cdot b \cdot \sqrt{p/k} + 4b + c$. Our proof is inspired by the so-called \emph{Stepanov method}, which involves bounding the size of the solution set of a system of equations by constructing a non-zero low degree auxiliary polynomial that vanishes with high multiplicity on the solution set. The proof uses basic algebra and number theory along with a determinant identity that generalizes both the Sylvester and the Vandermonde determinant.
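As a numerical sanity check of the stated bound, the sketch below searches for the least $k^{th}$ power non-residue by brute force. It assumes that $k$ divides $p-1$, so that Euler's criterion applies, and it skips terms divisible by $p$. The function names and the example $p=13$, $k=2$, $b=1$, $c=0$ are illustrative assumptions, and the abstract does not state the exact hypotheses under which the bound holds.

```python
import math

def least_power_nonresidue(p, k, b=1, c=0):
    """Least value b*n + c (n = 0, 1, ...) that is not a k-th power mod p.
    Assumes p is prime and k divides p - 1 (Euler's criterion)."""
    if (p - 1) % k != 0:
        raise ValueError("this sketch assumes k divides p - 1")
    for n in range(p):
        a = b * n + c
        # a is a k-th power residue mod p iff a^((p-1)/k) = 1 (mod p).
        if a % p != 0 and pow(a, (p - 1) // k, p) != 1:
            return a
    return None

def stated_bound(p, k, b=1, c=0):
    """The bound quoted in the abstract: 7/sqrt(5) * b * sqrt(p/k) + 4b + c."""
    return 7 / math.sqrt(5) * b * math.sqrt(p / k) + 4 * b + c

value = least_power_nonresidue(13, 2)  # least quadratic non-residue mod 13
bound = stated_bound(13, 2)
```

For this small instance the least quadratic non-residue modulo 13 is 2, well below the bound of roughly 12.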
Submitted 23 April, 2011;
originally announced April 2011.
-
Tensor Rank: Some Lower and Upper Bounds
Authors:
Boris Alexeev,
Michael Forbes,
Jacob Tsimerman
Abstract:
The results of Strassen and Raz show that good enough tensor rank lower bounds have implications for algebraic circuit/formula lower bounds.
We explore tensor rank lower and upper bounds, focusing on explicit tensors. For odd d, we construct field-independent explicit 0/1 tensors T:[n]^d->F with rank at least 2n^(floor(d/2))+n-Theta(d log n). This matches (over F_2) or improves (all other fields) known lower bounds for d=3 and improves (over any field) for odd d>3.
We also explore a generalization of permutation matrices, which we denote permutation tensors. We show, by counting, that there exists an order-3 permutation tensor with super-linear rank. We also explore a natural class of permutation tensors, which we call group tensors. For any group G, we define the group tensor T_G^d:G^d->F, by T_G^d(g_1,...,g_d)=1 iff g_1 x ... x g_d=1_G. We give two upper bounds for the rank of these tensors. The first uses representation theory and works over large fields F, showing (among other things) that rank_F(T_G^d)<= |G|^(d/2). We also show that if this upper bound is tight, then super-linear tensor rank lower bounds would follow. The second upper bound uses interpolation and only works for abelian G, showing that over any field F, rank_F(T_G^d)<= O(|G|^(1+log d)log^(d-1)|G|). In either case, this shows that many permutation tensors have far from maximal rank, which is very different from the matrix case and thus eliminates many natural candidates for high tensor rank.
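The group tensor definition can be made concrete for a cyclic group. The sketch below takes G = Z_n written additively, so the condition g_1 x ... x g_d = 1_G becomes g_1 + ... + g_d = 0 mod n. The function name `cyclic_group_tensor` and the choice n = d = 3 are assumptions for illustration only.

```python
from itertools import product

def cyclic_group_tensor(n, d):
    """Entries of T_G^d for G = Z_n (additive notation):
    the entry at (g_1, ..., g_d) is 1 iff g_1 + ... + g_d = 0 mod n."""
    return {g: 1 if sum(g) % n == 0 else 0
            for g in product(range(n), repeat=d)}

n, d = 3, 3
T = cyclic_group_tensor(n, d)

# Fixing any d-1 coordinates leaves exactly one choice for the last one,
# so the tensor has n^(d-1) nonzero entries.
nonzero_entries = sum(T.values())

# Each slice obtained by fixing the first coordinate is a permutation matrix.
slice_is_permutation = all(
    sum(T[(g1, g2, g3)] for g3 in range(n)) == 1
    for g1 in range(n) for g2 in range(n)
)
```

The slice check reflects why group tensors are a natural class of permutation tensors: fixing all but two coordinates always yields a permutation matrix.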
We also explore monotone tensor rank. We give explicit 0/1 tensors T:[n]^d->F that have tensor rank at most dn but have monotone tensor rank exactly n^(d-1). This is a nearly optimal separation.
Submitted 31 January, 2011;
originally announced February 2011.