Search | arXiv e-print repository

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Authors: Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman

Abstract: RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model's ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve.
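A rough way to look for anchor-span-like structure is to sample several responses to one prompt and keep the word n-grams that recur across most of them. This is a minimal sketch, not the paper's procedure: whitespace tokenization, the n-gram length `n`, the `min_fraction` threshold, and the sample texts are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def anchor_spans(generations, n=3, min_fraction=0.5):
    """Return n-grams found in at least `min_fraction` of the generations.

    Because ngrams() returns a set, each n-gram is counted at most once per
    generation, so a count equals the number of generations containing it.
    """
    counts = Counter()
    for text in generations:
        counts.update(ngrams(text.split(), n))  # naive whitespace tokenization
    threshold = min_fraction * len(generations)
    return {gram for gram, c in counts.items() if c >= threshold}


# Hypothetical samples for one prompt (not data from the paper).
samples = [
    "Sure! Here is a short poem about the sea and its waves.",
    "Sure! Here is a short poem about the moon.",
    "Here is a short poem about autumn leaves.",
]
for span in sorted(anchor_spans(samples, n=3, min_fraction=1.0)):
    print(" ".join(span))
```

Raising `min_fraction` keeps only spans shared by nearly every sample; with real model outputs one would likely also normalize punctuation and case before counting.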

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2403.13780

Information-Theoretic Distillation for Reference-less Summarization

Authors: Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi

Abstract: The current winning recipe for automatic summarization is using proprietary large-scale language models (LLMs) such as ChatGPT as is, or imitation learning from them as teacher models. While increasingly ubiquitous dependence on such large-scale language models is convenient, there remains an important question of whether small-scale models could have achieved competitive results, if we were to seek an alternative learning method -- that allows for a more cost-efficient, controllable, yet powerful summarizer. We present InfoSumm, a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization, without relying on either the LLM's capability or human-written references. To achieve this, we first propose a novel formulation of the desiderata of summarization (saliency, faithfulness and brevity) through the lens of mutual information between the original document and the summary. Based on this formulation, we start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization, then self-train the model to optimize for the information-centric measures of ideal summaries. Distilling from the improved teacher, we arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT, without ever relying on ChatGPT's capabilities. Extensive analysis demonstrates that our approach outperforms in-domain supervised models in human evaluation, let alone state-of-the-art unsupervised methods, and wins over ChatGPT in controllable summarization.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13453

Memories of Abdus Salam and the early days of supersymmetry

Authors: Peter West

Abstract: I give an account of what it was like to be a PhD student of Abdus Salam and also to take part during the early stages of the development of supersymmetry.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 11 pages. arXiv admin note: text overlap with arXiv:1609.06863

arXiv:2312.05979

NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation

Authors: Peter West, Ronan Le Bras, Taylor Sorensen, Bill Yuchen Lin, Liwei Jiang, Ximing Lu, Khyathi Chandu, Jack Hessel, Ashutosh Baheti, Chandra Bhagavatula, Yejin Choi

Abstract: We present NovaCOMET, an open commonsense knowledge model, that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs. The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.
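The open-format objective can be pictured as letting any field of a structured knowledge record serve as the prediction target, with the remaining fields as input. The sketch below assumes a hypothetical record with `context`, `query`, and `inference` fields and a `<mask>` placeholder; NovaCOMET's actual serialization and field names may differ.

```python
def open_format_examples(record, mask_token="<mask>"):
    """Turn one structured record into several (input, target) training pairs.

    Each field takes a turn as the target; the input lists every field, with
    the target field's value replaced by a mask token.
    """
    examples = []
    for target_field in record:
        parts = []
        for name, value in record.items():
            shown = mask_token if name == target_field else value
            parts.append(f"{name}: {shown}")
        examples.append((" | ".join(parts), record[target_field]))
    return examples


# Hypothetical record; the field names are assumptions for illustration.
record = {
    "context": "X forgets an umbrella",
    "query": "What happens to X when it rains?",
    "inference": "X gets wet",
}
for source, target in open_format_examples(record):
    print(source, "->", target)
```

A fixed-relation model would only ever predict the last field; here the same record also yields examples that predict the context or the query.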

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.04837

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Authors: Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

Abstract: Instruction-following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also for practical applications that require precise within-image reasoning. We build Localized Visual Commonsense models, which allow users to specify (multiple) regions as input. We train our model by sampling localized commonsense knowledge from a large language model (LLM): specifically, we prompt an LLM to collect commonsense knowledge given a global literal image description and a local literal region description automatically generated by a set of VL models. With a separately trained critic model that selects high-quality examples, we find that training on the localized commonsense corpus can successfully distill existing VL models to support a reference-as-input interface. Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method results in more precise VL models of reasoning compared to a baseline of passing a generated referring expression to an LLM.

Submitted 12 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2311.00059

The Generative AI Paradox: "What It Can Create, It May Not Understand"

Authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

Abstract: The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans. This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today's generative models relative to intelligence in humans. Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon (and can therefore exceed) their ability to understand those same types of outputs. This contrasts with humans, for whom basic understanding almost always precedes the ability to generate expert-level outputs. We test this hypothesis through controlled experiments analyzing generation vs. understanding in generative models, across both language and image modalities.
Our results show that although models can outperform humans in generation, they consistently fall short of human capabilities in measures of understanding; they also show a weaker correlation between generation and understanding performance and more brittleness to adversarial inputs. Our findings support the hypothesis that models' generative capability may not be contingent upon understanding capability, and call for caution in interpreting artificial intelligence by analogy to human intelligence.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2309.00779

doi 10.1609/aaai.v38i18.29970

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi

Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve AI systems to better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties as well as their interaction. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. ValuePrism's contextualized values are generated by GPT-4 and deemed high-quality by human annotators 91% of the time. We conduct a large-scale study with annotators across diverse social and demographic backgrounds to try to understand whose values are represented. With ValuePrism, we build Kaleido, an open, light-weight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence (i.e., support or oppose) of human values, rights, and duties within a specific context. Humans prefer the sets of values output by our system over the teacher GPT-4, finding them more accurate and with broader coverage. In addition, we demonstrate that Kaleido can help explain variability in human decision-making by outputting contrasting values.
Finally, we show that Kaleido's representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism. We hope that our work will serve as a step to making more explicit the implicit values behind human decision-making and to steering AI systems to make decisions that are more in accordance with them.

Submitted 2 April, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 38

Journal ref: Vol. 38 No. 18: AAAI-24 Technical Tracks 18; 2024; 19937-19947

arXiv:2308.00189

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?

Authors: Ari Holtzman, Peter West, Luke Zettlemoyer

Abstract: Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP and is reshaping how we interact with computers. What was once a scientific engineering discipline, in which building blocks are stacked one on top of the other, is arguably already a complex systems science, in which emergent behaviors are sought out to support previously unimagined use cases. Despite the ever-increasing number of benchmarks that measure task performance, we lack explanations of what behaviors language models exhibit that allow them to complete these tasks in the first place. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance, to guide mechanistic explanations and help future-proof analytic research.

Submitted 31 July, 2023; originally announced August 2023.

Comments: 15 pages, 7 figures

arXiv:2306.00924

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

Authors: Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov

Abstract: Theory of Mind (ToM), the ability to reason about the mental states of other people, is a key element of our social intelligence. Yet, despite their ever more impressive performance, large-scale neural language models still lack basic theory of mind capabilities out-of-the-box. We posit that simply scaling up models will not imbue them with theory of mind due to the inherently symbolic and implicit nature of the phenomenon, and instead investigate an alternative: can we design a decoding-time algorithm that enhances theory of mind of off-the-shelf neural language models without explicit supervision? We present SymbolicToM, a plug-and-play approach to reason about the belief states of multiple characters in reading comprehension tasks via explicit symbolic representation. More concretely, our approach tracks each entity's beliefs, their estimation of other entities' beliefs, and higher-order levels of reasoning, all through graphical representations, allowing for more precise and interpretable reasoning than previous approaches. Empirical results on the well-known ToMi benchmark (Le et al., 2019) demonstrate that SymbolicToM dramatically enhances off-the-shelf neural networks' theory of mind in a zero-shot setting while showing robust out-of-distribution performance compared to supervised baselines. Our work also reveals spurious patterns in existing theory of mind benchmarks, emphasizing the importance of out-of-distribution evaluation and methods that do not overfit a particular dataset.
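SymbolicToM tracks nested beliefs with graph representations; the sketch below keeps only the first-order idea that each character's picture of the world is updated solely by events that character witnesses. The `BeliefTracker` class, its method names, and the rule that an entering character observes the whole current state are hypothetical simplifications, not the paper's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefTracker:
    """Toy first-order belief tracker: one object->location map per character.

    A move updates the ground truth and the beliefs of every character who is
    present; absent characters keep their possibly outdated beliefs.
    """
    world: dict = field(default_factory=dict)    # ground truth: object -> location
    beliefs: dict = field(default_factory=dict)  # character -> {object: location}
    present: set = field(default_factory=set)

    def enter(self, character):
        self.present.add(character)
        # Assumption: a character who enters observes the current world state.
        self.beliefs.setdefault(character, {}).update(self.world)

    def leave(self, character):
        self.present.discard(character)

    def move(self, obj, location):
        self.world[obj] = location
        for character in self.present:
            self.beliefs[character][obj] = location

    def believed_location(self, character, obj):
        return self.beliefs.get(character, {}).get(obj)


# Classic Sally-Anne style story.
story = BeliefTracker()
story.enter("Sally")
story.enter("Anne")
story.move("marble", "basket")
story.leave("Sally")
story.move("marble", "box")
print(story.believed_location("Sally", "marble"))  # basket
print(story.believed_location("Anne", "marble"))   # box
```

Answering "where will Sally look?" from Sally's own map, rather than from the ground truth, is the kind of explicit perspective-taking the abstract describes.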

Submitted 1 June, 2023; originally announced June 2023.

Journal ref: ACL 2023

arXiv:2305.18654

Faith and Fate: Limits of Transformers on Compositionality

Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This raises the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the limits of these models across three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how autoregressive generations' performance can rapidly decay with increased task complexity.
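To make the notion of a computation graph concrete, the following hypothetical decomposition splits long multiplication into one-digit multiply nodes and sum nodes. The node names and granularity are assumptions (for example, carries are folded into the sums), so the node count is only a rough proxy for the complexity measures used in the paper.

```python
def multiplication_graph(a, b):
    """Decompose a*b (non-negative ints) into a graph of one-digit steps.

    Returns (result, nodes); nodes maps a node name to a tuple
    (operation, parent_names, value).
    """
    nodes = {}
    a_digits = [int(d) for d in str(a)][::-1]  # least-significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    for i, d in enumerate(a_digits):
        nodes[f"a{i}"] = ("input", [], d)
    for j, d in enumerate(b_digits):
        nodes[f"b{j}"] = ("input", [], d)
    partial_names = []
    for j, db in enumerate(b_digits):
        terms = []
        for i, da in enumerate(a_digits):
            # One-digit product, shifted to its place value 10**(i + j).
            name = f"m{i}_{j}"
            nodes[name] = ("mul_shift", [f"a{i}", f"b{j}"], da * db * 10 ** (i + j))
            terms.append(name)
        # Partial product for digit j of b: sum of its shifted one-digit products.
        nodes[f"p{j}"] = ("sum", terms, sum(nodes[t][2] for t in terms))
        partial_names.append(f"p{j}")
    nodes["out"] = ("sum", partial_names, sum(nodes[p][2] for p in partial_names))
    return nodes["out"][2], nodes


result, graph = multiplication_graph(12, 34)
print(result, len(graph))
```

For an n-digit by m-digit product this graph has n + m input nodes, n*m multiply nodes, m partial sums, and one output, so its size grows with the product of the digit counts.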

Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 10 pages + appendix (40 pages)

arXiv:2305.16635

Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing

Authors: Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi

Abstract: We present Impossible Distillation, a novel framework for paraphrasing and sentence summarization, that distills a high-quality dataset and model from a low-quality teacher that itself cannot perform these tasks. Unlike prior works that rely on an extreme-scale teacher model (e.g., GPT3) or task-specific architecture, we hypothesize and verify the paraphrastic proximity intrinsic to pre-trained LMs (e.g., GPT2), where paraphrases occupy a proximal subspace in the LM distribution. By identifying and distilling generations from these subspaces, Impossible Distillation produces a high-quality dataset and model even from GPT2-scale LMs. We evaluate our method on multiple benchmarks spanning unconstrained / syntax-controlled paraphrase generation and sentence summarization. Our model with 770M parameters consistently outperforms strong baselines, including models distilled from ChatGPT, and sometimes, even ChatGPT itself. Also, we find that our distilled dataset from 1.5B LMs exhibits higher diversity and fidelity than up to 13 times larger datasets.

Submitted 5 April, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: NAACL 2024

arXiv:2305.15065

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Authors: Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, Abhilasha Ravichander, Lianhui Qin, Prithviraj Ammanabrolu, Liwei Jiang, Sahana Ramnath, Nouha Dziri, Jillian Fisher, Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi

Abstract: While extreme-scale language models have demonstrated exceptional performance on a variety of language tasks, the degree of control over these language models through pure prompting can often be limited. Directly fine-tuning such language models can be effective for tailoring them, but it can be either extremely costly (e.g., GPT-3) or not even feasible for the broader community (e.g., GPT-4). We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model such as GPT-3 without fine-tuning it. IPA guides a large base model during decoding time through a lightweight policy adapter trained to optimize an arbitrary user objective with reinforcement learning. On five challenging text generation tasks, such as toxicity reduction and lexically constrained generation, IPA consistently brings significant improvements over off-the-shelf language models. It outperforms competitive baseline methods, sometimes even including expensive fine-tuning. In particular, tailoring GPT-2 with IPA can outperform GPT-3, while tailoring GPT-3 with IPA brings a major performance boost over GPT-3 (and sometimes even over GPT-4). Our promising results highlight the potential of IPA as a lightweight alternative to tailoring extreme-scale language models.
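IPA leaves the base model frozen and steers each decoding step with a separately trained adapter. A minimal sketch of one decoding step, assuming the two next-token distributions are combined as a normalized product (a product-of-experts rule; the exact combination in the paper and the reinforcement-learning training of the adapter are not shown), could look like this. The three-token vocabulary and all probabilities are invented.

```python
import math


def combine_next_token(base_logprobs, adapter_logprobs):
    """Normalized product of two next-token distributions.

    Both arguments map token -> log-probability over the same vocabulary.
    Returns token -> probability proportional to p_base(token) * p_adapter(token).
    """
    scores = {t: base_logprobs[t] + adapter_logprobs[t] for t in base_logprobs}
    top = max(scores.values())  # subtract the max before exp for numerical stability
    unnormalized = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(unnormalized.values())
    return {t: u / z for t, u in unnormalized.items()}


# Hypothetical three-token vocabulary: the adapter (imagined as trained to
# reduce toxicity) pushes probability away from "awful".
base = {"great": math.log(0.5), "awful": math.log(0.3), "fine": math.log(0.2)}
adapter = {"great": math.log(0.4), "awful": math.log(0.05), "fine": math.log(0.55)}
combined = combine_next_token(base, adapter)
print({t: round(p, 3) for t, p in combined.items()})
```

Working in log space keeps the combination cheap (one addition per token) and avoids underflow when the vocabulary is large.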

Submitted 6 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: EMNLP 2023

arXiv:2305.02884

Carrollian conformal fields and flat holography

Authors: Kevin Nguyen, Peter West

Abstract: The null conformal boundary $\mathscr{I}$ of Minkowski spacetime $\mathbb{M}$ plays a special role in scattering theory, as it is the locus where massless particle states are most naturally defined. We construct quantum fields on $\mathscr{I}$ which create these massless states from the vacuum and transform covariantly under Poincaré symmetries. Since the latter symmetries act as Carrollian conformal isometries of $\mathscr{I}$, these quantum fields are Carrollian conformal fields. This group theoretic construction is intrinsic to $\mathscr{I}$, in contrast to existing treatments in the literature. However, we also show that the standard relativistic massless quantum fields in $\mathbb{M}$, when pulled back to $\mathscr{I}$, provide a realisation of these Carrollian conformal fields. This correspondence between bulk and boundary fields should constitute a basic entry in the dictionary of flat holography. Finally, we show that $\mathscr{I}$ provides a natural parametrisation of the massless particles as described by irreducible representations of the Poincaré group, and that in an appropriate conjugate basis they indeed transform like Carrollian conformal fields.

Submitted 26 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: 21 pages + appendix; v2: additional references and comments; v3: published version

arXiv:2304.14399

We're Afraid Language Models Aren't Modeling Ambiguity

Authors: Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

Abstract: Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are increasingly employed as dialogue interfaces and writing aids, handling ambiguous language is critical to their success. We characterize ambiguity in a sentence by its effect on entailment relations with another sentence, and collect AmbiEnt, a linguist-annotated benchmark of 1,645 examples with diverse kinds of ambiguity. We design a suite of tests based on AmbiEnt, presenting the first evaluation of pretrained LMs to recognize ambiguity and disentangle possible meanings. We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset. Finally, to illustrate the value of ambiguity-sensitive tools, we show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity. We encourage the field to rediscover the importance of ambiguity for NLP.
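The difference between a multilabel and a standard NLI classifier can be illustrated in a few lines: a multilabel model scores each label independently, so one premise/hypothesis pair can receive several labels, and such pairs can be flagged as possibly ambiguous. The three labels are the standard NLI set; the 0.5 threshold, the scores, and the flagging rule are assumptions for illustration, not the paper's configuration.

```python
LABELS = ("entailment", "neutral", "contradiction")


def predicted_labels(scores, threshold=0.5):
    """Labels whose independent (e.g. sigmoid) score reaches the threshold.

    Unlike a softmax classifier that must pick one label, a multilabel model
    can return several labels for the same premise/hypothesis pair.
    """
    return {label for label in LABELS if scores[label] >= threshold}


def flag_as_ambiguous(scores, threshold=0.5):
    """Flag a pair when more than one NLI label is predicted."""
    return len(predicted_labels(scores, threshold)) > 1


# Hypothetical scores for a claim that reads as both supported and
# contradicted, depending on how an ambiguous phrase is interpreted.
print(flag_as_ambiguous({"entailment": 0.81, "neutral": 0.12, "contradiction": 0.74}))
```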

Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: EMNLP 2023 camera-ready

arXiv:2302.02199

doi 10.1142/S0217751X23500458

Spacetime and large local transformations

Authors: Peter West

Abstract: We argue that the existence of solitons in theories in which local symmetries are spontaneously broken requires spacetime to be enlarged by additional coordinates that are associated with large local transformations. In the context of gravity theories, the usual coordinates of spacetime can be thought of as arising in this way. E theory automatically contains such an enlarged spacetime. We propose that spacetime appears in an underlying theory when the local symmetries are spontaneously broken.

Submitted 26 April, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

Comments: A dedication to Lars Brink and references 45 and 46 added. A typo in equation (1.2.1) corrected

arXiv:2301.09617

Fully transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study

Authors: Sophia J. Wagner, Daniel Reisenbüchler, Nicholas P. West, Jan Moritz Niehues, Gregory Patrick Veldhuizen, Philip Quirke, Heike I. Grabsch, Piet A. van den Brandt, Gordon G. A. Hutchins, Susan D. Richman, Tanwei Yuan, Rupert Langer, Josien Christina Anna Jenniskens, Kelly Offermans, Wolfram Mueller, Richard Gray, Stephen B. Gruber, Joel K. Greenson, Gad Rennert, Joseph D. Bonner, Daniel Schmolze, Jacqueline A. James, Maurice B. Loughrey, Manuel Salto-Tellez, Hermann Brenner, et al. (6 additional authors not shown)

Abstract: Background: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer (CRC). For example, a DL test for the diagnosis of microsatellite instability (MSI) in CRC was approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and replacing them in many applications, but have not been used for biomarker prediction in cancer at a large scale. In addition, most DL approaches have been trained on small patient cohorts, which limits their clinical utility.
Methods: In this study, we developed a new fully transformer-based pipeline for end-to-end biomarker prediction from pathology slides. We combine a pre-trained transformer encoder with a transformer network for patch aggregation, capable of yielding single- and multi-target predictions at the patient level. We train our pipeline on over 9,000 patients from 10 colorectal cancer cohorts.
Results: A fully transformer-based approach massively improves performance, generalizability, data efficiency, and interpretability compared with current state-of-the-art algorithms. After training on a large multicenter cohort, we achieve a sensitivity of 0.97 with a negative predictive value of 0.99 for MSI prediction on surgical resection specimens. We demonstrate for the first time that training on resection specimens alone reaches clinical-grade performance on endoscopic biopsy tissue, solving a long-standing diagnostic problem.
Interpretation: A fully transformer-based end-to-end pipeline trained on thousands of pathology slides yields clinical-grade performance for biomarker prediction on surgical resections and biopsies. Our new methods are freely available under an open source license.
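The two headline numbers are standard confusion-matrix ratios. A minimal sketch of how they are computed; the counts are hypothetical and only illustrate the formulas, they are not the study's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive (e.g., MSI) cases that the test flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def negative_predictive_value(tn: int, fn: int) -> float:
    """Share of negative calls that are truly negative: TN / (TN + FN)."""
    return tn / (tn + fn)


# Hypothetical counts for illustration only; these are not the study's data.
tp, fn, tn = 97, 3, 297
print(round(sensitivity(tp, fn), 2))                # 0.97
print(round(negative_predictive_value(tn, fn), 2))  # 0.99
```

Unlike sensitivity, NPV also depends on how common the positive class is in the tested population, so the same classifier can show different NPVs across cohorts.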

Submitted 1 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: Updated Figure 2 and Table A.5

arXiv:2301.03354 [pdf]

Action needed to make carbon offsets from tropical forest conservation work for climate change mitigation

Authors: Thales A. P. West, Sven Wunder, Erin O. Sills, Jan Börner, Sami W. Rifai, Alexandra N. Neidermeier, Andreas Kontoleon

Abstract: Carbon offsets from voluntarily avoided deforestation projects are generated based on performance vis-à-vis ex-ante deforestation baselines. We examined the impacts of 27 forest conservation projects in six countries on three continents using synthetic control methods for causal inference. We compare the project baselines with ex-post counterfactuals based on observed deforestation in control sites. Our findings show that most projects have not reduced deforestation. For projects that did, reductions were substantially lower than claimed. Methodologies for constructing deforestation baselines for carbon-offset interventions thus need urgent revision in order to correctly attribute reduced deforestation to the conservation interventions, maintaining both incentives for forest conservation and the integrity of global carbon accounting.
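In the textbook form of synthetic control, a treated unit's counterfactual is a weighted average of control units, with non-negative weights summing to one, chosen so the weighted controls track the treated unit before the intervention. A minimal sketch under that formulation, assuming a single outcome series per site and solving the fit by projected gradient descent; the study's matching variables and solver are not given in the abstract, and every number below is hypothetical.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(controls_pre: Sequence[Sequence[float]],
                              treated_pre: Sequence[float],
                              iters: int = 5000) -> List[float]:
    """Least-squares fit of the treated unit's pre-period series by a convex
    combination of control units. controls_pre[j][t] is control j at time t."""
    n_units, n_times = len(controls_pre), len(treated_pre)
    # Step 1/L with L = 2 * ||X||_F^2, an upper bound on the gradient's Lipschitz constant.
    frob_sq = sum(x * x for unit in controls_pre for x in unit)
    step = 1.0 / (2.0 * frob_sq)
    w = [1.0 / n_units] * n_units
    for _ in range(iters):
        resid = [sum(w[j] * controls_pre[j][t] for j in range(n_units)) - treated_pre[t]
                 for t in range(n_times)]
        grad = [2.0 * sum(controls_pre[j][t] * resid[t] for t in range(n_times))
                for j in range(n_units)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_units)])
    return w


# Hypothetical yearly deforestation at three control sites (pre-period) ...
controls_pre = [[1, 2, 3, 4], [2, 2, 2, 2], [5, 1, 4, 0]]
# ... and a project site that happens to equal 0.3 * site 1 + 0.7 * site 2.
treated_pre = [1.7, 2.0, 2.3, 2.6]
w = synthetic_control_weights(controls_pre, treated_pre)
controls_post = [5, 2, 3]  # hypothetical post-period outcomes of the controls
counterfactual = sum(wj * y for wj, y in zip(w, controls_post))
print([round(x, 3) for x in w], round(counterfactual, 3))  # [0.3, 0.7, 0.0] 2.9
```

The gap between such an ex-post counterfactual and the project's observed deforestation estimates the avoided deforestation; an ex-ante baseline set well above the counterfactual would overstate it.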

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.10465 [pdf, other]

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Authors: Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi

Abstract: Data scarcity has been a long-standing issue in the field of open-domain social dialogue. To quench this thirst, we present SODA: the first publicly available, million-scale high-quality social dialogue dataset. By contextualizing social commonsense knowledge from a knowledge graph, we are able to distill an exceptionally broad spectrum of social interactions from a large language model. Human evaluation shows that conversations in SODA are more consistent, specific, and (surprisingly) natural than those in prior human-authored datasets. Using SODA, we train COSMO: a generalizable conversation model that is significantly more natural and consistent on unseen datasets than best-performing conversation models (e.g., GODEL, BlenderBot-1, Koala, Vicuna). Experiments reveal COSMO is sometimes even preferred to the original human-written gold responses. Additionally, our results shed light on the distinction between knowledge-enriched conversations and natural social chitchats. We plan to make our data, model, and code public.

Submitted 23 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: EMNLP 2023. Dataset, model, and code can be found at https://hyunw.kim/sodaverse

arXiv:2212.09246 [pdf, other]

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Authors: Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi

Abstract: Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieves a competitive level of commonsense acquisition without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., "birds can fly." We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale teacher model with two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.

Submitted 26 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2211.00053 [pdf, other]

Generating Sequences by Learning to Self-Correct

Authors: Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, Tianxiao Shen, Daniel Khashabi, Yejin Choi

Abstract: Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content. Language models, whether fine-tuned or prompted with few-shot demonstrations, frequently violate these constraints and lack a mechanism to iteratively revise their outputs. Moreover, some powerful language models are of extreme scale or inaccessible, making it inefficient, if not infeasible, to update their parameters for task-specific adaptation. We present Self-Correction, an approach that decouples an imperfect base generator (an off-the-shelf language model or supervised sequence-to-sequence model) from a separate corrector that learns to iteratively correct imperfect generations. To train the corrector, we propose an online training procedure that can use either scalar or natural language feedback on intermediate imperfect generations. We show that Self-Correction improves upon the base generator in three diverse generation tasks (mathematical program synthesis, lexically-constrained generation, and toxicity control), even when the corrector is much smaller than the base generator.

Submitted 31 October, 2022; originally announced November 2022.

arXiv:2210.14801 [pdf, ps, other]

doi 10.1016/j.physletb.2023.138060

Universal derivation of the asymptotic charges of bosonic massless particles

Authors: Kevin Nguyen, Peter West

Abstract: We present a unified treatment of the conserved asymptotic charges associated with any bosonic massless particle in any spacetime dimension. In particular we provide master formulae for the asymptotic charges and the central extensions in the corresponding charge algebras. These formulae can be explicitly evaluated for any given theory. For illustration we apply them to electromagnetism and gravity, thereby recovering earlier results.

Submitted 23 June, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: 9 pages, References added and commented on

arXiv:2210.13800 [pdf, other]

Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation

Authors: Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov, Yejin Choi

Abstract: We present Referee, a novel framework for sentence summarization that can be trained reference-free (i.e., requiring no gold summaries for supervision), while allowing direct control of compression ratio. Our work is the first to demonstrate that reference-free, controlled sentence summarization is feasible via the conceptual framework of Symbolic Knowledge Distillation (West et al., 2022), where latent knowledge in pre-trained language models is distilled via explicit examples sampled from the teacher models, further purified with three types of filters: length, fidelity, and Information Bottleneck. Moreover, we uniquely propose iterative distillation of knowledge, where student models from the previous iteration of distillation serve as teacher models in the next iteration. Starting from a relatively modest set of GPT3-generated summaries, we demonstrate how iterative knowledge distillation can lead to considerably smaller, but better summarizers with sharper controllability. A useful by-product of this iterative distillation process is a high-quality dataset of sentence-summary pairs with varying degrees of compression ratios. Empirical results demonstrate that the final student models vastly outperform the much larger GPT3-Instruct model in terms of the controllability of compression ratios, without compromising the quality of resulting summarization.
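The abstract names a length filter and control over compression ratio. The sketch below shows one way those two pieces could look on raw text, assuming whitespace tokenization and hypothetical bucket edges; the fidelity and Information Bottleneck filters, and the mechanism Referee actually uses for control, are not reproduced here.

```python
from typing import List, Sequence, Tuple


def compression_ratio(source: str, summary: str) -> float:
    """Summary length divided by source length, counted in whitespace tokens."""
    return len(summary.split()) / len(source.split())


def length_filter_and_bucket(pairs: Sequence[Tuple[str, str]],
                             edges: Sequence[float]) -> List[Tuple[str, str, int]]:
    """Drop pairs whose summary is not shorter than its source, then tag each
    survivor with a compression-ratio bucket: the number of edges <= ratio."""
    kept = []
    for source, summary in pairs:
        ratio = compression_ratio(source, summary)
        if ratio >= 1.0:
            continue  # length filter: a summary has to compress its source
        bucket = sum(1 for edge in edges if ratio >= edge)
        kept.append((source, summary, bucket))
    return kept


src = "the quick brown fox jumps over the lazy dog"
pairs = [(src, "fox jumps over dog"), ("a cat sat", "a cat sat"), (src, "fox jumps")]
for _, summary, bucket in length_filter_and_bucket(pairs, edges=[0.25, 0.5, 0.75]):
    print(bucket, summary)
# 1 fox jumps over dog
# 0 fox jumps
```

A bucket index like this could serve as a control signal during training, but that use is an assumption, not something the abstract states.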

Submitted 25 October, 2022; originally announced October 2022.

Journal ref: Empirical Methods in Natural Language Processing 2022 (EMNLP 2022)

arXiv:2208.11501 [pdf, ps, other]

doi 10.1007/JHEP12(2022)152

Higher dualisations of linearised gravity and the $A_1^{+++}$ algebra

Authors: Nicolas Boulanger, Paul P. Cook, Josh A. O'Connor, Peter West

Abstract: The non-linear realisation based on $A_1^{+++}$ is known to describe gravity in terms of both the graviton and the dual graviton. We extend this analysis at the linearised level to find the equations of motion for the first higher dual description of gravity that it contains. We also give a systematic method for finding the additional fields beyond those in the non-linear realisation that are required to construct actions for all of the possible dual descriptions of gravity in the non-linear realisation. We show that these additional fields are closely correlated with the second fundamental representation of $A_1^{+++}$.

Submitted 12 June, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: 46 pages, no figures. Published version. One reference added, some content moved to an appendix

arXiv:2208.08234 [pdf, ps, other]

doi 10.1142/S0217751X22502086

Conserved asymptotic charges for any massless particle

Authors: Kevin Nguyen, Peter West

Abstract: We compute the conserved charges associated with the asymptotic symmetries of massless particles by examining their free theory in Minkowski spacetime. We give a procedure to systematically deduce the fall-off of the massless fields at spatial infinity and show that it has a universal behaviour when expressed in tangent space. We do this for generic massless particles. We do not impose gauge-fixing conditions, which allows us to uncover new nonzero charges for the graviton beyond the well-known supertranslation charges. We also compute conserved charges in the dual formulations of certain low spin particles and argue that this leads to an infinite number of new conserved charges.

Submitted 20 June, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: 25 pages. In this new version we added an acknowledgement: "The work of KN is supported by the ERC Consolidator Grant N. 681908, Quantum black holes: A microscopic window into the microstructure of gravity."

arXiv:2205.13636 [pdf, other]

Quark: Controllable Text Generation with Reinforced Unlearning

Authors: Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi

Abstract: Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining close to the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO (Schulman et al. 2017), while relying only on standard language modeling primitives.
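Step (ii) above, sorting samples into reward quantiles and marking each with a reward token, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the token strings `<rk_i>` are hypothetical, quantiles are equal-size rank bins, and the language-modeling loss and KL penalty of step (iii) are omitted.

```python
from typing import List, Sequence, Tuple


def quantize_by_reward(samples: Sequence[str], rewards: Sequence[float],
                       num_quantiles: int) -> List[Tuple[str, str]]:
    """Sort samples by reward, split them into equal-size quantiles, and pair each
    sample with its quantile's (hypothetical) reward token; <rk_0> is the best bin."""
    if len(samples) != len(rewards):
        raise ValueError("samples and rewards must have the same length")
    order = sorted(range(len(samples)), key=lambda i: rewards[i], reverse=True)
    n = len(order)
    tagged = []
    for rank, i in enumerate(order):
        quantile = rank * num_quantiles // n  # integer bin index in [0, num_quantiles)
        tagged.append((f"<rk_{quantile}>", samples[i]))
    return tagged


samples = ["sample A", "sample B", "sample C", "sample D"]
rewards = [0.1, 0.9, 0.5, 0.3]  # hypothetical scores, e.g. 1 - toxicity
for token, text in quantize_by_reward(samples, rewards, num_quantiles=2):
    print(token, text)
# <rk_0> sample B
# <rk_0> sample C
# <rk_1> sample D
# <rk_1> sample A
```

During training each sample is conditioned on its token; at generation time, prefixing the highest-reward token (here `<rk_0>`) asks the model for the most desirable behavior it has learned.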

Submitted 16 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Journal ref: NeurIPS 2022 (Oral Selection)

arXiv:2203.10133 [pdf, other]

Probing Factually Grounded Content Transfer with Factual Ablation

Authors: Peter West, Chris Quirk, Michel Galley, Yejin Choi

Abstract: Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified, to factual consistency: testing whether the generation agrees with the grounding, rather than all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. In particular, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents; the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods improving over strong baselines.
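Factual ablation reduces to a pairwise preference test. A minimal sketch, assuming the model's log-probability of the same output has already been computed under the more relevant and the less relevant grounding document; reporting the fraction of preferred pairs is an illustrative choice, and the paper's exact scoring may differ.

```python
from typing import Sequence, Tuple


def factual_ablation_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """pairs[i] = (log p(y | more relevant grounding), log p(y | less relevant grounding)).
    Returns the fraction of examples where the model prefers the more relevant grounding."""
    if not pairs:
        raise ValueError("need at least one example")
    wins = sum(1 for relevant, ablated in pairs if relevant > ablated)
    return wins / len(pairs)


# Hypothetical log-probabilities for three examples; 2 of 3 prefer the relevant grounding.
print(factual_ablation_accuracy([(-12.0, -15.5), (-8.2, -7.9), (-20.1, -22.4)]))
```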

Submitted 28 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

arXiv:2203.08609 [pdf]

doi 10.1126/sciadv.abq6321

The Sabatier principle for Battery Anodes: Chemical Kinetics and Reversible Electrodeposition at Heterointerfaces

Authors: Jingxu Zheng, Yue Deng, Wenzao Li, Jiefu Yin, Patrick J. West, Tian Tang, Xiao Tong, David C. Bock, Shuo Jin, Qing Zhao, Regina Garcia-Mendez, Kenneth J. Takeuchi, Esther S. Takeuchi, Amy C. Marschilok, Lynden A. Archer

Abstract: How surface chemistry influences reactions occurring thereupon has been a long-standing question of broad scientific and technological interest for centuries. Recently, it has re-emerged as a critical question in a subdiscipline of chemistry, electrochemistry at heterointerphases, where the answers have implications for both how, and in what forms, humanity stores the rising quantities of renewable electric power generated from solar and wind installations world-wide. Here we consider the relation between the surface chemistry at such interphases and the reversibility of electrochemical transformations at a rechargeable battery electrode. Conventional wisdom holds that stronger chemical interaction between the metal deposits and electrode promotes reversibility. We report instead that a moderate strength of chemical interaction between the deposit and the substrate, neither too weak nor too strong, enables the highest reversibility and stability of the plating/stripping redox processes at a battery anode. Analogous to the empirical Sabatier principle for chemical heterogeneous catalysis, our finding arises from the confluence of competing processes: one driven by electrochemistry and the other by chemical alloying. Based on experimental evaluation of metal plating/stripping systems in battery anodes of contemporary interest, we show that such knowledge provides a powerful tool for designing key materials in highly reversible electrochemical energy storage technologies based on earth-abundant, low-cost metals.

Submitted 25 September, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: accepted at Science Advances, in press

arXiv:2202.01106 [pdf, ps, other]

doi 10.1142/S0217751X22500518

The string little algebra

Authors: Keith Glennon, Peter West

Abstract: We consider the string, like point particles and branes, to be an irreducible representation of the semi-direct product of the Cartan involution invariant subalgebra of E11 and its vector representation. We show that the subalgebra that preserves the string charges, the string little algebra, is essentially the Borel subalgebra of E9. We also show that the known string physical states carry a representation of parts of this algebra.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.06874 [pdf, ps, other]

doi 10.1142/S0217732322300051

The role of the 1.5 order formalism and the gauging of spacetime groups in the development of gravity and supergravity theories

Authors: Ali H. Chamseddine, Peter West

Abstract: The 1.5 formalism played a key role in the discovery of supergravity and it has been used to prove the invariance of essentially all supergravity theories under local supersymmetry. It emerged from the gauging of the super Poincare group to find supergravity. We review both of these developments as well as the auxiliary fields for simple supergravity and its most general coupling to matter using the tensor calculus.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 14 pages

arXiv:2112.08726 [pdf, other]

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

Authors: Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi

Abstract: The dominant paradigm for neural text generation is left-to-right decoding from autoregressive language models. Constrained or controllable generation under complex lexical constraints, however, requires foresight to plan ahead feasible future paths. Drawing inspiration from the A* search algorithm, we propose NeuroLogic A*esque, a decoding algorithm that incorporates heuristic estimates of future cost. We develop lookahead heuristics that are efficient for large-scale language models, making our method a drop-in replacement for common techniques such as beam search and top-k sampling. To enable constrained generation, we build on NeuroLogic decoding (Lu et al., 2021), combining its flexibility in incorporating logical constraints with A*esque estimates of future constraint satisfaction. Our approach outperforms competitive baselines on five generation tasks, and achieves new state-of-the-art performance on table-to-text generation, constrained machine translation, and keyword-constrained generation. The improvements are particularly notable on tasks that require complex constraint satisfaction or in few-shot or zero-shot settings. NeuroLogic A*esque illustrates the power of decoding for improving and enabling new capabilities of large-scale language models.
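The A*-style idea is to rank a candidate token by its log-probability so far plus an estimate of how well the constraints can be satisfied later. The toy below is a hypothetical illustration, not NeuroLogic A*esque itself: a made-up bigram table stands in for the language model, the future estimate comes from a single greedy rollout, and the heuristic simply counts keyword constraints the rollout hits (the paper's heuristics and constraint tracking are richer).

```python
import math
from typing import Dict, List, Set

# Hypothetical bigram table standing in for a language model's next-token distribution.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.5, "cat": 0.5},
    "a": {"dog": 0.3, "ball": 0.7},
    "dog": {"runs": 0.6, "sleeps": 0.4},
    "cat": {"sleeps": 0.9, "runs": 0.1},
    "ball": {"rolls": 1.0},
}


def greedy_rollout(seq: List[str], steps: int) -> List[str]:
    """Cheaply extend seq by up to `steps` greedy tokens to peek at the future."""
    out = list(seq)
    for _ in range(steps):
        dist = BIGRAM.get(out[-1])
        if not dist:
            break
        out.append(max(dist, key=dist.get))
    return out


def lookahead_score(prefix: List[str], token: str, logp_prefix: float,
                    constraints: Set[str], steps: int = 2, weight: float = 1.0) -> float:
    """f = g + h: g is log p(prefix + token); h counts constraint words that the
    greedy rollout after `token` would produce (a crude future estimate)."""
    g = logp_prefix + math.log(BIGRAM[prefix[-1]][token])
    future = greedy_rollout(prefix + [token], steps)
    h = weight * len(constraints & set(future))
    return g + h


constraints = {"ball"}
candidates = list(BIGRAM["<s>"])
best = max(candidates, key=lambda tok: lookahead_score(["<s>"], tok, 0.0, constraints))
print(best)  # a  (plain greedy decoding would pick "the")
```

Scoring by g alone picks "the", whose likely continuations never produce "ball"; the lookahead term lets the less probable "a" win because it leads toward the constraint.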

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2110.08387 [pdf, other]

Generated Knowledge Prompting for Commonsense Reasoning

Authors: Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi

Abstract: It remains an open question whether incorporating external knowledge benefits commonsense reasoning while maintaining the flexibility of pretrained sequence models. To investigate this question, we develop generated knowledge prompting, which consists of generating knowledge from a language model, then providing the knowledge as additional input when answering a question. Our method does not require task-specific supervision for knowledge integration, or access to a structured knowledge base, yet it improves performance of large-scale, state-of-the-art models on four commonsense reasoning tasks, achieving state-of-the-art results on numerical commonsense (NumerSense), general commonsense (CommonsenseQA 2.0), and scientific commonsense (QASC) benchmarks. Generated knowledge prompting highlights large-scale language models as flexible sources of external knowledge for improving commonsense reasoning. Our code is available at https://github.com/liujch1998/GKP
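Answering with generated knowledge as extra input needs a rule for combining the per-knowledge predictions. A minimal sketch, assuming the rule keeps, for each answer, its highest confidence over the knowledge-augmented prompts and then picks the overall maximum; the confidences are hypothetical and the knowledge-generation step is not shown.

```python
from typing import Dict, Sequence


def select_answer(scores_per_prompt: Sequence[Dict[str, float]]) -> str:
    """scores_per_prompt[m][a]: the model's confidence in answer a when the question
    is prefixed with knowledge statement m (m = 0 can be the question alone).
    Returns the answer whose best confidence across all prompts is highest."""
    best: Dict[str, float] = {}
    for scores in scores_per_prompt:
        for answer, score in scores.items():
            best[answer] = max(best.get(answer, float("-inf")), score)
    return max(best, key=best.get)


scores = [
    {"yes": 0.55, "no": 0.45},  # no knowledge
    {"yes": 0.30, "no": 0.70},  # with generated knowledge statement 1
    {"yes": 0.50, "no": 0.50},  # with generated knowledge statement 2
]
print(select_answer(scores))  # no
```

Taking the maximum lets a single highly informative knowledge statement decide the answer even when the other statements are uninformative.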

Submitted 28 September, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: ACL 2022 main conference

arXiv:2110.07178 [pdf, other]

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

Authors: Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi

Abstract: The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically (as text) in addition to the neural model. We also distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.

Submitted 28 November, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2109.13986 [pdf, other]

Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics

Authors: Sean Welleck, Peter West, Jize Cao, Yejin Choi

Abstract: Neural sequence models trained with maximum likelihood estimation have led to breakthroughs in many tasks, where success is defined by the gap between training and test performance. However, their ability to achieve stronger forms of generalization remains unclear. We consider the problem of symbolic mathematical integration, as it requires generalizing systematically beyond the test set. We develop a methodology for evaluating generalization that takes advantage of the problem domain's structure and access to a verifier. Despite promising in-distribution performance of sequence-to-sequence models in this domain, we demonstrate challenges in achieving robustness, compositionality, and out-of-distribution generalization, through both carefully constructed manual test suites and a genetic algorithm that automatically finds large collections of failures in a controllable manner. Our investigation highlights the difficulty of generalizing well with the predominant modeling and learning approach, and the importance of evaluating beyond the test set, across different aspects of generalization.
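Integration answers are cheap to check by differentiation, so a verifier can grade model outputs without a reference answer. The sketch below is a numeric stand-in using only the standard library, not the paper's verifier: it compares a finite-difference derivative of the candidate with the integrand at a few hypothetical test points, whereas a symbolic check would compare expressions exactly.

```python
import math
from typing import Callable, Sequence


def verify_antiderivative(f: Callable[[float], float], F: Callable[[float], float],
                          points: Sequence[float], h: float = 1e-5,
                          tol: float = 1e-4) -> bool:
    """Accept F as an antiderivative of f if a central finite difference of F
    matches f (within a relative tolerance) at every test point."""
    for x in points:
        dF = (F(x + h) - F(x - h)) / (2 * h)
        if abs(dF - f(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True


def integrand(x: float) -> float:
    return x * math.cos(x)


def candidate_ok(x: float) -> float:
    return x * math.sin(x) + math.cos(x)  # d/dx = x cos x


def candidate_bad(x: float) -> float:
    return x * math.sin(x) - math.cos(x)  # sign error: d/dx = x cos x + 2 sin x


points = [-0.7, 0.3, 1.1, 2.5]
print(verify_antiderivative(integrand, candidate_ok, points))   # True
print(verify_antiderivative(integrand, candidate_bad, points))  # False
```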

Submitted 24 February, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: AAAI 2022

arXiv:2108.02247 [pdf]

Quasi-HfO$_x$/AlO$_y$ and AlO$_y$/HfO$_x$ Based Memristor Devices: Role of Bi-layered Oxides in Digital Set and Analog Reset Switching

Authors: Pradip Basnet, Erik Anderson, Bhaswar Chakrabarti, Matthew P. West, Fabia Farlin Athena, Eric M. Vogel

Abstract: Understanding the resistive switching behavior, or the resistance change, of oxide-based memristor devices is critical to predicting their responses to known electrical inputs. Also, knowing the electrical response of a memristor, one can confirm its usefulness in non-volatile memory and/or in artificial neural networks. Although bi- or multi-layered oxides have been reported to improve switching performance compared to a single oxide layer, a detailed explanation of why switching is more easily improved for some oxide combinations is still missing. Herein, we fabricated two types of bi-layered heterostructure devices, quasi-HfO$_x$/AlO$_y$ and AlO$_y$/HfO$_x$, sandwiched between Au electrodes, and investigated their electrical responses. For a deeper understanding of the switching mechanism, the performance of an HfO$_x$-only device, which serves as a control device, is also considered. The role of bi-layered heterostructures is investigated using both experimental and simulated results. Our results suggest that synergistic switching performance can be achieved with a proper combination of these materials and/or devices. These results open the avenue for designing more efficient double- or multi-layer memristor devices for an analog response.

Submitted 2 October, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 7 pages, 5 figures

arXiv:2104.08315 [pdf, other]

Surface Form Competition: Why the Highest Probability Answer Isn't Always Right

Authors: Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer

Abstract: Large language models have shown promising results in zero-shot settings (Brown et al., 2020; Radford et al., 2019). For example, they can perform multiple choice tasks simply by conditioning on a question and selecting the answer with the highest probability. However, ranking by string probability can be problematic due to surface form competition, wherein different surface forms compete for probability mass, even if they represent the same underlying concept, e.g. "computer" and "PC." Since probability mass is finite, this lowers the probability of the correct answer, due to competition from other strings that are valid answers (but not one of the multiple choice options). We introduce Domain Conditional Pointwise Mutual Information, an alternative scoring function that directly compensates for surface form competition by simply reweighting each option according to a term that is proportional to its a priori likelihood within the context of the specific zero-shot task. It achieves consistent gains in zero-shot performance over both calibrated (Zhao et al., 2021) and uncalibrated scoring functions on all GPT-2 and GPT-3 models over a variety of multiple choice datasets.

Submitted 20 November, 2022; v1 submitted 16 April, 2021; originally announced April 2021.
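
Assuming per-option log-probabilities have already been computed with some language model, the Domain Conditional PMI rule described above reduces to a subtraction: the log-probability of each option given the question, minus its log-probability given only a short task-specific domain premise. The sketch below is a minimal illustration; the function names, the options, and all probabilities are hypothetical, chosen only to show how the ranking can flip.

```python
import math
from typing import Dict


def pmi_dc_scores(logp_given_question: Dict[str, float],
                  logp_given_domain: Dict[str, float]) -> Dict[str, float]:
    """Score each option y by log P(y | question) - log P(y | domain premise)."""
    return {
        option: logp_given_question[option] - logp_given_domain[option]
        for option in logp_given_question
    }


def pick_answer(scores: Dict[str, float]) -> str:
    """Return the option with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical log-probabilities, made up for illustration (not results from the paper).
cond = {"computer": math.log(0.30), "whiteboard": math.log(0.35)}    # conditioned on the question
domain = {"computer": math.log(0.10), "whiteboard": math.log(0.40)}  # conditioned on a domain premise only

print(pick_answer(cond))                         # ranking by raw conditional probability
print(pick_answer(pmi_dc_scores(cond, domain)))  # ranking by domain-conditional PMI
```

Dividing by P(y | domain premise), i.e. subtracting its log, discounts options that are likely in the task domain regardless of the question, which is the a priori likelihood correction the abstract refers to.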

arXiv:2102.02152 [pdf, ps, other]

doi 10.1142/S0217751X21500962

The massless irreducible representation in E theory and how bosons can appear as spinors

Authors: Keith Glennon, Peter West

Abstract: We study in detail the irreducible representation of E theory that corresponds to massless particles. This has little algebra Ic(E9) and contains 128 physical states that belong to the spinor representation of SO(16). These are the degrees of freedom of maximal supergravity in eleven dimensions. This smaller number of degrees of freedom, compared to what might be expected, is due to an infinite number of duality relations which in turn can be traced to the existence of a subalgebra of Ic(E9) which forms an ideal and annihilates the representation. We explain how these features are inherited into the covariant theory. We also comment on the remarkable similarity between how the bosons and fermions arise in E theory.

Submitted 3 February, 2021; originally announced February 2021.

arXiv:2012.09050 [pdf, ps, other]

doi 10.1142/S0217732321500760

Supersymmetry anomalies and the Wess-Zumino Model in a supergravity background

Authors: Giorgos Eleftheriou, Peter West

Abstract: We briefly recall the procedure for computing the Ward Identities in the presence of a regulator which violates the symmetry being considered. We compute the first non-trivial correction to the supersymmetry Ward Identity of the Wess-Zumino model in the presence of background supergravity using dimensional regularisation. We find that the result can be removed using a finite local counter term and so there is no supersymmetry anomaly.

Submitted 16 December, 2020; originally announced December 2020.

Comments: seven pages

arXiv:2011.14243 [pdf, other]

Srifty: Swift and Thrifty Distributed Training on the Cloud

Authors: Liang Luo, Peter West, Arvind Krishnamurthy, Luis Ceze

Abstract: Finding the best VM configuration is key to achieving lower cost and higher throughput, two primary concerns in cloud-based distributed neural network (NN) training today. Optimal VM selection that meets user constraints requires efficiently navigating a large search space while controlling for the performance variance associated with sharing cloud instances and networks. In this work, we characterize this variance in the context of distributed NN training and present results of a comprehensive throughput and cost-efficiency study we conducted across a wide array of instances to prune the VM search space. Using insights from these studies, we built Srifty, a system that combines runtime profiling with learned performance models to accurately predict training performance and find the best VM choice that satisfies user constraints, potentially leveraging both heterogeneous setups and spot instances. We integrated Srifty with PyTorch and evaluated it on Amazon EC2. We conducted a large-scale generalization study of Srifty across more than 2K training setups on EC2. Our results show that Srifty achieves an iteration latency prediction error of 8%, and its VM instance recommendations offer significant throughput gain and cost reduction while satisfying user constraints compared to existing solutions in complex, real-world scenarios.

Submitted 1 July, 2022; v1 submitted 28 November, 2020; originally announced November 2020.
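
Under heavy simplification, the final selection step can be pictured as a constrained search over candidate configurations: given a predicted throughput for each configuration and an hourly price, pick the cheapest configuration predicted to finish a fixed number of iterations before a deadline. In Srifty the prediction would come from profiling plus learned performance models; here it is simply an input. The `VMConfig` type, the deadline-style constraint, and all numbers are hypothetical, and this is not Srifty's actual search procedure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VMConfig:
    name: str                       # hypothetical label for a cluster configuration
    price_per_hour: float           # total hourly price of the configuration
    predicted_iters_per_sec: float  # assumed output of some learned performance model


def choose_config(configs: Sequence[VMConfig],
                  total_iters: int,
                  deadline_hours: float) -> Optional[VMConfig]:
    """Return the cheapest configuration predicted to finish before the deadline, or None."""
    best: Optional[VMConfig] = None
    best_cost = float("inf")
    for cfg in configs:
        if cfg.predicted_iters_per_sec <= 0:
            continue  # no usable prediction for this configuration
        hours = total_iters / cfg.predicted_iters_per_sec / 3600.0
        if hours > deadline_hours:
            continue  # violates the user's time constraint
        cost = hours * cfg.price_per_hour
        if cost < best_cost:
            best, best_cost = cfg, cost
    return best


# Hypothetical candidates; prices and throughputs are invented for illustration.
configs = [
    VMConfig("config-a", price_per_hour=3.0, predicted_iters_per_sec=10.0),
    VMConfig("config-b", price_per_hour=12.0, predicted_iters_per_sec=50.0),
    VMConfig("config-c", price_per_hour=1.0, predicted_iters_per_sec=2.0),
]
choice = choose_config(configs, total_iters=360_000, deadline_hours=12.0)
print(choice.name if choice else "no feasible configuration")
```

Note that the pricier configuration can still be the cheapest overall when its higher throughput shortens the job enough; accounting for the performance variance the abstract mentions would require planning against something more conservative than a single point prediction.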

arXiv:2010.12884 [pdf, other]

NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

Authors: Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

Abstract: Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large amounts of task-specific examples. We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models, supervised or not, to generate fluent text while satisfying complex lexical constraints. Our approach is powerful yet efficient. It handles any set of lexical constraints that is expressible under predicate logic, while its asymptotic runtime is equivalent to conventional beam search. Empirical results on four benchmarks show that NeuroLogic Decoding outperforms previous approaches, including algorithms that handle a subset of our constraints. Moreover, we find that unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter is based on considerably larger networks. Our results suggest the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.

Submitted 20 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: NAACL 2021
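
Lexical constraints of the kind described above can be written in conjunctive normal form: an AND of OR-clauses over literals of the form "word must appear" or "word must not appear". The sketch below shows one way to represent and check such constraints, assuming whitespace tokens and exact word matches. It only re-ranks finished candidates by number of satisfied clauses and then by LM score; NeuroLogic Decoding itself tracks constraint satisfaction during beam search, which this sketch does not reproduce. The candidate strings and scores are hypothetical.

```python
from typing import List, Sequence, Tuple

LexLiteral = Tuple[str, bool]  # (word, must_appear); must_appear=False means "must not appear"
Clause = List[LexLiteral]      # a clause holds if ANY of its literals holds
CNF = List[Clause]             # the constraint set holds if ALL clauses hold


def literal_holds(tokens: Sequence[str], lit: LexLiteral) -> bool:
    word, must_appear = lit
    return (word in tokens) == must_appear


def satisfied_clauses(tokens: Sequence[str], cnf: CNF) -> int:
    """Count how many clauses of the CNF formula the token sequence satisfies."""
    return sum(any(literal_holds(tokens, lit) for lit in clause) for clause in cnf)


def rank(candidates: Sequence[Tuple[List[str], float]],
         cnf: CNF) -> List[Tuple[List[str], float]]:
    """Order (tokens, lm_logprob) pairs by satisfied clauses first, then by LM score."""
    return sorted(candidates,
                  key=lambda c: (satisfied_clauses(c[0], cnf), c[1]),
                  reverse=True)


# (dog) AND (frisbee OR ball) AND (NOT cat)
cnf: CNF = [[("dog", True)], [("frisbee", True), ("ball", True)], [("cat", False)]]
candidates = [
    ("the dog caught the frisbee".split(), -12.0),  # hypothetical LM log-probabilities
    ("the cat caught the ball".split(), -9.0),
    ("a dog ran".split(), -5.0),
]
for tokens, score in rank(candidates, cnf):
    print(f"{satisfied_clauses(tokens, cnf)} clauses, score {score}: {' '.join(tokens)}")
```

Ranking by satisfied clauses before LM score lets a less probable but fully constrained output win, which mirrors the trade-off the abstract describes between fluency and constraint satisfaction.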

arXiv:2010.08566 [pdf, other]

Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models

Authors: Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena Hwang, Yejin Choi

Abstract: Publicly available, large pretrained Language Models (LMs) generate text with remarkable quality, but only sequentially from left to right. As a result, they are not immediately applicable to generation tasks that break the unidirectional assumption, such as paraphrasing or text-infilling, necessitating task-specific supervision. In this paper, we present Reflective Decoding, a novel unsupervised algorithm that allows for direct application of unidirectional LMs to non-sequential tasks. Our 2-step approach requires no supervision or even parallel corpora, only two off-the-shelf pretrained LMs in opposite directions: forward and backward. First, in the contextualization step, we use LMs to generate ensembles of past and future contexts which collectively capture the input (e.g. the source sentence for paraphrasing). Second, in the reflection step, we condition on these "context ensembles", generating outputs that are compatible with them. Comprehensive empirical results demonstrate that Reflective Decoding outperforms strong unsupervised baselines on both paraphrasing and abductive text infilling, significantly narrowing the gap between unsupervised and supervised methods. Reflective Decoding surpasses multiple supervised baselines on various metrics including human evaluation.

Submitted 24 December, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

arXiv:2010.05906 [pdf, other]

Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning

Authors: Lianhui Qin, Vered Shwartz, Peter West, Chandra Bhagavatula, Jena Hwang, Ronan Le Bras, Antoine Bosselut, Yejin Choi

Abstract: Abductive and counterfactual reasoning, core abilities of everyday human cognition, require reasoning about what might have happened at time t, while conditioning on multiple contexts from the relative past and future. However, simultaneous incorporation of past and future contexts using generative language models (LMs) can be challenging, as they are trained either to condition only on the past context or to perform narrowly scoped text-infilling. In this paper, we propose DeLorean, a new unsupervised decoding algorithm that can flexibly incorporate both the past and future contexts using only off-the-shelf, left-to-right language models and no supervision. The key intuition of our algorithm is incorporating the future through back-propagation, during which we only update the internal representation of the output while fixing the model parameters. By alternating between forward and backward propagation, DeLorean can decode the output representation that reflects both the left and right contexts. We demonstrate that our approach is general and applicable to two nonmonotonic reasoning tasks: abductive text generation and counterfactual story revision, where DeLorean outperforms a range of unsupervised and some supervised methods, based on automatic and human evaluation.

Submitted 2 August, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2009.09961 [pdf, other]

Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

Authors: Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff

Abstract: Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding, which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by previous social media studies. Evaluating causal methods is challenging, as ground truth counterfactuals are almost never available. Presently, no empirical evaluation framework for causal methods using text exists, and as such, practitioners must select their methods without guidance. We contribute the first such framework, which consists of five tasks drawn from real world studies. Our framework enables the evaluation of any causal inference method using text. Across 648 experiments and two datasets, we evaluate every commonly used causal inference method and identify their strengths and weaknesses to inform social media researchers seeking to use such methods, and guide future improvements. We make all tasks, data, and models public to inform applications and encourage additional research.

Submitted 6 May, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

Comments: to appear at ICWSM 2022

arXiv:2007.11925 [pdf, ps, other]

doi 10.1016/j.physletb.2020.135744

Kac-Moody algebras and the cosmological constant

Authors: Peter West

Abstract: We show that the theory of gravity constructed from the non-linear realisation of the semi-direct product of the Kac-Moody algebra A1+++ with its vector representation does not allow a cosmological constant. The existence of a cosmological constant in this theory is related to the breaking of the gravitational duality symmetry.

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2006.02383 [pdf, ps, other]

doi 10.1016/j.physletb.2020.135714

The non-linear dual gravity equation of motion in eleven dimensions

Authors: Keith Glennon, Peter West

Abstract: We derive the non-linear dual graviton equation of motion in eleven dimensions in the context of E theory.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2004.05483 [pdf, other]

Unsupervised Commonsense Question Answering with Self-Talk

Authors: Vered Shwartz, Peter West, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

Abstract: Natural language understanding involves reading between the lines with implicit background knowledge. Current systems either rely on pre-trained language models as the sole implicit source of world knowledge, or resort to external knowledge bases (KBs) to incorporate additional relevant knowledge. We propose an unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks. Inspired by inquiry-based discovery learning (Bruner, 1961), our approach inquires language models with a number of information-seeking questions such as "what is the definition of ..." to discover additional background knowledge. Empirical results demonstrate that the self-talk procedure substantially improves the performance of zero-shot language model baselines on four out of six commonsense benchmarks, and competes with models that obtain knowledge from external KBs. While our approach improves performance on several benchmarks, the self-talk-induced knowledge, even when leading to correct answers, is not always seen as useful by human judges, raising interesting questions about the inner workings of pre-trained language models for commonsense reasoning.

Submitted 15 September, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

Comments: EMNLP 2020

arXiv:2004.03363 [pdf, ps, other]

doi 10.1142/S0217751X20500682

Gravity, Dual Gravity and A1+++

Authors: Keith Glennon, Peter West

Abstract: We construct the non-linear realisation of the semi-direct product of the very extended algebra A1+++ and its vector representation. This theory has an infinite number of fields that depend on a spacetime with an infinite number of coordinates. Discarding all except the lowest-level field and coordinates, the dynamics is just Einstein's equation for the graviton field. We show that the gravity field is related to the dual graviton field by a duality relation and we also derive the equation of motion for the dual gravity field.

Submitted 7 April, 2020; originally announced April 2020.

Comments: 27 pages

arXiv:1912.03545 [pdf]

doi 10.1039/C9TC06736A

Substrate Dependent Resistive Switching in Amorphous-HfOx Memristors: An Experimental and Computational Investigation

Authors: Pradip Basnet, Darshan G Pahinkar, Matthew P. West, Christopher J. Perini, Samuel Graham, Eric M. Vogel

Abstract: While two-terminal HfO$_x$ (x < 2) memristor devices have been studied for ion transport and current evolution, there have been limited reports on the effect of the long-range thermal environment on their performance. In this work, amorphous-HfO$_x$-based memristor devices on two different substrates, thin SiO$_2$ (280 nm)/Si and glass, with different thermal conductivities in the range from 1.2 to 138 W/m-K were fabricated. Devices on glass substrates exhibit lower reset voltage, wider memory window and, in turn, a higher performance window. In addition, the devices on glass show better endurance than the devices on the SiO$_2$/Si substrate. These devices also show non-volatile multi-level resistances at relatively low operating voltages, which is critical for neuromorphic computing applications. A Multiphysics COMSOL computational model is presented that describes the transport of heat, ions and electrons in these structures. The combined experimental and COMSOL simulation results indicate that the long-range thermal environment can have a significant impact on the operation of HfO$_x$-based memristors and that substrates with low thermal conductivity can enhance switching performance.

Submitted 1 April, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

Comments: 8 pages, 9 figures. Journal of Materials Chemistry C, 2020

arXiv:1911.03015 [pdf, ps, other]

doi 10.1088/1367-2630/ab6a3a

The metastable Q $^3Δ_2$ state of ThO: A new resource for the ACME electron EDM search

Authors: Xing Wu, Zhen Han, James Chow, Daniel G. Ang, Cole Meisenhelder, Cristian D. Panda, Elizabeth P. West, Gerald Gabrielse, John M. Doyle, David DeMille

Abstract: The best upper limit for the electron electric dipole moment was recently set by the ACME collaboration. This experiment measures electron spin precession in a cold beam of ThO molecules in their metastable $H~(^3Δ_1)$ state. Improvement in the statistical and systematic uncertainties is possible with more efficient use of molecules from the source and better magnetometry in the experiment, respectively. Here, we report measurements of several relevant properties of the long-lived $Q~(^3Δ_2)$ state of ThO, and show that this state is a very useful resource for both these purposes. The $Q$ state lifetime is long enough that its decay during the time of flight in the ACME beam experiment is negligible. The large electric dipole moment measured for the $Q$ state, giving rise to a large linear Stark shift, is ideal for an electrostatic lens that increases the fraction of molecules detected downstream. The measured magnetic moment of the $Q$ state is also large enough to be used as a sensitive co-magnetometer in ACME. Finally, we show that the $Q$ state has a large transition dipole moment to the $C~(^1Π_1)$ state, which allows for efficient population transfer between the ground state $X~(^1Σ^+)$ and the $Q$ state via $X-C-Q$ Stimulated Raman Adiabatic Passage (STIRAP). We demonstrate 90% STIRAP transfer efficiency. In the course of these measurements, we also determine the magnetic moment of the $C$ state, the $X\rightarrow C$ transition dipole moment, and branching ratios of decays from the $C$ state.

Submitted 7 November, 2019; originally announced November 2019.

Comments: 21 pages, 6 figures, 5 pages appendices

Journal ref: New Journal of Physics, 22 023013 (2020)

arXiv:1909.10434 [pdf, other]

doi 10.1093/mnras/stz2706

Clocking the formation of today's largest galaxies: Wide field integral spectroscopy of Brightest Cluster Galaxies and their surroundings

Authors: Louise O. V. Edwards, Matthew Salinas, Steffanie Stanley, Priscilla E. Holguin West, Isabella Trierweiler, Hannah Alpert, Paula Coelho, Saisneha Koppaka, Grant R. Tremblay, Hugo Martel, Yuan Li

Abstract: The formation and evolution of local brightest cluster galaxies (BCGs) are investigated by determining the stellar populations and dynamics from the galaxy core, through the outskirts and into the intracluster light (ICL). Integral spectroscopy of 23 BCGs observed out to 4 $r_e$ is collected and high signal-to-noise regions are identified. Stellar population synthesis codes are used to determine the age, metallicity, velocity, and velocity dispersion of stars within each region. The ICL spectra are best modeled with populations that are younger and less metal-rich than those of the BCG cores. The average BCG core age of the sample is 13.3 $\pm$ 2.8 Gyr and the average metallicity is [Fe/H] = 0.30 $\pm$ 0.09, whereas for the ICL the average age is 9.2 $\pm$ 3.5 Gyr and the average metallicity is [Fe/H] = 0.18 $\pm$ 0.16. The velocity dispersion profile is seen to be rising or flat in most of the sample (17/23), and those with rising values reach the value of the host cluster's velocity dispersion in several cases. The most extended BCGs are closest to the peak of the cluster's X-ray luminosity. The results are consistent with the idea that the BCG cores and inner regions formed quickly and long ago, with the outer regions and ICL forming more recently, and continuing to assemble through minor merging. Any recent star formation in the BCGs is a minor component, and is associated with the cluster cool core status.

Submitted 23 September, 2019; originally announced September 2019.

Comments: 22 pages, 21 figures, MNRAS, accepted

arXiv:1909.07405 [pdf, other]

BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Authors: Peter West, Ari Holtzman, Jan Buys, Yejin Choi

Abstract: The principle of the Information Bottleneck (Tishby et al., 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck objective searches gradually shorter subsequences of the given sentence while maximizing the probability of the next sentence conditioned on the summary. Using only pretrained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus. Building on our unsupervised extractive summarization (BottleSumEx), we then present a new approach to self-supervised abstractive summarization (BottleSumSelf), where a transformer-based language model is trained on the output summaries of our unsupervised method. Empirical results demonstrate that our extractive method outperforms other unsupervised models on multiple automatic metrics. In addition, we find that our self-supervised abstractive model outperforms unsupervised baselines (including our own) by human evaluation along multiple attributes.

Submitted 20 September, 2019; v1 submitted 16 September, 2019; originally announced September 2019.
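
The extractive search described above can be sketched as a beam search over deletions: starting from the full sentence, repeatedly delete short contiguous spans and keep the candidates that best predict the next sentence. This is a simplified sketch in the spirit of BottleSumEx, not the authors' exact procedure. The beam width, the maximum span length, and the tie-breaking toward shorter candidates are assumptions, and `toy_overlap_score` is a hypothetical stand-in for the conditional language-model log-probability of the next sentence given the candidate summary.

```python
from typing import Callable, List, Sequence

# score(candidate, next_sentence) -> float. In BottleSum this role would be played by
# log P(next_sentence | candidate) under a pretrained LM; here it is a pluggable stand-in.
Scorer = Callable[[Sequence[str], Sequence[str]], float]


def deletions(tokens: List[str], max_span: int) -> List[List[str]]:
    """All non-empty subsequences formed by deleting one contiguous span of 1..max_span tokens."""
    out = []
    for span in range(1, max_span + 1):
        for start in range(len(tokens) - span + 1):
            cand = tokens[:start] + tokens[start + span:]
            if cand:
                out.append(cand)
    return out


def bottleneck_summary(sentence: List[str], next_sentence: List[str], score: Scorer,
                       beam: int = 3, max_span: int = 2) -> List[str]:
    """Search progressively shorter subsequences; return the best-scoring one (ties -> shorter)."""
    best_score, best = score(sentence, next_sentence), list(sentence)
    frontier = [list(sentence)]
    while True:
        scored = {}
        for tokens in frontier:
            for cand in deletions(tokens, max_span):
                scored[tuple(cand)] = score(cand, next_sentence)
        if not scored:  # every frontier item is down to a single token
            return best
        # Keep the top `beam` candidates, preferring higher scores, then shorter length.
        ranked = sorted(scored.items(), key=lambda kv: (kv[1], -len(kv[0])), reverse=True)[:beam]
        frontier = [list(cand) for cand, _ in ranked]
        top, top_score = ranked[0]
        if top_score > best_score or (top_score == best_score and len(top) < len(best)):
            best_score, best = top_score, list(top)


def toy_overlap_score(candidate: Sequence[str], next_sentence: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: number of distinct words shared with the next sentence."""
    return float(len(set(candidate) & set(next_sentence)))


sentence = "the old man quietly fed the hungry ducks at the pond".split()
next_sentence = "the ducks ate the bread".split()
print(bottleneck_summary(sentence, next_sentence, toy_overlap_score))
```

Each step shortens every frontier candidate by at least one token, so the loop always terminates. Because the toy scorer ignores fluency, it returns a bare fragment; the quality of the result depends entirely on the scoring function plugged in.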

Showing 1–50 of 178 results for author: West, P