-
Large Language Model Unlearning via Embedding-Corrupted Prompts
Authors:
Chris Yuhao Liu,
Yaxuan Wang,
Jeffrey Flanigan,
Yang Liu
Abstract:
Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what a large language model should not know is important for ensuring alignment and thus safe use. However, accurately and efficiently unlearning knowledge from an LLM remains challenging due to the potential collateral damage caused by the fuzzy boundary between retention and forgetting, and the large computational requirements for optimization across state-of-the-art models with hundreds of billions of parameters. In this work, we present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models to address both the challenges of knowledge entanglement and unlearning efficiency. Instead of relying on the LLM itself to unlearn, we enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget. We learn corruptions added to prompt embeddings via zeroth order optimization toward the unlearning objective offline and corrupt prompts flagged by the classifier during inference. We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting. Through extensive experiments on unlearning, we demonstrate the superiority of our method in achieving promising unlearning at nearly zero side effects in general domains and domains closely related to the unlearned ones. Additionally, we highlight the scalability of our method to 100 LLMs, ranging from 0.5B to 236B parameters, incurring no additional cost as the number of parameters increases.
Submitted 12 June, 2024;
originally announced June 2024.
-
The Power of the Noisy Channel: Unsupervised End-to-End Task-Oriented Dialogue with LLMs
Authors:
Brendan King,
Jeffrey Flanigan
Abstract:
Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs: e.g. a dialogue state and the system actions taken at each step. These annotations can be costly to produce, error-prone, and require both domain and annotation expertise. With advances in LLMs, we hypothesize unlabelled data and a schema definition are sufficient for building a working task-oriented dialogue system, completely unsupervised. Using only (1) a well-defined API schema (2) a set of unlabelled dialogues between a user and agent, we develop a novel approach for inferring turn-level annotations as latent variables using a noisy channel model. We iteratively improve these pseudo-labels with expectation-maximization (EM), and use the inferred labels to train an end-to-end dialogue agent. Evaluating our approach on the MultiWOZ benchmark, our method more than doubles the dialogue success rate of a strong GPT-3.5 baseline.
Submitted 23 April, 2024;
originally announced April 2024.
-
Future Language Modeling from Temporal Document History
Authors:
Changmao Li,
Jeffrey Flanigan
Abstract:
Predicting the future is of great interest across many aspects of human activity. Businesses are interested in future trends, traders are interested in future stock prices, and companies are highly interested in future technological breakthroughs. While there are many automated systems for predicting future numerical data, such as weather, stock prices, and demand for products, there is relatively little work in automatically predicting textual data. Humans are interested in textual data predictions because it is a natural format for our consumption, and experts routinely make predictions in a textual format (Christensen et al., 2004; Tetlock & Gardner, 2015; Frick, 2015). However, there has been relatively little formalization of this general problem in the machine learning or natural language processing communities. To address this gap, we introduce the task of future language modeling: probabilistic modeling of texts in the future based on a temporal history of texts. To our knowledge, our work is the first work to formalize the task of predicting the future in this way. We show that it is indeed possible to build future language models that improve upon strong non-temporal language model baselines, opening the door to working on this important, and widely applicable problem.
Submitted 16 April, 2024;
originally announced April 2024.
-
Task Contamination: Language Models May Not Be Few-Shot Anymore
Authors:
Changmao Li,
Jeffrey Flanigan
Abstract:
Large language models (LLMs) offer impressive performance in various zero-shot and few-shot tasks. However, their success in zero-shot and few-shot settings may be affected by task contamination, a potential limitation that has not been thoroughly examined. This paper investigates how zero-shot and few-shot performance of LLMs has changed chronologically over time. Utilizing GPT-3 series models and several other recent open-sourced LLMs, and controlling for dataset difficulty, we find that on datasets released before the LLM training data creation date, LLMs perform surprisingly better than on datasets released after. This strongly indicates that, for many LLMs, there exists task contamination on zero-shot and few-shot evaluation for datasets released prior to the LLMs' training data creation date. Additionally, we utilize training data inspection, task example extraction, and a membership inference attack, which reveal further evidence of task contamination. Importantly, we find that for classification tasks with no possibility of task contamination, LLMs rarely demonstrate statistically significant improvements over simple majority baselines, in both zero and few-shot settings.
Submitted 26 December, 2023;
originally announced December 2023.
-
Understanding the Role of Optimization in Double Descent
Authors:
Chris Yuhao Liu,
Jeffrey Flanigan
Abstract:
The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory and practice \citep{Belkin2018ReconcilingMM}. Additionally, while double descent has been observed in various tasks and architectures, the peak of double descent can sometimes be noticeably absent or diminished, even without explicit regularization, such as weight decay and early stopping. In this paper, we investigate this intriguing phenomenon from the optimization perspective and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent (initialization, normalization, batch size, learning rate, optimization algorithm) are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer can find a sufficiently low-loss minimum. These factors directly affect the condition number of the optimization problem or the optimizer and thus affect the final minimum found by the optimizer, reducing or increasing the height of the double descent peak. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings, demonstrating this optimization-based unified view. Our results suggest the following implication: Double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between weak double descent peaks in practice and strong peaks observable in carefully designed setups.
Submitted 6 December, 2023;
originally announced December 2023.
-
A New Approach Towards Autoformalization
Authors:
Nilay Patel,
Rahul Saha,
Jeffrey Flanigan
Abstract:
Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, and especially for higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.
Submitted 19 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Does the "most sinfully decadent cake ever" taste good? Answering Yes/No Questions from Figurative Contexts
Authors:
Geetanjali Rakshit,
Jeffrey Flanigan
Abstract:
Figurative language is commonplace in natural language, and while making communication memorable and creative, can be difficult to understand. In this work, we investigate the robustness of Question Answering (QA) models on figurative text. Yes/no questions, in particular, are a useful probe of figurative language understanding capabilities of large language models. We propose FigurativeQA, a set of 1000 yes/no questions with figurative and non-figurative contexts, extracted from the domains of restaurant and product reviews. We show that state-of-the-art BERT-based QA models exhibit an average performance drop of up to 15 percentage points when answering questions from figurative contexts, as compared to non-figurative ones. While models like GPT-3 and ChatGPT are better at handling figurative texts, we show that further performance gains can be achieved by automatically simplifying the figurative contexts into their non-figurative (literal) counterparts. We find that the best overall model is ChatGPT with chain-of-thought prompting to generate non-figurative contexts. Our work provides a promising direction for building more robust QA models with figurative language understanding capabilities.
Submitted 24 September, 2023;
originally announced September 2023.
-
Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking
Authors:
Brendan King,
Jeffrey Flanigan
Abstract:
There has been significant interest in zero and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances the state of the art with three advancements to in-context learning for DST. First, we formulate DST as a Python programming task, explicitly modeling language coreference as variable reference in Python. Second, since in-context learning depends highly on the context examples, we propose a method to retrieve a diverse set of relevant examples to improve performance. Finally, we introduce a novel re-weighting method during decoding that takes into account probabilities of competing surface forms, and produces a more accurate dialogue state prediction. We evaluate our approach using MultiWOZ and achieve state-of-the-art multi-domain joint-goal accuracy in zero and few-shot settings.
Submitted 3 July, 2023;
originally announced July 2023.
-
Dependency Dialogue Acts -- Annotation Scheme and Case Study
Authors:
Jon Z. Cai,
Brendan King,
Margaret Perkoff,
Shiran Dudy,
Jie Cao,
Marie Grace,
Natalia Wojarnik,
Ananya Ganesh,
James H. Martin,
Martha Palmer,
Marilyn Walker,
Jeffrey Flanigan
Abstract:
In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.
Submitted 24 February, 2023;
originally announced February 2023.
-
Automatic Identification of Motivation for Code-Switching in Speech Transcripts
Authors:
Ritu Belani,
Jeffrey Flanigan
Abstract:
Code-switching, or switching between languages, occurs for many reasons and has important linguistic, sociological, and cultural implications. Multilingual speakers code-switch for a variety of purposes, such as expressing emotions, borrowing terms, making jokes, introducing a new topic, etc. The reason for code-switching may be quite useful for analysis, but is not readily apparent. To remedy this situation, we annotate a new dataset of motivations for code-switching in Spanish-English. We build the first system (to our knowledge) to automatically identify a wide range of motivations for which speakers code-switch in everyday speech, achieving an accuracy of 75% across all motivations. Additionally, we show that the system can be adapted to new language pairs, achieving 66% accuracy on a new language pair (Hindi-English), demonstrating the cross-lingual applicability of our annotation scheme.
Submitted 30 November, 2022;
originally announced December 2022.
-
Forming Trees with Treeformers
Authors:
Nilay Patel,
Jeffrey Flanigan
Abstract:
Human language is known to exhibit a nested, hierarchical structure, allowing us to form complex sentences out of smaller pieces. However, many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture -- that is, they don't have an inductive bias toward hierarchical structure. Additionally, Transformers are known to perform poorly on compositional generalization tasks which require such structures. In this paper, we introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm which learns a composition operator and pooling function to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements in compositional generalization as well as in downstream tasks such as machine translation, abstractive summarization, and various natural language understanding tasks.
Submitted 10 July, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
DocAMR: Multi-Sentence AMR Representation and Evaluation
Authors:
Tahira Naseem,
Austin Blodgett,
Sadhana Kumaravel,
Tim O'Gorman,
Young-Suk Lee,
Jeffrey Flanigan,
Ramón Fernandez Astudillo,
Radu Florian,
Salim Roukos,
Nathan Schneider
Abstract:
Despite extensive research on parsing of English sentences into Abstract Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks well-defined representation and evaluation. Taking advantage of a super-sentential level of coreference annotation from previous work, we introduce a simple algorithm for deriving a unified graph representation, avoiding the pitfalls of information loss from over-merging and lack of coherence from under-merging. Next, we describe improvements to the Smatch metric to make it tractable for comparing document-level graphs, and use it to re-evaluate the best published document-level AMR parser. We also present a pipeline approach combining the top performing AMR parser and coreference resolution systems, providing a strong baseline for future research.
Submitted 6 May, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
ASQ: Automatically Generating Question-Answer Pairs using AMRs
Authors:
Geetanjali Rakshit,
Jeffrey Flanigan
Abstract:
We introduce ASQ, a tool to automatically mine questions and answers from a sentence using the Abstract Meaning Representation (AMR). Previous work has used question-answer pairs to specify the predicate-argument structure of a sentence using natural language, which does not require linguistic expertise or training, and created datasets such as QA-SRL and QAMR, for which the question-answer pair annotations were crowdsourced. Our goal is to build a tool (ASQ) that maps from the traditional meaning representation AMR to a question-answer meaning representation (QMR). This enables construction of QMR datasets automatically in various domains using existing high-quality AMR parsers, and provides an automatic mapping from AMR to QMR for ease of understanding by non-experts. A qualitative evaluation of the output generated by ASQ from the AMR 2.0 data shows that the question-answer pairs are natural and valid, and demonstrate good coverage of the content. We run ASQ on the sentences from the QAMR dataset, to observe that the semantic roles in QAMR are also captured by ASQ. We intend to make this tool and the results publicly available for others to use and build upon.
Submitted 20 August, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
-
Athena: Constructing Dialogues Dynamically with Discourse Constraints
Authors:
Vrindavan Harrison,
Juraj Juraska,
Wen Cui,
Lena Reed,
Kevin K. Bowden,
Jiaqi Wu,
Brian Schwarzmann,
Abteen Ebrahimi,
Rishi Rajasekaran,
Nikhil Varghese,
Max Wechsler-Azen,
Steve Whittaker,
Jeffrey Flanigan,
Marilyn Walker
Abstract:
This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena's dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.
Submitted 20 November, 2020;
originally announced November 2020.
-
The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures
Authors:
Sheshera Mysore,
Zach Jensen,
Edward Kim,
Kevin Huang,
Haw-Shiuan Chang,
Emma Strubell,
Jeffrey Flanigan,
Andrew McCallum,
Elsa Olivetti
Abstract:
Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.
Submitted 13 July, 2019; v1 submitted 16 May, 2019;
originally announced May 2019.
-
The Green Bank North Celestial Cap Pulsar Survey. IV: Four New Timing Solutions
Authors:
R. J. Aloisi,
A. Cruz,
L. Daniels,
N. Meyers,
R. Roekle,
A. Schuett,
J. K. Swiggum,
M. E. DeCesar,
D. L. Kaplan,
R. S. Lynch,
K. Stovall,
Lina Levin,
A. M. Archibald,
S. Banaszak,
C. M. Biwer,
J. Boyles,
P. Chawla,
L. P. Dartez,
B. Cui,
D. F. Day,
A. J. Ford,
J. Flanigan,
E. Fonseca,
J. W. T. Hessels,
J. Hinojosa
, et al. (18 additional authors not shown)
Abstract:
We present timing solutions for four pulsars discovered in the Green Bank Northern Celestial Cap (GBNCC) survey. All four pulsars are isolated with spin periods between 0.26$\,$s and 1.84$\,$s. PSR J0038$-$2501 has a 0.26$\,$s period and a period derivative of ${7.6} \times {10}^{-19}\,{\rm s\,s}^{-1}$, which is unusually low for isolated pulsars with similar periods. This low period derivative may be simply an extreme value for an isolated pulsar or it could indicate an unusual evolution path for PSR J0038$-$2501, such as a disrupted recycled pulsar (DRP) from a binary system or an orphaned central compact object (CCO). Correcting the observed spin-down rate for the Shklovskii effect suggests that this pulsar may have an unusually low space velocity, which is consistent with expectations for DRPs. There is no X-ray emission detected from PSR J0038$-$2501 in an archival Swift observation, which suggests that it is not a young orphaned CCO. The high dispersion measure of PSR J1949+3426 suggests a distance of 12.3$\,$kpc. This distance indicates that PSR J1949+3426 is among the most distant 7% of Galactic field pulsars, and is one of the most luminous pulsars.
Submitted 8 March, 2019;
originally announced March 2019.
-
Toward Abstractive Summarization Using Semantic Representations
Authors:
Fei Liu,
Jeffrey Flanigan,
Sam Thomson,
Norman Sadeh,
Noah A. Smith
Abstract:
We present a novel abstractive summarization framework that draws on the recent development of a treebank for the Abstract Meaning Representation (AMR). In this framework, the source text is parsed to a set of AMR graphs, the graphs are transformed into a summary graph, and then text is generated from the summary graph. We focus on the graph-to-graph transformation that reduces the source semantic graph into a summary graph, making use of an existing AMR parser and assuming the eventual availability of an AMR-to-text generator. The framework is data-driven, trainable, and not specifically designed for a particular domain. Experiments on gold-standard AMR annotations and system parses show promising results. Code is available at: https://github.com/summarization
Submitted 25 May, 2018;
originally announced May 2018.
-
The Green Bank North Celestial Cap Pulsar Survey III: 45 New Pulsar Timing Solutions
Authors:
Ryan S. Lynch,
Joseph K. Swiggum,
Vlad I. Kondratiev,
David L. Kaplan,
Kevin Stovall,
Emmanuel Fonseca,
Mallory S. E. Roberts,
Lina Levin,
Megan E. DeCesar,
Bingyi Cui,
S. Bradley Cenko,
Pradip Gatkine,
Anne M. Archibald,
Shawn Banaszak,
Christopher M. Biwer,
Jason Boyles,
Pragya Chawla,
Louis P. Dartez,
David Day,
Anthony J. Ford,
Joseph Flanigan,
Jason W. T. Hessels,
Jesus Hinojosa,
Fredrick A. Jenet,
Chen Karako-Argaman
, et al. (15 additional authors not shown)
Abstract:
We provide timing solutions for 45 radio pulsars discovered by the Robert C. Byrd Green Bank Telescope. These pulsars were found in the Green Bank North Celestial Cap pulsar survey, an all-GBT-sky survey being carried out at a frequency of 350 MHz. We include pulsar timing data from the Green Bank Telescope and Low Frequency Array. Our sample includes five fully recycled millisecond pulsars (MSPs, three of which are in a binary system), a new relativistic double neutron star system, an intermediate mass binary pulsar, a mode-changing pulsar, a 138-ms pulsar with a very low magnetic field, and several nulling pulsars. We have measured two post-Keplerian parameters and thus the masses of both objects in the double neutron star system. We also report a tentative companion mass measurement via Shapiro delay in a binary MSP. Two of the MSPs can be timed with high precision and have been included in pulsar timing arrays being used to search for low-frequency gravitational waves, while a third MSP is a member of the black widow class of binaries. Proper motion is measurable in five pulsars and we provide an estimate of their space velocity. We report on an optical counterpart to a new black widow system and provide constraints on the optical counterparts to other binary MSPs. We also present a preliminary analysis of nulling pulsars in our sample. These results demonstrate the scientific return of long timing campaigns on pulsars of all types.
Submitted 13 May, 2018;
originally announced May 2018.
-
The Green Bank Northern Celestial Cap Pulsar Survey II: The Discovery and Timing of Ten Pulsars
Authors:
A. M. Kawash,
M. A. McLaughlin,
D. L. Kaplan,
M. E. DeCesar,
L. Levin,
D. R. Lorimer,
R. S. Lynch,
K. Stovall,
J. K. Swiggum,
E. Fonseca,
A. M. Archibald,
S. Banaszak,
C. M. Biwer,
J. Boyles,
B. Cui,
L. P. Dartez,
D. Day,
S. Ernst,
A. J. Ford,
J. Flanigan,
S. A. Heatherly,
J. W. T. Hessels,
J. Hinojosa,
F. A. Jenet,
C. Karako-Argaman
, et al. (19 additional authors not shown)
Abstract:
We present timing solutions for ten pulsars discovered in 350 MHz searches with the Green Bank Telescope. Nine of these were discovered in the Green Bank Northern Celestial Cap survey and one was discovered by students in the Pulsar Search Collaboratory program in analysis of drift-scan data. Following discovery and confirmation with the Green Bank Telescope, timing has yielded phase-connected solutions with high precision measurements of rotational and astrometric parameters. Eight of the pulsars are slow and isolated, including PSR J0930$-$2301, a pulsar with nulling fraction lower limit of $\sim$30\% and nulling timescale of seconds to minutes. This pulsar also shows evidence of mode changing. The remaining two pulsars have undergone recycling, accreting material from binary companions, resulting in higher spin frequencies. PSR J0557$-$2948 is an isolated, 44 \rm{ms} pulsar that has been partially recycled and is likely a former member of a binary system which was disrupted by a second supernova. The paucity of such so-called `disrupted binary pulsars' (DRPs) compared to double neutron star (DNS) binaries can be used to test current evolutionary scenarios, especially the kicks imparted on the neutron stars in the second supernova. There is some evidence that DRPs have larger space velocities, which could explain their small numbers. PSR J1806+2819 is a 15 \rm{ms} pulsar in a 44 day orbit with a low mass white dwarf companion. We did not detect the companion in archival optical data, indicating that it must be older than 1200 Myr.
Submitted 9 March, 2018;
originally announced March 2018.
-
Search for transient gravitational waves in coincidence with short duration radio transients during 2007-2013
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
others,
B. P. Abbott,
R. Abbott,
T. D. Abbott,
M. R. Abernathy,
F. Acernese,
K. Ackley,
C. Adams,
T. Adams,
P. Addesso,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
B. Allen,
A. Allocca
, et al. (977 additional authors not shown)
Abstract:
We present an archival search for transient gravitational-wave bursts in coincidence with 27 single pulse triggers from Green Bank Telescope pulsar surveys, using the LIGO, Virgo and GEO interferometer network. We also discuss a check for gravitational-wave signals in coincidence with Parkes Fast Radio Bursts using similar methods. Data analyzed in these searches were collected between 2007 and 2013. Possible sources of emission of both short-duration radio signals and transient gravitational-wave emission include starquakes on neutron stars, binary coalescence of neutron stars, and cosmic string cusps. While no evidence for gravitational-wave emission in coincidence with these radio transients was found, the current analysis serves as a prototype for similar future searches using more sensitive second-generation interferometers.
Submitted 21 June, 2016; v1 submitted 5 May, 2016;
originally announced May 2016.
-
The Physics of the B Factories
Authors:
A. J. Bevan,
B. Golob,
Th. Mannel,
S. Prell,
B. D. Yabsley,
K. Abe,
H. Aihara,
F. Anulli,
N. Arnaud,
T. Aushev,
M. Beneke,
J. Beringer,
F. Bianchi,
I. I. Bigi,
M. Bona,
N. Brambilla,
J. Brodzicka,
P. Chang,
M. J. Charles,
C. H. Cheng,
H. -Y. Cheng,
R. Chistov,
P. Colangelo,
J. P. Coleman,
A. Drutskoy
, et al. (2009 additional authors not shown)
Abstract:
This work is on the Physics of the B Factories. Part A of this book contains a brief description of the SLAC and KEK B Factories as well as their detectors, BaBar and Belle, and data taking related issues. Part B discusses tools and methods used by the experiments in order to obtain results. The results themselves can be found in Part C.
Please note that version 3 on the archive is the auxiliary version of the Physics of the B Factories book. This uses the notation alpha, beta, gamma for the angles of the Unitarity Triangle. The nominal version uses the notation phi_1, phi_2 and phi_3. Please cite this work as Eur. Phys. J. C74 (2014) 3026.
Submitted 31 October, 2015; v1 submitted 24 June, 2014;
originally announced June 2014.
-
The Green Bank Northern Celestial Cap Pulsar Survey - I: Survey Description, Data Analysis, and Initial Results
Authors:
K. Stovall,
R. S. Lynch,
S. M. Ransom,
A. M. Archibald,
S. Banaszak,
C. M. Biwer,
J. Boyles,
L. P. Dartez,
D. Day,
A. J. Ford,
J. Flanigan,
A. Garcia,
J. W. T. Hessels,
J. Hinojosa,
F. A. Jenet,
D. L. Kaplan,
C. Karako-Argaman,
V. M. Kaspi,
V. I. Kondratiev,
S. Leake,
D. R. Lorimer,
G. Lunsford,
J. G. Martinez,
A. Mata,
M. A. McLaughlin
, et al. (7 additional authors not shown)
Abstract:
We describe an ongoing search for pulsars and dispersed pulses of radio emission, such as those from rotating radio transients (RRATs) and fast radio bursts (FRBs), at 350 MHz using the Green Bank Telescope. With the Green Bank Ultimate Pulsar Processing Instrument, we record 100 MHz of bandwidth divided into 4,096 channels every 81.92 $\mu$s. This survey will cover the entire sky visible to the Green Bank Telescope ($\delta > -40^\circ$, or 82% of the sky) and outside of the Galactic Plane will be sensitive enough to detect slow pulsars and low dispersion measure ($<$30 $\mathrm{pc\,cm^{-3}}$) millisecond pulsars (MSPs) with a 0.08 duty cycle down to 1.1 mJy. For pulsars with a spectral index of $-$1.6, we will be 2.5 times more sensitive than previous and ongoing surveys over much of our survey region. Here we describe the survey, the data analysis pipeline, initial discovery parameters for 62 pulsars, and timing solutions for 5 new pulsars. PSR J0214$+$5222 is an MSP in a long-period (512 days) orbit and has an optical counterpart identified in archival data. PSR J0636$+$5129 is an MSP in a very short-period (96 minutes) orbit with a very low mass companion (8 $M_\mathrm{J}$). PSR J0645$+$5158 is an isolated MSP with a timing residual RMS of 500 ns and has been added to pulsar timing array experiments. PSR J1434$+$7257 is an isolated, intermediate-period pulsar that has been partially recycled. PSR J1816$+$4510 is an eclipsing MSP in a short-period orbit (8.7 hours) and may have recently completed its spin-up phase.
Submitted 19 June, 2014;
originally announced June 2014.
-
Searching for pulsars using image pattern recognition
Authors:
W. W. Zhu,
A. Berndsen,
E. C. Madsen,
M. Tan,
I. H. Stairs,
A. Brazier,
P. Lazarus,
R. Lynch,
P. Scholz,
K. Stovall,
S. M. Ransom,
S. Banaszak,
C. M. Biwer,
S. Cohen,
L. P. Dartez,
J. Flanigan,
G. Lunsford,
J. G. Martinez,
A. Mata,
M. Rohr,
A. Walker,
B. Allen,
N. D. R. Bhat,
S. Bogdanov,
F. Camilo
, et al. (19 additional authors not shown)
Abstract:
In this paper, we present a novel artificial intelligence (AI) program that identifies pulsars from recent surveys using image pattern recognition with deep neural nets---the PICS (Pulsar Image-based Classification System) AI. The AI mimics human experts and distinguishes pulsars from noise and interference by looking for patterns in each candidate. The information from each pulsar candidate is synthesized in four diagnostic plots, which consist of up to thousands of pixels of image data. The AI takes these data from each candidate as its input and uses thousands of such candidates to train its ~9000 neurons. Unlike other pulsar selection programs, which use pre-designed patterns, the PICS AI teaches itself the salient features of different pulsars from a set of human-labeled candidates through machine learning. The deep neural networks in this AI system grant it superior ability in recognizing various types of pulsars as well as their harmonic signals. The trained AI's performance has been validated with a large set of candidates different from the training set. In this completely independent test, PICS ranked 264 out of 277 pulsar-related candidates, including all 56 previously known pulsars, in the top 961 (1%) of 90008 test candidates, missing only 13 harmonics. The first non-pulsar candidate appears at rank 187, following 45 pulsars and 141 harmonics. In other words, 100% of the pulsars were ranked in the top 1% of all candidates, while 80% were ranked higher than any noise or interference. The performance of this system can be improved over time as more training data are accumulated. This AI system has been integrated into the PALFA survey pipeline and has discovered six new pulsars to date.
Submitted 17 December, 2013; v1 submitted 3 September, 2013;
originally announced September 2013.
-
PEACE: Pulsar Evaluation Algorithm for Candidate Extraction -- A software package for post-analysis processing of pulsar survey candidates
Authors:
K. J. Lee,
K. Stovall,
F. A. Jenet,
J. Martinez,
L. P. Dartez,
A. Mata,
G. Lunsford,
S. Cohen,
C. M. Biwer,
M. Rohr,
J. Flanigan,
A. Walker,
S. Banaszak,
B. Allen,
E. D. Barr,
N. D. R. Bhat,
S. Bogdanov,
A. Brazier,
F. Camilo,
D. J. Champion,
S. Chatterjee,
J. Cordes,
F. Crawford,
J. Deneva,
G. Desvignes
, et al. (19 additional authors not shown)
Abstract:
Modern radio pulsar surveys produce a large volume of prospective candidates, the majority of which are polluted by human-created radio frequency interference or other forms of noise. Typically, large numbers of candidates need to be visually inspected in order to determine if they are real pulsars. This process can be labor intensive. In this paper, we introduce an algorithm called PEACE (Pulsar Evaluation Algorithm for Candidate Extraction) which improves the efficiency of identifying pulsar signals. The algorithm ranks the candidates based on a score function. Unlike popular machine-learning based algorithms, no prior training data sets are required. This algorithm has been applied to data from several large-scale radio pulsar surveys. Using the human-based ranking results generated by students in the Arecibo Remote Command Center programme, the statistical performance of PEACE was evaluated. It was found that PEACE ranked 68% of the student-identified pulsars within the top 0.17% of sorted candidates, 95% within the top 0.34%, and 100% within the top 3.7%. This clearly demonstrates that PEACE significantly increases the pulsar identification rate by a factor of about 50 to 1000. To date, PEACE has been directly responsible for the discovery of 47 new pulsars, 5 of which are millisecond pulsars that may be useful for pulsar timing based gravitational-wave detection projects.
Submitted 2 May, 2013;
originally announced May 2013.
-
The hunt for new pulsars with the Green Bank Telescope
Authors:
Ryan S. Lynch,
Anne M. Archibald,
Shawn Banaszak,
Alison Becker,
Aaron Berndsen,
Chris Biwer,
Jason Boyles,
Rogerio F. Cardoso,
Angus Cherry,
Louis P. Dartez,
David Day,
Courtney R. Epstein,
Joe Flanigan,
Anthony Ford,
Alejandro Garcia,
Jason W. T. Hessels,
Fredrick A. Jenet,
David L. Kaplan,
Chen Karako-Argaman,
Victoria M. Kaspi,
Vladislav I. Kondratiev,
Duncan R. Lorimer,
Grady Lunsford,
Jose Martinez,
Maura A. McLaughlin
, et al. (11 additional authors not shown)
Abstract:
The Green Bank Telescope (GBT) is the largest fully steerable radio telescope in the world and is one of our greatest tools for discovering and studying radio pulsars. Over the last decade, the GBT has successfully found over 100 new pulsars through large-area surveys. Here I discuss the two most recent---the GBT 350 MHz Drift-scan survey and the Green Bank North Celestial Cap survey. The primary science goal of both surveys is to find interesting individual pulsars, including young pulsars, rotating radio transients, exotic binary systems, and especially bright millisecond pulsars (MSPs) suitable for inclusion in Pulsar Timing Arrays, which are trying to directly detect gravitational waves. These two surveys have combined to discover 85 pulsars to date, among which are 14 MSPs and many unique and fascinating systems. I present highlights from these surveys and discuss future plans. I also discuss recent results from targeted GBT pulsar searches of globular clusters and Fermi sources.
Submitted 21 March, 2013;
originally announced March 2013.