
Showing 1–28 of 28 results for author: Gildea, D

  1. arXiv:2211.03922  [pdf, other]

    cs.CL

    Strictly Breadth-First AMR Parsing

    Authors: Chen Yu, Daniel Gildea

    Abstract: AMR parsing is the task that maps a sentence to an AMR semantic graph automatically. We focus on the breadth-first strategy of this task, which was proposed recently and achieved better performance than other strategies. However, current models under this strategy only encourage the model to produce the AMR graph in breadth-first order, but cannot guarantee this. To solve this proble…

    Submitted 7 November, 2022; originally announced November 2022.

  2. Hierarchical Context Tagging for Utterance Rewriting

    Authors: Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea

    Abstract: Utterance rewriting aims to recover coreferences and omitted information from the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space as it can only copy tokens from the dialogue context. However, these methods may suffer from lo…

    Submitted 7 August, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Update terms for span index attention in Eq. 6 and add appendix. 10 pages, AAAI 2022

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 36(10) (2022) 10849-10857

  3. arXiv:2108.12304  [pdf, other]

    cs.CL

    Latent Tree Decomposition Parsers for AMR-to-Text Generation

    Authors: Lisa Jin, Daniel Gildea

    Abstract: Graph encoders in AMR-to-text generation models often rely on neighborhood convolutions or global vertex attention. While these approaches apply to general graphs, AMRs may be amenable to encoders that target their tree-like structure. By clustering edges into a hierarchy, a tree decomposition summarizes graph structure. Our model encodes a derivation forest of tree decompositions and extracts an…

    Submitted 1 September, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 9 pages

  4. arXiv:2108.12300  [pdf, other]

    cs.CL

    Tree Decomposition Attention for AMR-to-Text Generation

    Authors: Lisa Jin, Daniel Gildea

    Abstract: Text generation from AMR requires mapping a semantic graph to a string that it annotates. Transformer-based graph encoders, however, poorly capture vertex dependencies that may benefit sequence prediction. To impose order on an encoder, we locally constrain vertex self-attention using a graph's tree decomposition. Instead of forming a full query-key bipartite graph, we restrict attention to vertic…

    Submitted 1 September, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 9 pages

  5. arXiv:2011.14460  [pdf, other]

    cs.DS

    AWLCO: All-Window Length Co-Occurrence

    Authors: Joshua Sobel, Noah Bertram, Chen Ding, Fatemeh Nargesian, Daniel Gildea

    Abstract: Analyzing patterns in a sequence of events has applications in text analysis, computer programming, and genomics research. In this paper, we consider the all-window-length analysis model, which analyzes a sequence of events with respect to windows of all lengths. We study the exact co-occurrence counting problem for the all-window-length analysis model. Our first algorithm is an offline algorithm t…

    Submitted 29 November, 2020; originally announced November 2020.

    ACM Class: F.2.0; E.m

  6. arXiv:2006.04232  [pdf, other]

    cs.CL cs.LO

    Tensors over Semirings for Latent-Variable Weighted Logic Programs

    Authors: Esma Balkir, Daniel Gildea, Shay Cohen

    Abstract: Semiring parsing is an elegant framework for describing parsers by using semiring weighted logic programs. In this paper we present a generalization of this concept: latent-variable semiring parsing. With our framework, any semiring weighted logic program can be latentified by transforming weights from scalar values of a semiring to rank-n arrays, or tensors, of semiring values, allowing the model…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: Accepted to IWPT

  7. arXiv:2002.00037  [pdf, ps, other]

    cs.CL

    Unsupervised Bilingual Lexicon Induction Across Writing Systems

    Authors: Parker Riley, Daniel Gildea

    Abstract: Recent embedding-based methods in unsupervised bilingual lexicon induction have shown good results, but generally have not leveraged orthographic (spelling) information, which can be helpful for pairs of related languages. This work augments a state-of-the-art method with orthographic features, and extends prior work in this space by proposing methods that can learn and utilize orthographic corres…

    Submitted 31 January, 2020; originally announced February 2020.

  8. arXiv:1912.01682  [pdf, ps, other]

    cs.CL

    AMR-to-Text Generation with Cache Transition Systems

    Authors: Lisa Jin, Daniel Gildea

    Abstract: Text generation from AMR involves emitting sentences that reflect the meaning of their AMR annotations. Neural sequence-to-sequence models have successfully been used to decode strings from flattened graphs (e.g., using depth-first or random traversal). Such models often rely on attention-based decoders to map AMR nodes to English token sequences. Instead of linearizing AMR, we directly encode its…

    Submitted 3 December, 2019; originally announced December 2019.

  9. arXiv:1911.04123  [pdf, other]

    cs.CL

    Leveraging Dependency Forest for Neural Medical Relation Extraction

    Authors: Linfeng Song, Yue Zhang, Daniel Gildea, Mo Yu, Zhiguo Wang, Jinsong Su

    Abstract: Medical relation extraction discovers relations between entity mentions in text, such as research articles. For this task, dependency syntax has been recognized as a crucial source of features. Yet in the medical domain, 1-best parse trees suffer from relatively low accuracies, diminishing their usefulness. We investigate a method to alleviate this problem by utilizing dependency forests. Forests…

    Submitted 16 December, 2019; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: EMNLP 2019, with "correct" source-code address: http://github.com/freesunshine0316/dep-forest-re

  10. arXiv:1906.03940  [pdf, other]

    cs.MM cs.CL

    Predicting TED Talk Ratings from Language and Prosody

    Authors: Md Iftekhar Tanveer, Md Kamrul Hassan, Daniel Gildea, M. Ehsan Hoque

    Abstract: We use the largest open repository of public speaking, TED Talks, to predict the ratings of the online viewers. Our dataset contains over 2200 TED Talk transcripts (comprising over 200 thousand sentences), audio features, and the associated meta information, including about 5.5 million ratings from spontaneous visitors of the website. We propose three neural network architectures and compare with st…

    Submitted 20 May, 2019; originally announced June 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.08392

  11. arXiv:1905.10726  [pdf, other]

    cs.CL

    SemBleu: A Robust Metric for AMR Parsing Evaluation

    Authors: Linfeng Song, Daniel Gildea

    Abstract: Evaluating AMR parsing accuracy involves comparing pairs of AMR graphs. The major evaluation metric, SMATCH (Cai and Knight, 2013), searches for one-to-one mappings between the nodes of two AMRs with a greedy hill-climbing algorithm, which leads to search errors. We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. It does not suffer from search errors and conside…

    Submitted 30 May, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: ACL 2019 camera ready

  12. arXiv:1905.08392  [pdf, other]

    cs.LG cs.CL stat.ML

    A Causality-Guided Prediction of the TED Talk Ratings from the Speech-Transcripts using Neural Networks

    Authors: Md Iftekhar Tanveer, Md Kamrul Hasan, Daniel Gildea, M. Ehsan Hoque

    Abstract: Automated prediction of public speaking performance enables novel systems for tutoring public speaking skills. We use the largest open repository, TED Talks, to predict the ratings provided by the online viewers. The dataset contains over 2200 talk transcripts and the associated meta information including over 5.5 million ratings from spontaneous visitors to the website. We carefully removed the…

    Submitted 20 May, 2019; originally announced May 2019.

  13. Semantic Neural Machine Translation using AMR

    Authors: Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, Jinsong Su

    Abstract: It is intuitive that semantic representations can be useful for machine translation, mainly because they can help in enforcing meaning preservation and handling data sparsity (many sentences correspond to one meaning) of machine translation models. On the other hand, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR…

    Submitted 19 February, 2019; originally announced February 2019.

    Comments: Transaction of ACL 2019

    Journal ref: Transactions of the Association for Computational Linguistics, 7, pages 19-31, 2019

  14. arXiv:1810.09609  [pdf, other]

    cs.CL

    Neural Transition-based Syntactic Linearization

    Authors: Linfeng Song, Yue Zhang, Daniel Gildea

    Abstract: The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art performance. Recent work shows that a multi-layer LSTM language model outperforms competitive statistical syntactic linearization systems without using syntax.…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: INLG 2018

  15. arXiv:1809.02040  [pdf, other]

    cs.CL cs.AI

    Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

    Authors: Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, Daniel Gildea

    Abstract: Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question. Previous work approximates global evidence with local coreference information, encoding coreference chains with DAG-styled GRU layers within a gated-attention reader. However, coreference is limited in providing information…

    Submitted 6 September, 2018; originally announced September 2018.

  16. arXiv:1808.09101  [pdf, other]

    cs.CL

    N-ary Relation Extraction using Graph State LSTM

    Authors: Linfeng Song, Yue Zhang, Zhiguo Wang, Daniel Gildea

    Abstract: Cross-sentence $n$-ary relation extraction detects relations among $n$ entities across multiple sentences. Typical methods formulate an input as a document graph, integrating various intra-sentential and inter-sentential dependencies. The current state-of-the-art method splits the input graph into two DAGs, adopting a DAG-structured LSTM for each. Though being able to model rich linguisti…

    Submitted 27 August, 2018; originally announced August 2018.

    Comments: EMNLP 18 camera ready

  17. arXiv:1805.02473  [pdf, other]

    cs.CL

    A Graph-to-Sequence Model for AMR-to-Text Generation

    Authors: Linfeng Song, Yue Zhang, Zhiguo Wang, Daniel Gildea

    Abstract: The problem of AMR-to-text generation is to recover a text representing the same meaning as an input AMR graph. The current state-of-the-art method uses a sequence-to-sequence model, leveraging LSTM for encoding a linearized AMR structure. Although it is able to model non-local semantic information, a sequence LSTM can lose information from the AMR graph structure, and thus faces challenges with l…

    Submitted 27 August, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

    Comments: ACL 2018 camera-ready, Proceedings of ACL 2018 with updated performance

  18. arXiv:1702.05053  [pdf, ps, other]

    cs.CL

    Addressing the Data Sparsity Issue in Neural AMR Parsing

    Authors: Xiaochang Peng, Chuan Wang, Daniel Gildea, Nianwen Xue

    Abstract: Neural attention models have achieved great success in different NLP tasks. However, they have not fulfilled their promise on the AMR parsing task due to the data sparsity issue. In this paper, we describe a sequence-to-sequence model for AMR parsing and present different ways to tackle the data sparsity problem. We show that our methods achieve significant improvement over a baseline neural a…

    Submitted 16 February, 2017; originally announced February 2017.

    Comments: Accepted by EACL-17

  19. arXiv:1702.00500  [pdf, other]

    cs.CL

    AMR-to-text Generation with Synchronous Node Replacement Grammar

    Authors: Linfeng Song, Xiaochang Peng, Yue Zhang, Zhiguo Wang, Daniel Gildea

    Abstract: This paper addresses the task of AMR-to-text generation by leveraging synchronous node replacement grammar. During training, graph-to-string rules are learned using a heuristic extraction algorithm. At test time, a graph transducer is applied to collapse input AMRs and generate output sentences. Evaluated on SemEval-2016 Task 8, our method gives a BLEU score of 25.62, which is the best reported so…

    Submitted 28 April, 2017; v1 submitted 1 February, 2017; originally announced February 2017.

    Comments: camera-ready version of ACL 2017

  20. arXiv:1609.07451  [pdf, other]

    cs.CL

    AMR-to-text generation as a Traveling Salesman Problem

    Authors: Linfeng Song, Yue Zhang, Xiaochang Peng, Zhiguo Wang, Daniel Gildea

    Abstract: The task of AMR-to-text generation is to generate grammatical text that sustains the semantic meaning for a given AMR graph. We attack the task by first partitioning the AMR graph into smaller fragments, and then generating the translation for each fragment, before finally deciding the order by solving an asymmetric generalized traveling salesman problem (AGTSP). A Maximum Entropy classifier is…

    Submitted 23 September, 2016; originally announced September 2016.

    Comments: accepted by EMNLP 2016

  21. arXiv:1607.06208  [pdf, ps, other]

    cs.CL

    Exploring phrase-compositionality in skip-gram models

    Authors: Xiaochang Peng, Daniel Gildea

    Abstract: In this paper, we introduce a variation of the skip-gram model which jointly learns distributed word vector representations and their way of composing to form phrase embeddings. In particular, we propose a learning procedure that incorporates a phrase-compositionality function which can capture how we want to compose phrase vectors from their component word vectors. Our experiments show improveme…

    Submitted 21 July, 2016; originally announced July 2016.

  22. arXiv:1606.05409  [pdf, ps, other]

    cs.CL

    Sense Embedding Learning for Word Sense Induction

    Authors: Linfeng Song, Zhiguo Wang, Haitao Mi, Daniel Gildea

    Abstract: Conventional word sense induction (WSI) methods usually represent each instance with discrete linguistic features or co-occurrence features, and train a model for each polysemous word individually. In this work, we propose to learn sense embeddings for the WSI task. In the training stage, our method induces several sense centroids (embeddings) for each polysemous word. In the testing stage, our meth…

    Submitted 22 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: 6 pages, no figures in *SEM 2016

  23. arXiv:1510.02823  [pdf, other]

    cs.CL

    Human languages order information efficiently

    Authors: Daniel Gildea, T. Florian Jaeger

    Abstract: Most languages use the relative order between words to encode meaning relations. Languages differ, however, in what orders they use and how these orders are mapped onto different meanings. We test the hypothesis that, despite these differences, human languages might constitute different 'solutions' to common pressures of language use. Using Monte Carlo simulations over data from five languages, we…

    Submitted 9 October, 2015; originally announced October 2015.

  24. arXiv:1508.02142  [pdf, other]

    cs.CL

    Feature-based Decipherment for Large Vocabulary Machine Translation

    Authors: Iftekhar Naim, Daniel Gildea

    Abstract: Orthographic similarities across languages provide a strong signal for probabilistic decipherment, especially for closely related language pairs. The existing decipherment models, however, are not well-suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computat…

    Submitted 10 August, 2015; originally announced August 2015.

  25. arXiv:1504.08342  [pdf, ps, other]

    cs.CL cs.FL

    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    Authors: Shay B. Cohen, Daniel Gildea

    Abstract: We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$ where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS -- the maximal number of combination and non-combination points that appear in the grammar rules. We also show that thi…

    Submitted 8 March, 2016; v1 submitted 30 April, 2015; originally announced April 2015.

  26. arXiv:1504.03425  [pdf, ps, other]

    cs.HC cs.AI cs.CL

    Automated Analysis and Prediction of Job Interview Performance

    Authors: Iftekhar Naim, M. Iftekhar Tanveer, Daniel Gildea, Mohammed Hoque

    Abstract: We present a computational framework for automatically quantifying verbal and nonverbal behaviors in the context of job interviews. The proposed framework is trained by analyzing the videos of 138 interview sessions with 69 internship-seeking undergraduates at the Massachusetts Institute of Technology (MIT). Our automated analysis includes facial expressions (e.g., smiles, head gestures, facial tr…

    Submitted 14 April, 2015; originally announced April 2015.

    Comments: 14 pages, 8 figures, 6 tables

  27. arXiv:1311.6421  [pdf, ps, other]

    cs.FL cs.CL

    Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies

    Authors: Pierluigi Crescenzi, Daniel Gildea, Andrea Marino, Gianluca Rossi, Giorgio Satta

    Abstract: Synchronous Context-Free Grammars (SCFGs), also known as syntax-directed translation schemata, are unlike context-free grammars in that they do not have a binary normal form. In general, parsing with SCFGs takes space and time polynomial in the length of the input strings, but with the degree of the polynomial depending on the permutations of the SCFG rules. We consider linear parsing strategies,…

    Submitted 25 November, 2013; originally announced November 2013.

  28. arXiv:1206.6427  [pdf]

    cs.LG stat.ML

    Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients

    Authors: Iftekhar Naim, Daniel Gildea

    Abstract: The speed of convergence of the Expectation Maximization (EM) algorithm for Gaussian mixture model fitting is known to be dependent on the amount of overlap among the mixture components. In this paper, we study the impact of mixing coefficients on the convergence of EM. We show that when the mixture components exhibit some overlap, the convergence of EM becomes slower as the dynamic range among th…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
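
Illustrative note on entry 28: the abstract concerns how mixing coefficients affect the convergence of EM for Gaussian mixtures. The sketch below is a generic textbook EM iteration for a one-dimensional mixture, not the authors' code or experimental setup. The function names (`em_step`, `log_likelihood`, `sample_mixture`), the variance floor, and the sampling parameters are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_step(data, weights, means, variances):
    """One EM iteration; returns updated (weights, means, variances)."""
    k = len(weights)
    # E-step: posterior responsibility of each component for each point.
    resp = []
    for x in data:
        joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate each component from its weighted points.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # hypothetical floor to avoid collapse
    return new_w, new_m, new_v


def sample_mixture(n, major_weight, rng):
    """Draw n points from two unit-variance Gaussians at 0 and 3 (assumed setup)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < major_weight else rng.gauss(3.0, 1.0)
            for _ in range(n)]
```

To experiment with the effect the abstract describes, one could call `sample_mixture` with different values of `major_weight` (for example 0.5 versus 0.95), run `em_step` repeatedly from the same initialization, and compare how many iterations `log_likelihood` needs to level off.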