Showing 1–37 of 37 results for author: Nyberg, E

  1. arXiv:2404.02359  [pdf, ps, other]

    cs.LG

    Attribution Regularization for Multimodal Paradigms

    Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Jonathan Francis, Eric Nyberg

    Abstract: Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dom… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  2. arXiv:2404.02353  [pdf, other]

    cs.CV cs.AI cs.LG

    Semantic Augmentation in Images using Language

    Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Tanmay Girish Kulkarni, Jonathan Francis, Eric Nyberg

    Abstract: Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to tr… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  3. arXiv:2403.10534  [pdf, other]

    cs.CV cs.AI

    VISREAS: Complex Visual Reasoning with Unanswerable Questions

    Authors: Syeda Nahida Akter, Sangwu Lee, Yingshan Chang, Yonatan Bisk, Eric Nyberg

    Abstract: Verifying a question's validity before answering is crucial in real-world applications, where users may provide imperfect instructions. In this scenario, an ideal model should address the discrepancies in the query and convey them to the users rather than generating the best possible answer. Addressing this requirement, we introduce a new compositional visual question-answering dataset, VISREAS, t… ▽ More

    Submitted 22 February, 2024; originally announced March 2024.

    Comments: 18 pages, 14 figures, 5 tables

  4. arXiv:2401.08025  [pdf, other]

    cs.AI cs.CL cs.LG

    Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination

    Authors: Syeda Nahida Akter, Aman Madaan, Sangwu Lee, Yiming Yang, Eric Nyberg

    Abstract: The potential of Vision-Language Models (VLMs) often remains underutilized in handling complex text-based problems, particularly when these problems could benefit from visual representation. Resonating with humans' ability to solve complex text-based problems by (1) creating a visual diagram from the problem and (2) deducing what steps they need to take to solve it, we propose Self-Imagine. We lev… ▽ More

    Submitted 21 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 18 pages, 9 figures, 12 tables

  5. arXiv:2305.14577  [pdf, other]

    cs.LG cs.CL

    Difference-Masking: Choosing What to Mask in Continued Pretraining

    Authors: Alex Wilf, Syeda Nahida Akter, Leena Mathur, Paul Pu Liang, Sheryl Mathew, Mengrou Shou, Eric Nyberg, Louis-Philippe Morency

    Abstract: The self-supervised objective of masking-and-predicting has led to promising performance gains on a variety of downstream tasks. However, while most approaches randomly mask tokens, there is strong intuition that deciding what to mask can substantially improve learning outcomes. We investigate this in the continued pretraining setting, in which pretrained models continue to pretrain on domain-specific… ▽ More

    Submitted 17 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  6. arXiv:2305.03130  [pdf, other]

    cs.CL

    Chain-of-Skills: A Configurable Model for Open-domain Question Answering

    Authors: Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

    Abstract: The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transferability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that c… ▽ More

    Submitted 26 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  7. arXiv:2304.13664  [pdf, other]

    cs.CL cs.AI

    Using Implicit Feedback to Improve Question Generation

    Authors: Hugo Rodrigues, Eric Nyberg, Luisa Coheur

    Abstract: Question Generation (QG) is a task of Natural Language Processing (NLP) that aims at automatically generating questions from text. Many applications can benefit from automatically generated questions, but often it is necessary to curate those questions, either by selecting or editing them. This task is informative on its own, but it is typically done post-generation, and, thus, the effort is waste… ▽ More

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 27 pages, 8 figures

  8. arXiv:2301.02998  [pdf, other]

    cs.IR cs.AI cs.CL

    InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

    Authors: Leonid Boytsov, Preksha Patel, Vivek Sourabh, Riddhi Nisar, Sayani Kundu, Ramya Ramanathan, Eric Nyberg

    Abstract: We carried out a reproducibility study of InPars, which is a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, which is a simple-yet-effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only a freely available language model BLOOM, which -- as we found out -- produced more accura… ▽ More

    Submitted 20 February, 2024; v1 submitted 8 January, 2023; originally announced January 2023.

  9. arXiv:2212.11345  [pdf, other]

    cs.RO cs.AI cs.CV

    Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation

    Authors: Gyan Tatiya, Jonathan Francis, Luca Bondi, Ingrid Navarro, Eric Nyberg, Jivko Sinapov, Jean Oh

    Abstract: Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding ob… ▽ More

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: 19 pages, 8 figures, 9 tables

  10. arXiv:2212.08729  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Distribution-aware Goal Prediction and Conformant Model-based Planning for Safe Autonomous Driving

    Authors: Jonathan Francis, Bingqing Chen, Weiran Yao, Eric Nyberg, Jean Oh

    Abstract: The feasibility of collecting a large amount of expert demonstrations has inspired growing research interests in learning-to-drive settings, where models learn by imitating the driving behaviour from experts. However, exclusively relying on imitation can limit agents' generalisability to novel scenarios that are outside the support of the training data. In this paper, we address this challenge by… ▽ More

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted: 1st Workshop on Safe Learning for Autonomous Driving, at the International Conference on Machine Learning (ICML 2022); Best Paper Award

  11. arXiv:2210.12338  [pdf, other]

    cs.CL

    Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge

    Authors: Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

    Abstract: We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources. The key novelty of our method is the introduction of the intermediary modules into the current retriever-reader pipeline. Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a cha… ▽ More

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  12. arXiv:2208.12848  [pdf, other]

    cs.CL

    Coalescing Global and Local Information for Procedural Text Understanding

    Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Eric Nyberg, Alessandro Oltramari

    Abstract: Procedural text understanding is a challenging language reasoning task that requires models to track entity states across the development of a narrative. A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and global view of outputs. Prior methods considered a subset of these aspects, resulting in either low precision or low recall.… ▽ More

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: COLING 2022

  13. arXiv:2207.01262  [pdf, other]

    cs.IR cs.CL

    Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding

    Authors: Leonid Boytsov, David Akinpelu, Tianyi Lin, Fangwei Gao, Yutian Zhao, Jeffrey Huang, Nipun Katyal, Eric Nyberg

    Abstract: We evaluated 20+ Transformer models for ranking of long documents (including recent LongP models trained with FlashAttention) and compared them with a simple FirstP baseline, which applies the same model to the truncated input (at most 512 tokens). We used MS MARCO Documents v1 as a primary training set and evaluated both the zero-shot transferred and fine-tuned models. On MS MARCO, TREC DLs, an… ▽ More

    Submitted 16 June, 2024; v1 submitted 4 July, 2022; originally announced July 2022.

  14. arXiv:2205.09843  [pdf, other]

    cs.CL

    Table Retrieval May Not Necessitate Table-specific Model Design

    Authors: Zhiruo Wang, Zhengbao Jiang, Eric Nyberg, Graham Neubig

    Abstract: Tables are an important form of structured data for both human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in texts. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, they add complexity in both modeling and data acquisition compar… ▽ More

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 11 pages total, 4 figures

  15. arXiv:2205.02953  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Learn-to-Race Challenge 2022: Benchmarking Safe Learning and Cross-domain Generalisation in Autonomous Racing

    Authors: Jonathan Francis, Bingqing Chen, Siddha Ganju, Sidharth Kathpal, Jyotish Poonganam, Ayush Shivani, Vrushank Vyas, Sahika Genc, Ivan Zhukov, Max Kumskoy, Anirudh Koul, Jean Oh, Eric Nyberg

    Abstract: We present the results of our autonomous racing virtual challenge, based on the newly-released Learn-to-Race (L2R) simulation framework, which seeks to encourage interdisciplinary research in autonomous driving and to help advance the state of the art on a realistic benchmark. Analogous to racing being used to test cutting-edge vehicles, we envision autonomous racing to serve as a particularly cha… ▽ More

    Submitted 10 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: 20 pages, 4 figures, 2 tables

  16. arXiv:2110.08417  [pdf, other]

    cs.CL cs.AI

    Open Domain Question Answering with A Unified Knowledge Interface

    Authors: Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

    Abstract: The retriever-reader framework is popular for open-domain question answering (ODQA) due to its ability to use explicit knowledge. Although prior work has sought to increase the knowledge coverage by incorporating structured knowledge beyond text, accessing heterogeneous knowledge sources through a unified interface remains an open question. While data-to-text generation has the potential to serve… ▽ More

    Submitted 19 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ACL 2022 camera ready

  17. arXiv:2110.07699  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Safe Autonomous Racing via Approximate Reachability on Ego-vision

    Authors: Bingqing Chen, Jonathan Francis, Jean Oh, Eric Nyberg, Sylvia L. Herbert

    Abstract: Racing demands that each vehicle drive at its physical limits, when any safety infraction could lead to catastrophic failure. In this work, we study the problem of safe reinforcement learning (RL) for autonomous racing, using the vehicle's ego-camera view and speed as input. Given the nature of the task, autonomous agents need to be able to 1) identify and avoid unsafe scenarios under the complex ve… ▽ More

    Submitted 30 November, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 17 pages, 15 figures, 3 tables

  18. arXiv:2109.02837  [pdf, other]

    cs.CL

    Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models

    Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Satoru Ozaki, Eric Nyberg, Alessandro Oltramari

    Abstract: Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget their knowledge gained during pre-training. Recent works only propose lightweight model updates as models may already possess useful knowledge from past experience, but a challenge remains in understanding wh… ▽ More

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  19. arXiv:2103.11575  [pdf, other]

    cs.RO cs.CV cs.LG

    Learn-to-Race: A Multimodal Control Environment for Autonomous Racing

    Authors: James Herman, Jonathan Francis, Siddha Ganju, Bingqing Chen, Anirudh Koul, Abhinav Gupta, Alexey Skabelkin, Ivan Zhukov, Max Kumskoy, Eric Nyberg

    Abstract: Existing research on autonomous driving primarily focuses on urban driving, which is insufficient for characterising the complex driving behaviour underlying high-speed racing. At the same time, existing racing simulation frameworks struggle in capturing realism, with respect to visual rendering, vehicular dynamics, and task objectives, inhibiting the transfer of learning agents to real-world cont… ▽ More

    Submitted 18 August, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to the International Conference on Computer Vision (ICCV 2021); equal contribution - JH and JF; 15 pages, 4 figures

    Journal ref: International Conference on Computer Vision (ICCV), 2021

  20. arXiv:2012.10813  [pdf, other]

    cs.CL

    Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection

    Authors: Yikang Li, Pulkit Goel, Varsha Kuppur Rajendra, Har Simrat Singh, Jonathan Francis, Kaixin Ma, Eric Nyberg, Alessandro Oltramari

    Abstract: Conditional text generation has been a challenging task that is yet to see human-level performance from state-of-the-art models. In this work, we specifically focus on the Commongen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts. Despite advances in other tasks, large pre-trained language models that are fine-tuned on this dataset often produce sen… ▽ More

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: AAAI-CSKG 2021

  21. arXiv:2011.03863  [pdf, other]

    cs.CL cs.AI

    Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

    Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari

    Abstract: Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general re… ▽ More

    Submitted 14 December, 2020; v1 submitted 7 November, 2020; originally announced November 2020.

    Comments: AAAI 2021

  22. arXiv:2010.14848  [pdf, other]

    cs.IR

    Flexible retrieval with NMSLIB and FlexNeuART

    Authors: Leonid Boytsov, Eric Nyberg

    Abstract: Our objective is to introduce to the NLP community an existing k-NN search library NMSLIB, a new retrieval toolkit FlexNeuART, as well as their integration capabilities. NMSLIB, while being one of the fastest k-NN search libraries, is quite generic and supports a variety of distance/similarity functions. Because the library relies on the distance-based structure-agnostic algorithms, it can be further… ▽ More

    Submitted 17 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

    Journal ref: 2nd EMNLP Workshop for Natural Language Processing Open Source Software (NLP-OSS), 2020

  23. BARD: A structured technique for group elicitation of Bayesian networks to support analytic reasoning

    Authors: Ann E. Nicholson, Kevin B. Korb, Erik P. Nyberg, Michael Wybrow, Ingrid Zukerman, Steven Mascaro, Shreshth Thakur, Abraham Oshni Alvandi, Jeff Riley, Ross Pearson, Shane Morris, Matthieu Herrmann, A. K. M. Azad, Fergus Bolger, Ulrike Hahn, David Lagnado

    Abstract: In many complex, real-world situations, problem solving and decision making require effective reasoning about causation and uncertainty. However, human reasoning in these cases is prone to confusion and error. Bayesian networks (BNs) are an artificial intelligence technology that models uncertain situations, supporting probabilistic and causal reasoning and decision making. However, to date, BN me… ▽ More

    Submitted 2 March, 2020; originally announced March 2020.

  24. arXiv:1910.14087  [pdf, other]

    cs.CL

    Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering

    Authors: Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, Alessandro Oltramari

    Abstract: Non-extractive commonsense QA remains a challenging AI task, as it requires systems to reason about, synthesize, and gather disparate pieces of information, in order to generate responses to queries. Recent approaches on such tasks show increased performance, only when models are either pre-trained with additional information or when domain-specific heuristics are used, without any special conside… ▽ More

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: EMNLP-COIN 2019

  25. Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

    Authors: Leonid Boytsov, Eric Nyberg

    Abstract: We focus on low-dimensional non-metric search, where tree-based approaches permit efficient and accurate retrieval while having short indexing time. These methods rely on space partitioning and require a pruning rule to avoid visiting unpromising parts. We consider two known data-driven approaches to extend these rules to non-metric spaces: TriGen and a piece-wise linear approximation of the pruni… ▽ More

    Submitted 8 October, 2019; originally announced October 2019.

  26. Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

    Authors: Leonid Boytsov, Eric Nyberg

    Abstract: We demonstrate that a graph-based search algorithm -- relying on the construction of an approximate neighborhood graph -- can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although the straightforward metrization and symmetrization is usually i… ▽ More

    Submitted 8 October, 2019; originally announced October 2019.

  27. arXiv:1907.10136  [pdf, other]

    cs.CL

    Dr.Quad at MEDIQA 2019: Towards Textual Inference and Question Entailment using contextualized representations

    Authors: Vinayshekhar Bannihatti Kumar, Ashwin Srinivasan, Aditi Chaudhary, James Route, Teruko Mitamura, Eric Nyberg

    Abstract: This paper presents the submissions by Team Dr.Quad to the ACL-BioNLP 2019 shared task on Textual Inference and Question Entailment in the Medical Domain. Our system is based on the prior work Liu et al. (2019) which uses a multi-task objective function for textual entailment. In this work, we explore different strategies for generalizing state-of-the-art language understanding models to the speci… ▽ More

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: Accepted in ACL challenge MediQA as part of the BioNLP workshop

  28. arXiv:1907.01643  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    Pentagon at MEDIQA 2019: Multi-task Learning for Filtering and Re-ranking Answers using Language Inference and Question Entailment

    Authors: Hemant Pugaliya, Karan Saxena, Shefali Garg, Sheetal Shalini, Prashant Gupta, Eric Nyberg, Teruko Mitamura

    Abstract: Parallel deep learning architectures like fine-tuned BERT and MT-DNN have quickly become the state of the art, bypassing previous deep and shallow learning methods by a large margin. More recently, pre-trained models from large related datasets have been able to perform well on many downstream tasks by just fine-tuning on domain-specific datasets. However, using powerful models on non-trivial ta… ▽ More

    Submitted 1 July, 2019; originally announced July 2019.

  29. arXiv:1806.06972  [pdf, other]

    cs.CL cs.AI

    Comparative Analysis of Neural QA models on SQuAD

    Authors: Soumya Wadhwa, Khyathi Raghavi Chandu, Eric Nyberg

    Abstract: The task of Question Answering has gained prominence in the past few decades for testing the ability of machines to understand natural language. Large datasets for Machine Reading have led to the development of neural models that cater to deeper language understanding compared to information retrieval tasks. Different components in these neural architectures are intended to tackle different challe… ▽ More

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Accepted at Workshop on Machine Reading for Question Answering (MRQA), ACL 2018

  30. arXiv:1805.03830  [pdf, other]

    cs.CL cs.AI

    Towards Inference-Oriented Reading Comprehension: ParallelQA

    Authors: Soumya Wadhwa, Varsha Embar, Matthias Grabmair, Eric Nyberg

    Abstract: In this paper, we investigate the tendency of end-to-end neural Machine Reading Comprehension (MRC) models to match shallow patterns rather than perform inference-oriented reasoning on RC benchmarks. We aim to test the ability of these systems to answer questions which focus on referential inference. We propose ParallelQA, a strategy to formulate such questions using parallel passages. We also dem… ▽ More

    Submitted 10 May, 2018; originally announced May 2018.

    Comments: Accepted at Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing, NAACL 2018

  31. arXiv:1711.05789  [pdf, other]

    cs.CL cs.IR

    CMU LiveMedQA at TREC 2017 LiveQA: A Consumer Health Question Answering System

    Authors: Yuan Yang, Jingcheng Yu, Ye Hu, Xiaoyao Xu, Eric Nyberg

    Abstract: In this paper, we present LiveMedQA, a question answering system that is optimized for consumer health questions. On top of the general QA system pipeline, we introduce several new features that aim to exploit domain-specific knowledge and entity structures for better performance. This includes a question type/focus analyzer based on a deep text classification model, a tree-based knowledge graph for… ▽ More

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: To appear in Proceedings of TREC 2017

  32. arXiv:1709.03010  [pdf, other]

    cs.CL

    Steering Output Style and Topic in Neural Response Generation

    Authors: Di Wang, Nebojsa Jojic, Chris Brockett, Eric Nyberg

    Abstract: We propose simple and flexible training and decoding methods for influencing output style and topic in neural encoder-decoder based language generation. This capability is desirable in a variety of applications, including conversational systems, where successful agents need to produce language in a specific style and generate responses steered by a human puppeteer or external knowledge. We decompo… ▽ More

    Submitted 9 September, 2017; originally announced September 2017.

    Comments: EMNLP 2017 camera-ready version

  33. arXiv:1707.01176  [pdf, other]

    cs.CL

    CharManteau: Character Embedding Models For Portmanteau Creation

    Authors: Varun Gangal, Harsh Jhamtani, Graham Neubig, Eduard Hovy, Eric Nyberg

    Abstract: Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsuperv… ▽ More

    Submitted 24 July, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted for publication in EMNLP 2017

  34. arXiv:1707.01161  [pdf, other]

    cs.CL

    Shakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence Models

    Authors: Harsh Jhamtani, Varun Gangal, Eduard Hovy, Eric Nyberg

    Abstract: Variations in writing styles are commonly used to adapt the content to a specific context, audience, or purpose. However, applying stylistic variations is still by and large a manual process, and there has been little effort towards automating it. In this paper we explore automated methods to transform text from modern English to Shakespearean English using an end-to-end trainable neural model w… ▽ More

    Submitted 20 July, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted at EMNLP 2017 Workshop on Stylistic Variation

  35. arXiv:1703.00572  [pdf, other]

    cs.CL

    Structural Embedding of Syntactic Trees for Machine Comprehension

    Authors: Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg

    Abstract: Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithm framework to utilize structured information and encode it into vector representations tha… ▽ More

    Submitted 31 August, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

  36. Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

    Authors: Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg

    Abstract: Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-forc… ▽ More

    Submitted 31 October, 2016; originally announced October 2016.

  37. arXiv:1506.03163  [pdf, other]

    cs.LG cs.DB cs.DS

    Permutation Search Methods are Efficient, Yet Faster Search is Possible

    Authors: Bilegsaikhan Naidan, Leonid Boytsov, Eric Nyberg

    Abstract: We survey permutation-based methods for approximate k-nearest neighbor search. In these methods, every data point is represented by a ranked list of pivots sorted by the distance to this point. Such ranked lists are called permutations. The underpinning assumption is that, for both metric and non-metric spaces, the distance between permutations is a good proxy for the distance between original poi… ▽ More

    Submitted 31 October, 2016; v1 submitted 10 June, 2015; originally announced June 2015.