Showing 1–44 of 44 results for author: Artzi, Y

  1. arXiv:2402.17793  [pdf, other]

    cs.AI cs.CL cs.LG

    A Surprising Failure? Multimodal LLMs and the NLVR Challenge

    Authors: Anne Wu, Kianté Brantley, Yoav Artzi

    Abstract: This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task requires the model to determine the truth value of the sentence with respect to the image. Despite the strong performance demonstrated by these models,…

    Submitted 26 February, 2024; originally announced February 2024.

  2. arXiv:2310.03720  [pdf, other]

    cs.LG

    SteP: Stacked LLM Policies for Web Actions

    Authors: Paloma Sodhi, S. R. K. Branavan, Yoav Artzi, Ryan McDonald

    Abstract: Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposition to distinct policies can address this challenge,…

    Submitted 22 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: 30 pages, 15 figures

  3. arXiv:2309.02691  [pdf, other]

    cs.CL cs.CV

    A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

    Authors: Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi

    Abstract: Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conducive to generalization. We propose a framework to jointly study task performance and phrase grounding, and pr…

    Submitted 30 May, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: This was published in TMLR in 2024, on January 24th

  4. arXiv:2307.10323  [pdf, other]

    cs.IR cs.CL cs.LG

    IncDSI: Incrementally Updatable Document Retrieval

    Authors: Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, Kilian Q. Weinberger

    Abstract: Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not…

    Submitted 19 July, 2023; originally announced July 2023.

  5. arXiv:2305.12473  [pdf, other]

    cs.CL cs.AI cs.LG

    Continually Improving Extractive QA via Human Feedback

    Authors: Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi

    Abstract: We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Ou…

    Submitted 3 November, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  6. arXiv:2305.06539  [pdf, other]

    cs.CL

    Semantic uncertainty guides the extension of conventions to new referents

    Authors: Ron Eliav, Anya Ji, Yoav Artzi, Robert D. Hawkins

    Abstract: A long tradition of studies in psycholinguistics has examined the formation and generalization of ad hoc conventions in reference games, showing how newly acquired conventions for a given target transfer to new referential contexts. However, another axis of generalization remains understudied: how do conventions formed for one target transfer to completely distinct targets, when specific lexical c…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Proceedings of the 45th Annual Conference of the Cognitive Science Society

  7. arXiv:2303.08127  [pdf, other]

    cs.LG cs.AI cs.CL cs.MA

    CB2: Collaborative Natural Language Interaction Research Platform

    Authors: Jacob Sharf, Mustafa Omer Gul, Yoav Artzi

    Abstract: CB2 is a multi-agent platform to study collaborative natural language interaction in a grounded task-oriented scenario. It includes a 3D game environment, a backend server designed to serve trained models to human agents, and various tools and processes to enable scalable studies. We deploy CB2 at https://cb2.ai as a system demonstration with a learned instruction following model.

    Submitted 29 May, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ACL 2023 Demo paper

  8. arXiv:2212.09710  [pdf, other]

    cs.CL cs.AI cs.LG

    Continual Learning for Instruction Following from Realtime Feedback

    Authors: Alane Suhr, Yoav Artzi

    Abstract: We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to imm…

    Submitted 5 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023 Spotlight paper

  9. arXiv:2211.16492  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Abstract Visual Reasoning with Tangram Shapes

    Authors: Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi

    Abstract: We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to i…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 long paper

  10. arXiv:2211.01994  [pdf, other]

    cs.LG cs.AI cs.CL

    lilGym: Natural Language Visual Reasoning with Reinforcement Learning

    Authors: Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi

    Abstract: We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We introduce a new approach for exact reward computation in every possible world state by annotating all statements with executable Python programs. Each stat…

    Submitted 29 May, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: ACL 2023 Long Paper

  11. arXiv:2205.01086  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

    Authors: Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

    Abstract: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task -- transcribing audio inputs into pseudo subword sequences. This process stands on its own, or can be applied as low-cost second-stage pre-training…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/asappresearch/wav2seq

  12. arXiv:2203.10079  [pdf, other]

    cs.CL

    Simulating Bandit Learning from User Feedback for Extractive Question Answering

    Authors: Ge Gao, Eunsol Choi, Yoav Artzi

    Abstract: We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-p…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  13. arXiv:2111.10367  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

    Authors: Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

    Abstract: Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, rece…

    Submitted 29 July, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: Updated preprint for SLUE Benchmark v0.2; Toolkit link https://github.com/asappresearch/slue-toolkit

  14. arXiv:2109.13449  [pdf, other]

    cs.LG cs.CL cs.CV

    When in Doubt: Improving Classification Performance with Alternating Normalization

    Authors: Menglin Jia, Austin Reiter, Ser-Nam Lim, Yoav Artzi, Claire Cardie

    Abstract: We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification. CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution using the predicted class distributions of high-confidence validation examples. CAN is easily applicable to any probabilistic classifier, with minimal…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021

  15. arXiv:2109.06870  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

    Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

    Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improveme…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Code available at https://github.com/asappresearch/sew

  16. arXiv:2109.04452  [pdf, other]

    cs.CL

    Analysis of Language Change in Collaborative Instruction Following

    Authors: Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi

    Abstract: We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise. Prior work studied such scenarios mostly in the context of reference games, and consistently found that language complexity is reduced along multiple dimensions, such as utterance length, as conventions are formed. In contra…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021 Short Paper

  17. arXiv:2108.07253  [pdf, other]

    cs.CV cs.CL cs.LG

    Who's Waldo? Linking People Across Text and Images

    Authors: Claire Yuqing Cui, Apoorv Khandelwal, Yoav Artzi, Noah Snavely, Hadar Averbuch-Elor

    Abstract: We present a task and benchmark dataset for person-centric visual grounding, the problem of linking between people named in a caption and people pictured in an image. In contrast to prior work in visual grounding, which is predominantly object-based, our new task masks out the names of people in captions in order to encourage methods trained on such image-caption pairs to focus on contextual cues…

    Submitted 17 August, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Published in ICCV 2021 (Oral). Project webpage: https://whoswaldo.github.io

  18. arXiv:2108.04812  [pdf, other]

    cs.CL cs.AI cs.LG

    Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

    Authors: Noriyuki Kojima, Alane Suhr, Yoav Artzi

    Abstract: We study continual learning for natural language instruction generation, by observing human users' instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication to the system's success communicating its intent. We sh…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: To appear in TACL 2021. The arXiv version is a pre-MIT Press publication version

  19. arXiv:2107.05612  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution

    Authors: Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, Yoav Artzi

    Abstract: Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that key to bridging this gap between language and robot actions over long execution horizons are persistent re…

    Submitted 28 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: Presented at CoRL 2021

  20. arXiv:2106.04163  [pdf]

    physics.chem-ph cond-mat.supr-con

    Superconducting microresonators for electron spin resonance, the good, the bad, and the future

    Authors: Yaron Artzi, Yakir Yishay, Marco Fanciulli, Moamen Jbara, Aharon Blank

    Abstract: The field of electron spin resonance is in constant need to improve its capabilities. Among other things, this means having better resonators which would provide improved spin sensitivity, as well as enable larger microwave magnetic field power conversion factors. Surface micro resonators, made of small metallic patches on a dielectric substrate, provide very good absolute spin sensitivity and hig…

    Submitted 27 August, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  21. arXiv:2011.07384  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following

    Authors: Valts Blukis, Ross A. Knepper, Yoav Artzi

    Abstract: We study the problem of learning a robot policy to follow natural language instructions that can be easily extended to reason about new objects. We introduce a few-shot language-conditioned object grounding method trained from augmented reality data that uses exemplars to identify objects and align them to their mentions in instructions. We present a learned map representation that encodes object…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: 4th Conference on Robot Learning (CoRL 2020), Cambridge MA, USA

  22. arXiv:2006.05987  [pdf, other]

    cs.CL cs.LG

    Revisiting Few-sample BERT Fine-tuning

    Authors: Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi

    Abstract: This paper is a study of fine-tuning of BERT contextual representations, with focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network for down-stream tasks; and the prevalent pract…

    Submitted 11 March, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Code available at https://github.com/asappresearch/revisit-bert-finetuning

  23. arXiv:2005.01678  [pdf, other]

    cs.CL

    What is Learned in Visually Grounded Neural Syntax Acquisition

    Authors: Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi

    Abstract: Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified ver…

    Submitted 18 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: In ACL 2020

  24. arXiv:2004.02709  [pdf, other]

    cs.CL

    Evaluating Models' Local Decision Boundaries via Contrast Sets

    Authors: Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, et al. (1 additional author not shown)

    Abstract: Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systemati…

    Submitted 1 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  25. arXiv:2001.03671  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

    Authors: Harsh Mehta, Yoav Artzi, Jason Baldridge, Eugene Ie, Piotr Mirowski

    Abstract: The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLea…

    Submitted 10 January, 2020; originally announced January 2020.

  26. arXiv:1911.03598  [pdf, other]

    cs.CL cs.HC cs.IR cs.LG

    Interactive Classification by Asking Informative Questions

    Authors: Lili Yu, Howard Chen, Sida Wang, Tao Lei, Yoav Artzi

    Abstract: We study the potential for interaction in natural language classification. We add a limited form of interaction for intent classification, where users provide an initial query using natural language, and the system asks for additional information using binary or multi-choice questions. At each turn, our system decides between asking the most informative question or making the final classification…

    Submitted 3 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted at ACL 2020

  27. arXiv:1910.09664  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight

    Authors: Valts Blukis, Yannick Terme, Eyvind Niklasson, Ross A. Knepper, Yoav Artzi

    Abstract: We propose a joint simulation and real-world learning framework for mapping navigation instructions and raw first-person observations to continuous control. Our model estimates the need for environment exploration, predicts the likelihood of visiting environment positions during execution, and controls the agent to both explore and visit high-likelihood positions. We introduce Supervised Reinforce…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Conference on Robot Learning (CoRL) 2019

  28. arXiv:1910.03655  [pdf, other]

    cs.CL cs.AI cs.LG

    Executing Instructions in Situated Collaborative Interactions

    Authors: Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi

    Abstract: We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it. This allows the user to adapt to the system abilities by changing their language or deciding to simply accomplish some tasks themselves, and requires the system to effectively recover from errors as the user strategically assigns it new goals. We build a game environment to stu…

    Submitted 22 November, 2022; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: EMNLP 2019 long paper

  29. arXiv:1909.10411  [pdf, other]

    cs.CL cs.CV

    NLVR2 Visual Bias Analysis

    Authors: Alane Suhr, Yoav Artzi

    Abstract: NLVR2 (Suhr et al., 2019) was designed to be robust for language bias through a data collection process that resulted in each natural language sentence appearing with both true and false labels. The process did not provide a similar measure of control for visual bias. This technical report analyzes the potential for visual bias in NLVR2. We show that some amount of visual bias likely exists. Final…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: Corresponding notebook available at http://lil.nlp.cornell.edu/nlvr/NLVR2BiasAnalysis.html

  30. arXiv:1904.09675  [pdf, other]

    cs.CL

    BERTScore: Evaluating Text Generation with BERT

    Authors: Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi

    Abstract: We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning sys…

    Submitted 24 February, 2020; v1 submitted 21 April, 2019; originally announced April 2019.

    Comments: Code available at https://github.com/Tiiiger/bert_score; To appear in ICLR2020

  31. arXiv:1811.12354  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

    Authors: Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi

    Abstract: We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a real-life visual urban environment, and then identify a location described in natural language to find a hidden object at the goal position. The data contains 9,326 examples of…

    Submitted 16 May, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.00786

    Journal ref: Published in CVPR 2019

  32. arXiv:1811.08824  [pdf, other]

    cs.CV cs.RO

    Early Fusion for Goal Directed Robotic Vision

    Authors: Aaron Walsman, Yonatan Bisk, Saadia Gabriel, Dipendra Misra, Yoav Artzi, Yejin Choi, Dieter Fox

    Abstract: Building perceptual systems for robotics which perform well under tight computational budgets requires novel architectures which rethink the traditional computer vision pipeline. Modern vision architectures require the agent to build a summary representation of the entire scene, even if most of the input is irrelevant to the agent's current goal. In this work, we flip this paradigm, by introducing…

    Submitted 7 August, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

  33. arXiv:1811.04179  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction

    Authors: Valts Blukis, Dipendra Misra, Ross A. Knepper, Yoav Artzi

    Abstract: We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone. Our model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute. This two-step model decomposition allows for simple…

    Submitted 10 December, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Appeared in Conference on Robot Learning 2018

    Journal ref: In Conference on Robot Learning (pp. 505-518) (2018)

  34. arXiv:1811.00491  [pdf, other]

    cs.CL cs.CV

    A Corpus for Reasoning About Natural Language Grounded in Photographs

    Authors: Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi

    Abstract: We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually ri…

    Submitted 21 July, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: ACL 2019 Long Paper

  35. arXiv:1809.00786  [pdf, other]

    cs.CL

    Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction

    Authors: Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi

    Abstract: We propose to decompose instruction execution to goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a language-conditioned image generation network, and then generates the actions required to complete them. Our model is trained from demonstration only without external resources. To evaluate our approach, we introduce two benchmarks f…

    Submitted 18 March, 2019; v1 submitted 3 September, 2018; originally announced September 2018.

    Comments: Accepted at EMNLP 2018

  36. arXiv:1806.00047  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning

    Authors: Valts Blukis, Nataly Brukhim, Andrew Bennett, Ross A. Knepper, Yoav Artzi

    Abstract: We introduce a method for following high-level navigation instructions by mapping directly from images, instructions and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully-differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera…

    Submitted 31 May, 2018; originally announced June 2018.

    Comments: To appear in Robotics: Science and Systems (RSS), 2018

  37. arXiv:1805.10209  [pdf, other]

    cs.CL

    Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

    Authors: Alane Suhr, Yoav Artzi

    Abstract: We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of s…

    Submitted 8 June, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: ACL 2018 Long Paper

  38. arXiv:1804.11283  [pdf, other]

    cs.CL

    Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

    Authors: Max Grusky, Mor Naaman, Yoav Artzi

    Abstract: We present NEWSROOM, a summarization dataset of 1.3 million articles and summaries written by authors and editors in newsrooms of 38 major news publications. Extracted from search and social media metadata between 1998 and 2017, these high-quality summaries demonstrate high diversity of summarization styles. In particular, the summaries combine abstractive and extractive strategies, borrowing word…

    Submitted 17 May, 2020; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: Proceedings of NAACL-HLT 2018 (Long Paper)

  39. arXiv:1804.06868  [pdf, other]

    cs.CL

    Learning to Map Context-Dependent Sentences to Executable Formal Queries

    Authors: Alane Suhr, Srinivasan Iyer, Yoav Artzi

    Abstract: We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that updates after each turn, and can copy sub-sequences of previously predicted queries during generation. Our approach combines implicit and explicit modeling of references between utterances. We evaluate o…

    Submitted 25 April, 2018; v1 submitted 18 April, 2018; originally announced April 2018.

    Comments: NAACL-HLT 2018 Long Paper

  40. arXiv:1801.07357  [pdf, other]

    cs.AI

    CHALET: Cornell House Agent Learning Environment

    Authors: Claudia Yan, Dipendra Misra, Andrew Bennett, Aaron Walsman, Yonatan Bisk, Yoav Artzi

    Abstract: We present CHALET, a 3D house simulator with support for navigation and manipulation. CHALET includes 58 rooms and 10 house configurations, and makes it easy to create new house and room layouts. CHALET supports a range of common household activities, including moving objects, toggling appliances, and placing objects inside closeable containers. The environment and actions available are designed to…

    Submitted 16 September, 2019; v1 submitted 22 January, 2018; originally announced January 2018.

  41. arXiv:1710.00453  [pdf, other]

    cs.CL

    Visual Reasoning with Natural Language

    Authors: Stephanie Zhou, Alane Suhr, Yoav Artzi

    Abstract: Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world. Such reasoning over language and vision is an open problem that is receiving increasing attention. While existing data sets focus on visual diversity, they do not…

    Submitted 1 October, 2017; originally announced October 2017.

    Comments: AAAI NCHRC 2017

  42. arXiv:1709.02755  [pdf, other]

    cs.CL cs.NE

    Simple Recurrent Units for Highly Parallelizable Recurrence

    Authors: Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi

    Abstract: Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate tr…

    Submitted 7 September, 2018; v1 submitted 8 September, 2017; originally announced September 2017.

    Comments: EMNLP

  43. arXiv:1704.08795  [pdf, other]

    cs.CL

    Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

    Authors: Dipendra Misra, John Langford, Yoav Artzi

    Abstract: We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network ag…

    Submitted 22 July, 2017; v1 submitted 27 April, 2017; originally announced April 2017.

    Comments: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017

  44. arXiv:1311.3011  [pdf]

    cs.CL

    Cornell SPF: Cornell Semantic Parsing Framework

    Authors: Yoav Artzi

    Abstract: The Cornell Semantic Parsing Framework (SPF) is a learning and inference framework for mapping natural language to formal representation of its meaning.

    Submitted 8 October, 2016; v1 submitted 12 November, 2013; originally announced November 2013.