Showing 1–32 of 32 results for author: Herzig, J
  1. arXiv:2406.13632  [pdf, other]

    cs.CL

    Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

    Authors: Arie Cattan, Alon Jacovi, Alex Fabrikant, Jonathan Herzig, Roee Aharoni, Hannah Rashkin, Dror Marcus, Avinatan Hassidim, Yossi Matias, Idan Szpektor, Avi Caciularu

    Abstract: Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot examp…

    Submitted 23 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.03618  [pdf, other]

    cs.CL

    TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

    Authors: Avi Caciularu, Alon Jacovi, Eyal Ben-David, Sasha Goldshtein, Tal Schuster, Jonathan Herzig, Gal Elidan, Amir Globerson

    Abstract: Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Website (https://tact-benchmark.github.io), Huggingface (https://huggingface.co/datasets/google/TACT)

  3. arXiv:2405.05904  [pdf, other]

    cs.CL

    Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

    Authors: Zorik Gekhman, Gal Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig

    Abstract: When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of…

    Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2404.09971  [pdf, other]

    cs.CL

    Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs

    Authors: Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

    Abstract: Large language models (LLMs) are susceptible to hallucination, which sparked a widespread effort to detect and prevent them. Recent work attempts to mitigate hallucinations by intervening in the model's computation during generation, using different setups and heuristics. Those works lack separation between different hallucination causes. In this work, we first introduce an approach for constructi…

    Submitted 15 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  5. arXiv:2402.09631  [pdf, other]

    cs.LG cs.CL cs.CY

    Representation Surgery: Theory and Practice of Affine Steering

    Authors: Shashwat Singh, Shauli Ravfogel, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, Ponnurangam Kumaraguru

    Abstract: Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text. In the case of neural language models, an encoding of the undesirable behavior is often present in the model's representations. Thus, one natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations in a manner that reduces the probabi…

    Submitted 5 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted in ICML 2024

  6. arXiv:2402.00559  [pdf, other]

    cs.CL

    A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

    Authors: Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins, Roee Aharoni, Mor Geva

    Abstract: Prompting language models to provide step-by-step answers (e.g., "Chain-of-Thought") is the prominent approach for complex reasoning tasks, where more accurate reasoning chains typically improve downstream task performance. Recent literature discusses automatic methods to verify reasoning to evaluate and improve their correctness. However, no fine-grained step-level datasets are available to enabl…

    Submitted 21 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  7. arXiv:2401.01854  [pdf, other]

    cs.CL cs.AI cs.LG

    Multilingual Instruction Tuning With Just a Pinch of Multilinguality

    Authors: Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal

    Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-follo…

    Submitted 21 May, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Findings of ACL 2024

  8. arXiv:2310.10062  [pdf, other]

    cs.CL cs.AI

    A Comprehensive Evaluation of Tool-Assisted Generation Strategies

    Authors: Alon Jacovi, Avi Caciularu, Jonathan Herzig, Roee Aharoni, Bernd Bohnet, Mor Geva

    Abstract: A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences). Various few-shot tool-usage strategies have been proposed. However, there is no systematic and fair comparison across different strategies, or between these strategies and strong baseli…

    Submitted 28 December, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

  9. arXiv:2309.05353  [pdf]

    cs.HC eess.AS eess.SY

    Applied design thinking in urban air mobility: creating the airtaxi cabin design of the future from a user perspective

    Authors: F. Reimer, J. Herzig, L. Winkler, J. Biedermann, F. Meller, B. Nagel

    Abstract: In the course of developing digital and future aviation cabin concepts at the German Aerospace Center, the exploration of user-centered and acceptance-enhancing methods plays a central role. The challenge here is to identify the flexible range of requirements of different user groups for a previously non-existent transport concept, to translate these into a concept and to generate a rapid evaluati…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 13 pages

  10. arXiv:2305.14332  [pdf, other]

    cs.CL

    Evaluating and Modeling Attribution for Cross-Lingual Question Answering

    Authors: Benjamin Muller, John Wieting, Jonathan H. Clark, Tom Kwiatkowski, Sebastian Ruder, Livio Baldini Soares, Roee Aharoni, Jonathan Herzig, Xinyi Wang

    Abstract: Trustworthy answer content is abundant in many high-resource languages and is instantly accessible through question answering systems, yet this content can be hard to access for those that do not speak these languages. The leap forward in cross-lingual modeling quality offered by generative language models offers much promise, yet their raw generations often fall short in factuality. To improve tr…

    Submitted 15 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published as a long paper at EMNLP 2023

  11. arXiv:2305.11171  [pdf, other]

    cs.CL

    TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

    Authors: Zorik Gekhman, Jonathan Herzig, Roee Aharoni, Chen Elkind, Idan Szpektor

    Abstract: Factual consistency evaluation is often conducted using Natural Language Inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited…

    Submitted 18 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted as a long paper in EMNLP 2023

  12. arXiv:2305.10400  [pdf, other]

    cs.CL cs.CV

    What You See is What You Read? Improving Text-Image Alignment Evaluation

    Authors: Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

    Abstract: Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to…

    Submitted 26 December, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023. Website: https://wysiwyr-itm.github.io/

  13. arXiv:2212.10622  [pdf, other]

    cs.CL

    mFACE: Multilingual Summarization with Factual Consistency Evaluation

    Authors: Roee Aharoni, Shashi Narayan, Joshua Maynez, Jonathan Herzig, Elizabeth Clark, Mirella Lapata

    Abstract: Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. Several recent efforts attempt to address this by devising models that automatically det…

    Submitted 5 January, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 28 pages with links to released data

  14. arXiv:2212.08037  [pdf, other]

    cs.CL

    Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models

    Authors: Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster

    Abstract: Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of…

    Submitted 10 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  15. arXiv:2205.12665  [pdf, other]

    cs.CL

    QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

    Authors: Samuel Joseph Amouyal, Tomer Wolfson, Ohad Rubin, Ori Yoran, Jonathan Herzig, Jonathan Berant

    Abstract: Existing benchmarks for open-domain question answering (ODQA) typically focus on questions whose answers can be extracted from a single paragraph. By contrast, many natural questions, such as "What players were drafted by the Brooklyn Nets?" have a list of answers. Answering such questions requires retrieving and reading from many passages, in a large corpus. We introduce QAMPARI, an ODQA benchmar…

    Submitted 29 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  16. arXiv:2205.12253  [pdf, other]

    cs.CL

    Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

    Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig, Emily Pitler, Fei Sha, Kristina Toutanova

    Abstract: Despite their strong performance on many tasks, pre-trained language models have been shown to struggle on out-of-distribution compositional generalization. Meanwhile, recent work has shown considerable improvements on many NLP tasks from model scaling. Can scaling up model size also improve compositional generalization in semantic parsing? We evaluate encoder-decoder models up to 11B parameters a…

    Submitted 24 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  17. arXiv:2204.04991  [pdf, other]

    cs.CL

    TRUE: Re-evaluating Factual Consistency Evaluation

    Authors: Or Honovich, Roee Aharoni, Jonathan Herzig, Hagai Taitelbaum, Doron Kukliansy, Vered Cohen, Thomas Scialom, Idan Szpektor, Avinatan Hassidim, Yossi Matias

    Abstract: Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such evaluation metrics are usually developed and evalu…

    Submitted 3 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted as a long paper to NAACL 2022 main conference

  18. arXiv:2112.08633  [pdf, other]

    cs.CL cs.LG

    Learning To Retrieve Prompts for In-Context Learning

    Authors: Ohad Rubin, Jonathan Herzig, Jonathan Berant

    Abstract: In-context learning is a recent paradigm in natural language understanding, where a large pre-trained language model (LM) observes a test instance and a few training examples as its input, and directly decodes the output without any update to its parameters. However, performance has been shown to strongly depend on the selected training examples (termed prompt). In this work, we propose an efficie…

    Submitted 8 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: NAACL-HLT 2022

  19. arXiv:2109.02575  [pdf, other]

    cs.CL

    Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization

    Authors: Inbar Oren, Jonathan Herzig, Jonathan Berant

    Abstract: Modern semantic parsers suffer from two principal limitations. First, training requires expensive collection of utterance-program pairs. Second, semantic parsers fail to generalize at test time to new compositions/structures that have not been observed during training. Recent research has shown that automatic generation of synthetic utterance-program pairs can alleviate the first problem, but its…

    Submitted 6 September, 2021; originally announced September 2021.

  20. arXiv:2104.07478  [pdf, other]

    cs.CL

    Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

    Authors: Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

    Abstract: Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact…

    Submitted 15 April, 2021; originally announced April 2021.

  21. arXiv:2103.12011  [pdf, other]

    cs.CL

    Open Domain Question Answering over Tables via Dense Retrieval

    Authors: Jonathan Herzig, Thomas Müller, Syrine Krichene, Julian Martin Eisenschlos

    Abstract: Recent advances in open-domain QA have led to strong models based on dense retrieval, but only focused on retrieving textual passages. In this work, we tackle open-domain QA over tables for the first time, and show that retrieval can be improved by a retriever designed to handle tabular context. We present an effective pre-training procedure for our retriever and improve retrieval quality with min…

    Submitted 9 June, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: NAACL 2021 camera ready

  22. arXiv:2010.05647  [pdf, other]

    cs.CL

    Improving Compositional Generalization in Semantic Parsing

    Authors: Inbar Oren, Jonathan Herzig, Nitish Gupta, Matt Gardner, Jonathan Berant

    Abstract: Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built of components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional gener…

    Submitted 12 October, 2020; originally announced October 2020.

  23. arXiv:2009.06040  [pdf, ps, other]

    cs.CL

    Span-based Semantic Parsing for Compositional Generalization

    Authors: Jonathan Herzig, Jonathan Berant

    Abstract: Despite the success of sequence-to-sequence (seq2seq) models in semantic parsing, recent work has shown that they fail in compositional generalization, i.e., the ability to generalize to new structures built of components observed during training. In this work, we posit that a span-based parser should lead to better compositional generalization. We propose SpanBasedSP, a parser that predicts a spa…

    Submitted 13 June, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

    Comments: ACL 2021 camera ready

  24. arXiv:2004.02349  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    TAPAS: Weakly Supervised Table Parsing via Pre-training

    Authors: Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Martin Eisenschlos

    Abstract: Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and in addition, the generated logical forms are only used as an intermed…

    Submitted 21 April, 2020; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020

  25. arXiv:1908.11152  [pdf, other]

    cs.CL

    A Summarization System for Scientific Documents

    Authors: Shai Erera, Michal Shmueli-Scheuer, Guy Feigenblat, Ora Peled Nakash, Odellia Boni, Haggai Roitman, Doron Cohen, Bar Weiner, Yosi Mass, Or Rivlin, Guy Lev, Achiya Jerbi, Jonathan Herzig, Yufang Hou, Charles Jochim, Martin Gleize, Francesca Bonin, David Konopnicki

    Abstract: We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosin…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: Accepted to EMNLP 2019

  26. arXiv:1908.09940  [pdf, other]

    cs.CL cs.AI

    Don't paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing

    Authors: Jonathan Herzig, Jonathan Berant

    Abstract: A major hurdle on the road to conversational interfaces is the difficulty in collecting data that maps language utterances to logical forms. One prominent approach for data collection has been to automatically generate pseudo-language paired with logical forms, and paraphrase the pseudo-language to natural language through crowdsourcing (Wang et al., 2015). However, this data collection procedure…

    Submitted 28 August, 2019; v1 submitted 26 August, 2019; originally announced August 2019.

    Comments: EMNLP-IJCNLP 2019

  27. arXiv:1906.01351  [pdf, ps, other]

    cs.CL

    TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

    Authors: Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki

    Abstract: Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summari…

    Submitted 13 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  28. arXiv:1811.01090  [pdf, other]

    cs.CL

    Value-based Search in Execution Space for Mapping Instructions to Programs

    Authors: Dor Muhlgay, Jonathan Herzig, Jonathan Berant

    Abstract: Training models to map natural language instructions to programs given target world supervision only requires searching for good programs at training time. Search is commonly done using beam search in the space of partial programs or program trees, but as the length of the instructions grows finding a good program becomes difficult. In this work, we propose a search algorithm that uses the target…

    Submitted 19 March, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

  29. arXiv:1811.00937  [pdf, other]

    cs.CL cs.AI cs.LG

    CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

    Authors: Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant

    Abstract: When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answerin…

    Submitted 15 March, 2019; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: accepted as a long paper at NAACL 2019

  30. arXiv:1804.07918  [pdf, other]

    cs.CL cs.AI

    Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing

    Authors: Jonathan Herzig, Jonathan Berant

    Abstract: Building a semantic parser quickly in a new domain is a fundamental challenge for conversational interfaces, as current semantic parsers require expensive supervision and lack the ability to generalize to new domains. In this paper, we introduce a zero-shot approach to semantic parsing that can parse utterances in unseen domains while only being trained on examples in other source domains. First,…

    Submitted 22 September, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018

  31. arXiv:1711.05780  [pdf, ps, other]

    cs.CL

    Detecting Egregious Conversations between Customers and Virtual Agents

    Authors: Tommy Sandbank, Michal Shmueli-Scheuer, Jonathan Herzig, David Konopnicki, John Richards, David Piorkowski

    Abstract: Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this pa…

    Submitted 16 April, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: NAACL 2018

  32. Neural Semantic Parsing over Multiple Knowledge-bases

    Authors: Jonathan Herzig, Jonathan Berant

    Abstract: A fundamental challenge in developing semantic parsers is the paucity of strong supervision in the form of language utterances annotated with logical form. In this paper, we propose to exploit structural regularities in language in different domains, and train semantic parsers over multiple knowledge-bases (KBs), while sharing information across datasets. We find that we can substantially improve…

    Submitted 24 April, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

    Comments: Accepted to ACL 2017