
Showing 1–29 of 29 results for author: Szpektor, I

  1. arXiv:2407.06189

    cs.CV cs.AI

    Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

    Authors: Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy

    Abstract: The performance of Large Vision Language Models (LVLMs) is dependent on the size and quality of their training datasets. Existing video instruction tuning datasets lack diversity as they are derived by prompting large language models with video captions to generate question-answer pairs, and are therefore mostly descriptive. Meanwhile, many labeled video datasets with diverse labels and supervisio…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project page: https://orrzohar.github.io/projects/video-star/

  2. arXiv:2406.13632

    cs.CL

    Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

    Authors: Arie Cattan, Alon Jacovi, Alex Fabrikant, Jonathan Herzig, Roee Aharoni, Hannah Rashkin, Dror Marcus, Avinatan Hassidim, Yossi Matias, Idan Szpektor, Avi Caciularu

    Abstract: Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot examp…

    Submitted 23 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2405.14655

    cs.LG

    Multi-turn Reinforcement Learning from Preference Human Feedback

    Authors: Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks. Existing methods work by emulating the preferences at the single decision (turn) level, limiting their capabilities in settings that require planning or multi-turn interactions to ach…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2405.10122

    cs.CV

    Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks

    Authors: João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

    Abstract: Multistep instructions, such as recipes and how-to guides, greatly benefit from visual aids, such as a series of images that accompany the instruction steps. While Large Language Models (LLMs) have become adept at generating coherent textual steps, Large Vision/Language Models (LVLMs) are less capable of generating accompanying image sequences. The most challenging aspect is that each generated im…

    Submitted 16 May, 2024; originally announced May 2024.

  5. arXiv:2405.04682

    cs.CV cs.AI cs.LG

    TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

    Authors: Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

    Abstract: Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., 'a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since t…

    Submitted 24 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 21 pages, 12 figures, 8 tables

  6. arXiv:2404.09971

    cs.CL

    Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs

    Authors: Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

    Abstract: Large language models (LLMs) are prone to hallucinations, which sparked a widespread effort to detect and prevent them. Recent work attempts to mitigate hallucinations by intervening in the model's generation, typically computing representative vectors of hallucinations vs. grounded generations, for steering the model's hidden states away from a hallucinatory state. However, common studies employ…

    Submitted 11 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  7. arXiv:2403.06265

    cs.CL cs.AI cs.LG

    Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

    Authors: Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty

    Abstract: Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear. In this paper, we argue for the theoretical importance of compression, that can be viewed as 0-gram language modeling where equal probability is assigned to all tokens. We also demonstrate the empirical importance of compression for downstream…

    Submitted 22 June, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: EMNLP 2024, Findings

  8. arXiv:2401.01854

    cs.CL cs.AI cs.LG

    Multilingual Instruction Tuning With Just a Pinch of Multilinguality

    Authors: Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal

    Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-follo…

    Submitted 21 May, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Findings of ACL 2024

  9. arXiv:2312.03766

    cs.CL cs.CV

    Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

    Authors: Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

    Abstract: While existing image-text alignment models reach high quality binary assessments, they fall short of pinpointing the exact source of misalignment. In this paper, we present a method to provide detailed textual and visual explanation of detected misalignments between text-image pairs. We leverage large language models and visual grounding models to automatically construct a training set that holds…

    Submitted 5 December, 2023; originally announced December 2023.

  10. arXiv:2311.10111

    cs.CV cs.AI cs.CL cs.LG

    VideoCon: Robust Video-Language Alignment via Contrast Captions

    Authors: Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover

    Abstract: Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically-plausible contrastive changes in the video captions. Our work addresses this by identifying a broad spectrum of contrast misalignments, such as replacing entities, actions, and flipping event order, which alignment models should be robust against. To this end, we…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 22 pages, 19 Figures, 7 Tables

  11. arXiv:2306.00186

    cs.CL

    Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

    Authors: Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor

    Abstract: Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this p…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: ACL 2023

  12. arXiv:2305.15026

    cs.CV cs.AI

    Transferring Visual Attributes from Natural Language to Verified Image Generation

    Authors: Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes

    Abstract: Text to image generation methods (T2I) are widely popular in generating art and other creative artifacts. While visual hallucinations can be a positive factor in scenarios where creativity is appreciated, such artifacts are poorly suited for cases where the generated image needs to be grounded in complex natural language without explicit visual elements. In this paper, we propose to strengthen the…

    Submitted 29 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  13. arXiv:2305.11171

    cs.CL

    TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

    Authors: Zorik Gekhman, Jonathan Herzig, Roee Aharoni, Chen Elkind, Idan Szpektor

    Abstract: Factual consistency evaluation is often conducted using Natural Language Inference (NLI) models, yet these models exhibit limited success in evaluating summaries. Previous work improved such models with synthetic training data. However, the data is typically based on perturbed human-written summaries, which often differ in their characteristics from real model-generated summaries and have limited…

    Submitted 18 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted as a long paper in EMNLP 2023

  14. arXiv:2305.10400

    cs.CL cs.CV

    What You See is What You Read? Improving Text-Image Alignment Evaluation

    Authors: Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

    Abstract: Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to…

    Submitted 26 December, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023. Website: https://wysiwyr-itm.github.io/

  15. arXiv:2212.09682

    cs.CL

    Multilingual Sequence-to-Sequence Models for Hebrew NLP

    Authors: Matan Eyal, Hila Noga, Roee Aharoni, Idan Szpektor, Reut Tsarfaty

    Abstract: Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is b…

    Submitted 19 December, 2022; originally announced December 2022.

  16. arXiv:2211.05655

    cs.CL cs.AI cs.LG

    DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

    Authors: Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, Omri Abend

    Abstract: Question answering models commonly have access to two sources of "knowledge" during inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer. Having these two sources of knowledge entangled together is a core issue for generative QA mo…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 12 pages, 2 figures

  17. arXiv:2209.05401

    cs.CL cs.CV

    MaXM: Towards Multilingual Visual Question Answering

    Authors: Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut

    Abstract: Visual Question Answering (VQA) has been primarily studied through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts. We first propose a translation-based framework to mVQA data gene…

    Submitted 24 October, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: EMNLP 2023 (Findings). https://github.com/google-research-datasets/maxm

  18. arXiv:2208.02294

    cs.CL cs.LG

    Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

    Authors: Deborah Cohen, Moonkyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan

    Abstract: Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversa…

    Submitted 25 July, 2022; originally announced August 2022.

  19. arXiv:2206.14796

    cs.CL cs.AI cs.LG

    On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-based Method

    Authors: Zorik Gekhman, Nadav Oved, Orgad Keller, Idan Szpektor, Roi Reichart

    Abstract: Most works on modeling the conversation history in Conversational Question Answering (CQA) report a single main result on a common CQA benchmark. While existing models show impressive results on CQA leaderboards, it remains unclear whether they are robust to shifts in setting (sometimes to more realistic ones), training data size (e.g. from large to small sets) and domain. In this work, we design…

    Submitted 28 December, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: Accepted for publication at TACL in December 2022. First two authors contributed equally to this work. Our code and data will be released at: https://github.com/zorikg/MarCQAp

  20. A Dataset for Sentence Retrieval for Open-Ended Dialogues

    Authors: Itay Harel, Hagai Taitelbaum, Idan Szpektor, Oren Kurland

    Abstract: We address the task of sentence retrieval for open-ended dialogues. The goal is to retrieve sentences from a document corpus that contain information useful for generating the next turn in a given dialogue. Prior work on dialogue-based retrieval focused on specific types of dialogues: either conversational QA or conversational search. To address a broader scope of this task where any type of dialo…

    Submitted 23 May, 2022; originally announced May 2022.

  21. arXiv:2205.01883

    cs.CV cs.CL

    All You May Need for VQA are Image Captions

    Authors: Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

    Abstract: Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. In this paper, we propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question generation. We show that the resultin…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022)

  22. arXiv:2204.04991

    cs.CL

    TRUE: Re-evaluating Factual Consistency Evaluation

    Authors: Or Honovich, Roee Aharoni, Jonathan Herzig, Hagai Taitelbaum, Doron Kukliansky, Vered Cohen, Thomas Scialom, Idan Szpektor, Avinatan Hassidim, Yossi Matias

    Abstract: Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such evaluation metrics are usually developed and evalu…

    Submitted 3 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted as a long paper to NAACL 2022 main conference

  23. arXiv:2104.08202

    cs.CL

    $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

    Authors: Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

    Abstract: Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability. Inspired by recent work on evaluating factual consistency in abstractive summarization, we propose an automatic evaluation metric for factual consistency in knowledge-grounded dialogue using automatic…

    Submitted 9 September, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021

  24. arXiv:2104.01940

    cs.CL

    What's the best place for an AI conference, Vancouver or ______: Why completing comparative questions is difficult

    Authors: Avishai Zagoury, Einat Minkov, Idan Szpektor, William W. Cohen

    Abstract: Although large neural language models (LMs) like BERT can be finetuned to yield state-of-the-art results on many NLP tasks, it is often unclear what these models actually learn. Here we study using such LMs to fill in entities in human-authored comparative questions, like "Which country is older, India or ______?" -- i.e., we study the ability of neural LMs to ask (not answer) reasonable questio…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: AAAI 2021; preprint

  25. arXiv:2010.02592

    cs.CL cs.AI cs.LG

    Semantically Driven Sentence Fusion: Modeling and Evaluation

    Authors: Eyal Ben-David, Orgad Keller, Eric Malmi, Idan Szpektor, Roi Reichart

    Abstract: Sentence fusion is the task of joining related sentences into coherent text. Current training and evaluation schemes for this task are based on single reference ground-truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutio…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: This paper was accepted to Findings of EMNLP 2020

  26. arXiv:1905.09135

    cs.CL

    A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy

    Authors: Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor

    Abstract: We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created…

    Submitted 19 June, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: Accepted at ACL 2019

  27. arXiv:1903.07037

    cs.CL

    Audio De-identification: A New Entity Recognition Task

    Authors: Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias

    Abstract: Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written…

    Submitted 5 May, 2019; v1 submitted 17 March, 2019; originally announced March 2019.

    Comments: Accepted to NAACL 2019 Industry Track

  28. arXiv:1902.10526

    cs.CL

    DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion

    Authors: Mor Geva, Eric Malmi, Idan Szpektor, Jonathan Berant

    Abstract: Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models. In this paper, we propose a method for automatically-generating fusion examples from raw text and present DiscoFuse, a large scale dataset for discourse-based sentence fusion. We author a set of rules fo…

    Submitted 18 March, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: NAACL 2019 (camera ready version)

  29. arXiv:1605.02945

    cs.CL cs.IR

    The Yahoo Query Treebank, V. 1.0

    Authors: Yuval Pinter, Roi Reichart, Idan Szpektor

    Abstract: A description and annotation guidelines for the Yahoo Webscope release of Query Treebank, Version 1.0, May 2016.

    Submitted 11 May, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

    Comments: Co-released with the Webscope Dataset (L-28) and with Pinter et al., Syntactic Parsing of Web Queries with Question Intent, NAACL-HLT 2016