Showing 1–25 of 25 results for author: Zellers, R

Searching in archive cs.

  1. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  2. arXiv:2212.14578  [pdf, other]

    cs.LG cs.AI cs.CL

    MAUVE Scores for Generative Models: Theory and Practice

    Authors: Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui

    Abstract: Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions s…

    Submitted 7 December, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: Published in Journal of Machine Learning Research

  3. arXiv:2209.06293  [pdf, other]

    cs.CL cs.CV

    Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest

    Authors: Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi

    Abstract: Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of "understanding" a cartoon; key elements are th…

    Submitted 6 July, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Journal ref: ACL 2023

  4. arXiv:2206.08916  [pdf, other]

    cs.CV

    Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

    Authors: Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

    Abstract: We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation, vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for suc…

    Submitted 4 October, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

  5. arXiv:2205.12630  [pdf, other]

    cs.CL cs.CV

    Multimodal Knowledge Alignment with Reinforcement Learning

    Authors: Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, JaeSung Park, Ximing Lu, Prithviraj Ammanabrolu, Rowan Zellers, Ronan Le Bras, Gunhee Kim, Yejin Choi

    Abstract: Large language models readily adapt to novel settings, even without task-specific training data. Can their zero-shot capacity be extended to multimodal inputs? In this work, we propose ESPER which extends language-only zero-shot models to unseen multimodal tasks, like image and audio captioning. Our key novelty is to use reinforcement learning to align multimodal inputs to language model generatio…

    Submitted 25 May, 2022; originally announced May 2022.

    ACM Class: I.2.7; I.4.9

  6. arXiv:2202.04800  [pdf, other]

    cs.CV cs.CL

    The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

    Authors: Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

    Abstract: Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might as…

    Submitted 25 July, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: code, data, models at http://visualabduction.com/

    Journal ref: ECCV 2022

  7. arXiv:2201.02639  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

    Authors: Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

    Abstract: As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that represents videos jointly over time -- through a new training objective that learns from audio, subtitles, and video frames. Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet. Ou…

    Submitted 13 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: CVPR 2022. Project page at https://rowanzellers.com/merlotreserve

  8. arXiv:2112.08995  [pdf, other]

    cs.SD cs.CL cs.CV eess.AS

    Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

    Authors: Yanpeng Zhao, Jack Hessel, Youngjae Yu, Ximing Lu, Rowan Zellers, Yejin Choi

    Abstract: Machines that can represent and describe environmental soundscapes have practical potential, e.g., for audio tagging and captioning systems. Prevailing learning paradigms have been relying on parallel audio-text data, which is, however, scarcely available on the web. We propose VIP-ANT that induces Audio-Text alignment without using any parallel audio-text data. Our key idea is t…

    Submitted 2 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to NAACL 2022. Our code is available at https://github.com/zhaoyanpeng/vipant

  9. arXiv:2112.08726  [pdf, other]

    cs.CL

    NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

    Authors: Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi

    Abstract: The dominant paradigm for neural text generation is left-to-right decoding from autoregressive language models. Constrained or controllable generation under complex lexical constraints, however, requires foresight to plan ahead feasible future paths. Drawing inspiration from the A* search algorithm, we propose NeuroLogic A*esque, a decoding algorithm that incorporates heuristic estimates of futu…

    Submitted 16 December, 2021; originally announced December 2021.

  10. arXiv:2106.02636  [pdf, other]

    cs.CV cs.CL cs.LG

    MERLOT: Multimodal Neural Script Knowledge Models

    Authors: Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

    Abstract: As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech -- in an entirely label-free, self-supervised manner. By pretraining with a mix of both frame-level (s…

    Submitted 21 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: project page at https://rowanzellers.com/merlot; NeurIPS 2021 camera ready

  11. arXiv:2106.00188  [pdf, other]

    cs.CL cs.AI

    PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

    Authors: Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

    Abstract: We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language mode…

    Submitted 30 January, 2022; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: ACL 2021 camera ready, project page at https://rowanzellers.com/piglet/

  12. arXiv:2102.01454  [pdf, other]

    cs.CL

    MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

    Authors: Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaid Harchaoui

    Abstract: As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern…

    Submitted 23 November, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 (Oral Presentation). Package: https://github.com/krishnap25/mauve

  13. arXiv:2012.04726  [pdf, other]

    cs.CL cs.CV

    Edited Media Understanding: Reasoning About Implications of Manipulated Images

    Authors: Jeff Da, Maxwell Forbes, Rowan Zellers, Anthony Zheng, Jena D. Hwang, Antoine Bosselut, Yejin Choi

    Abstract: Multimodal disinformation, from 'deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless -- such as a filtered vacation photo. The difference between this example, and harmful edits that spread disinformation, is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems.…

    Submitted 8 December, 2020; originally announced December 2020.

  14. arXiv:2010.12884  [pdf, other]

    cs.CL

    NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

    Authors: Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

    Abstract: Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large a…

    Submitted 20 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: NAACL 2021

  15. arXiv:2005.00619  [pdf, other]

    cs.CL cs.CV

    Probing Contextual Language Models for Common Ground with Visual Representations

    Authors: Gabriel Ilharco, Rowan Zellers, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: The success of large-scale contextual language models has attracted great interest in probing what is encoded in their representations. In this work, we consider a new question: to what extent contextual representations of concrete nouns are aligned with corresponding visual representations? We design a probing model that evaluates how effective are text-only representations in distinguishing betw…

    Submitted 13 April, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: Proceedings of the 2021 North American Chapter of the Association for Computational Linguistics (NAACL 2021)

  16. arXiv:2004.03607  [pdf, other]

    cs.CL

    TuringAdvice: A Generative and Dynamic Evaluation of Language Use

    Authors: Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi

    Abstract: We propose TuringAdvice, a new challenge task and dataset for language understanding models. Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language. Our evaluation framework tests a fundamental aspect of human language understanding: our ability to use language to resolve open-ended situations by communicating with each other. E…

    Submitted 12 April, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: NAACL 2021 camera ready. Project page at https://rowanzellers.com/advice

  17. arXiv:2002.04108  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Adversarial Filters of Dataset Biases

    Authors: Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi

    Abstract: Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting to spurious dataset biases. We investigate one recently proposed approach, AFLite,…

    Submitted 10 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020

  18. arXiv:1911.11641  [pdf, other]

    cs.CL cs.AI cs.LG

    PIQA: Reasoning about Physical Commonsense in Natural Language

    Authors: Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, Yejin Choi

    Abstract: To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains - such as news articles and encyclopedia entries, where text is plentiful - in more p…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

  19. arXiv:1905.12616  [pdf, other]

    cs.CL cs.CY

    Defending Against Neural Fake News

    Authors: Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi

    Abstract: Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabi…

    Submitted 11 December, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019 camera ready version. Project page/code/demo at https://rowanzellers.com/grover

  20. arXiv:1905.07830  [pdf, other]

    cs.CL

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Authors: Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi

    Abstract: Recent work by Zellers et al. (2018) introduced a new task of commonsense natural language inference: given an event description such as "A woman sits at a piano," a machine must select the most likely followup: "She sets her fingers on the keys." With the introduction of BERT, near human-level performance was reached. Does this mean that machines can perform human level commonsense inference? I…

    Submitted 19 May, 2019; originally announced May 2019.

    Comments: ACL 2019. Project page at https://rowanzellers.com/hellaswag

  21. arXiv:1811.10830  [pdf, other]

    cs.CV cs.CL

    From Recognition to Cognition: Visual Commonsense Reasoning

    Authors: Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

    Abstract: Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize thi…

    Submitted 26 March, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: CVPR 2019 oral. Project page at https://visualcommonsense.com

  22. arXiv:1808.05326  [pdf, other]

    cs.CL

    SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

    Authors: Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi

    Abstract: Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate what might come next ("then, she examined the engine"). In this paper, we introduce the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning. We present SWAG, a new dataset with 113k multiple choice questions about a rich spectru…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  23. arXiv:1711.06640  [pdf, other]

    cs.CV

    Neural Motifs: Scene Graph Parsing with Global Context

    Authors: Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi

    Abstract: We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice-versa. We also find that there ar…

    Submitted 29 March, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 camera ready

  24. arXiv:1707.09468  [pdf, ps, other]

    cs.CL cs.CV

    Zero-Shot Activity Recognition with Verb Attribute Induction

    Authors: Rowan Zellers, Yejin Choi

    Abstract: In this paper, we investigate large-scale zero-shot activity recognition by modeling the visual and linguistic attributes of action verbs. For example, the verb "salute" has several properties, such as being a light movement, a social act, and short in duration. We use these attributes as the internal mapping between visual and textual representations to reason about a previously unseen action. In…

    Submitted 2 September, 2017; v1 submitted 29 July, 2017; originally announced July 2017.

    Comments: accepted to EMNLP 2017

  25. arXiv:1606.06259  [pdf]

    cs.CL cs.MM

    MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

    Authors: Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency

    Abstract: People are sharing their opinions, stories and reviews through online video sharing websites every day. Studying sentiment and subjectivity in these opinion videos is experiencing a growing attention from academia and industry. While sentiment analysis has been successful for text, it is an understudied research question for videos and multimedia content. The biggest setbacks for studies in this d…

    Submitted 11 August, 2016; v1 submitted 20 June, 2016; originally announced June 2016.

    Comments: Accepted as Journal Publication in IEEE Intelligent Systems

    Journal ref: IEEE Intelligent Systems 31.6 (2016): 82-88