
Showing 1–35 of 35 results for author: Hendricks, L A

  1. arXiv:2404.16244  [pdf, other]

    cs.CY

    The Ethics of Advanced AI Assistants

    Authors: Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, et al. (32 additional authors not shown)

    Abstract: This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  2. arXiv:2404.14068  [pdf, other]

    cs.AI cs.LG

    Holistic Safety and Responsibility Evaluations of Advanced AI Models

    Authors: Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac

    Abstract: Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages excluding bibliography

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2310.11986  [pdf, other]

    cs.AI cs.CL cs.CY

    Sociotechnical Safety Evaluation of Generative AI Systems

    Authors: Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac

    Abstract: Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main…

    Submitted 31 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: main paper p.1-29, 5 figures, 2 tables

  6. arXiv:2305.14281  [pdf, other]

    cs.CL cs.CV

    Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining

    Authors: Emanuele Bugliarello, Aida Nematzadeh, Lisa Anne Hendricks

    Abstract: Recent work in vision-and-language pretraining has investigated supervised signals from object detection data to learn better, fine-grained multimodal representations. In this work, we take a step further and explore how we can tap into supervision from small-scale visual relation data. In particular, we propose two pretraining approaches to contextualise visual entities in a multimodal setup. Wit…

    Submitted 19 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  7. arXiv:2305.07558  [pdf, other]

    cs.CL cs.CV

    Measuring Progress in Fine-grained Vision-and-Language Understanding

    Authors: Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh

    Abstract: While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images. This has resulted in an increased interest in the community to either develop new benchmarks or model…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  8. arXiv:2209.14375  [pdf, other]

    cs.LG cs.CL

    Improving alignment of dialogue agents via targeted human judgements

    Authors: Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, Laura Weidinger, Martin Chadwick, Phoebe Thacker, Lucy Campbell-Gillingham, Jonathan Uesato, Po-Sen Huang, Ramona Comanescu, Fan Yang, Abigail See, Sumanth Dathathri, Rory Greig, Charlie Chen, Doug Fritz, Jaume Sanchez Elias, Richard Green, Soňa Mokrá, Nicholas Fernando, Boxi Wu, et al. (9 additional authors not shown)

    Abstract: We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into na…

    Submitted 28 September, 2022; originally announced September 2022.

  9. arXiv:2206.08325  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

    Authors: Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, William Isaac, Lisa Anne Hendricks

    Abstract: Large language models produce human-like text that drives a growing number of applications. However, recent literature and, increasingly, real world observations, have demonstrated that these models can generate language that is toxic, biased, untruthful or otherwise harmful. Though work to evaluate language model harms is under way, translating foresight about which harms may arise into rigorous b…

    Submitted 28 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022 Datasets and Benchmarks Track; 10 pages plus appendix

  10. arXiv:2203.15556  [pdf, other]

    cs.CL cs.LG

    Training Compute-Optimal Large Language Models

    Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre

    Abstract: We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion…

    Submitted 29 March, 2022; originally announced March 2022.

  11. arXiv:2112.11446  [pdf, other]

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  12. arXiv:2112.04359  [pdf, other]

    cs.CL cs.AI cs.CY

    Ethical and social risks of harm from Language Models

    Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

    Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist…

    Submitted 8 December, 2021; originally announced December 2021.

  13. arXiv:2109.07445  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Challenges in Detoxifying Language Models

    Authors: Johannes Welbl, Amelia Glaese, Jonathan Uesato, Sumanth Dathathri, John Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang

    Abstract: Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies wit…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: 23 pages, 6 figures, published in Findings of EMNLP 2021

    ACM Class: I.2.6; I.2.7

  14. arXiv:2106.09141  [pdf, other]

    cs.CL cs.CV

    Probing Image-Language Transformers for Verb Understanding

    Authors: Lisa Anne Hendricks, Aida Nematzadeh

    Abstract: Multimodal image-language transformers have achieved impressive results on a variety of tasks that rely on fine-tuning (e.g., visual question answering and image retrieval). We are interested in shedding light on the quality of their pretrained representations -- in particular, if these models can distinguish different types of verbs or if they rely solely on nouns in a given sentence. To do so, w…

    Submitted 16 June, 2021; originally announced June 2021.

  15. arXiv:2102.00529  [pdf, other]

    cs.CL cs.CV

    Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

    Authors: Lisa Anne Hendricks, John Mellor, Rosalia Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh

    Abstract: Recently multimodal transformer models have gained popularity because their performance on language and vision tasks suggests they learn rich visual-linguistic representations. Focusing on zero-shot image retrieval tasks, we study three important factors which can impact the quality of learned representations: pretraining data, the attention mechanism, and loss functions. By pretraining models on s…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: pre-print of MIT Press Publication version

  16. arXiv:2004.06524  [pdf, other]

    cs.CV cs.LG stat.ML

    Contrastive Examples for Addressing the Tyranny of the Majority

    Authors: Viktoriia Sharmanska, Lisa Anne Hendricks, Trevor Darrell, Novi Quadrianto

    Abstract: Computer vision algorithms, e.g. for face recognition, favour groups of individuals that are better represented in the training data. This happens because of the generalization that classifiers have to make. It is simpler to fit the majority groups as this fit is more important to overall error. We propose to create a balanced training dataset, consisting of the original dataset plus new data poin…

    Submitted 14 April, 2020; originally announced April 2020.

  17. arXiv:1809.02156  [pdf, other]

    cs.CL cs.CV

    Object Hallucination in Image Captioning

    Authors: Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

    Abstract: Despite continuously improving performance, contemporary image captioning models are prone to "hallucinating" objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and asses…

    Submitted 29 March, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: Rohrbach and Hendricks contributed equally; accepted to EMNLP 2018

  18. arXiv:1809.01337  [pdf, other]

    cs.CV cs.CL

    Localizing Moments in Video with Temporal Language

    Authors: Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

    Abstract: Localizing moments in a longer video via natural language queries is a new, challenging task at the intersection of language and video understanding. Though moment localization with natural language is similar to other language and vision tasks like natural language object retrieval in images, moment localization offers an interesting opportunity to model temporal dependencies and reasoning in tex…

    Submitted 5 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  19. arXiv:1807.09685  [pdf, other]

    cs.CV

    Grounding Visual Explanations

    Authors: Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

    Abstract: Existing visual explanation generating agents learn to fluently justify a class prediction. However, they may mention visual attributes which reflect a strong class prior, although the evidence may not actually be in the image. This is particularly concerning as ultimately such agents fail in building trust with human users. To overcome this limitation, we propose a phrase-critic model to refine g…

    Submitted 2 August, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

    Comments: Accepted to ECCV 2018

    Journal ref: European Conference on Computer Vision (ECCV), 2018

  20. arXiv:1807.00517  [pdf, other]

    cs.CV

    Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)

    Authors: Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach

    Abstract: Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data. This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over reliance on the learned prior and image co…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: Burns and Hendricks contributed equally. 2018 ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

  21. arXiv:1806.09809  [pdf, other]

    cs.CV

    Generating Counterfactual Explanations with Natural Language

    Authors: Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

    Abstract: Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate a reasoning process. Current textual explanations learn to discuss class discriminative features in an image. However, it is also helpful to understand which attributes might change a classification decision if present in an image (e.g., "This is not a Scarlet Tanager because it doe…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden

  22. arXiv:1803.09797  [pdf, other]

    cs.CV

    Women also Snowboard: Overcoming Bias in Captioning Models

    Authors: Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

    Abstract: Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data (e.g., if a word is present in 60% of training sentences, it might be predicted in 70% of sentences at test time). This can lead to incorrect captions i…

    Submitted 13 March, 2019; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: 22 pages, 6 figures, Burns and Hendricks contributed equally

  23. arXiv:1802.08129  [pdf, other]

    cs.AI cs.CL cs.CV

    Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: arXiv admin note: text overlap with arXiv:1612.04757

  24. arXiv:1711.07373  [pdf, other]

    cs.CV

    Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize the importance of model explanation in various forms such as visual pointing and textual justification. The lack of data with justification annotati…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1612.04757

  25. arXiv:1711.06465  [pdf, other]

    cs.CV

    Grounding Visual Explanations (Extended Abstract)

    Authors: Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

    Abstract: Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image relevance. Sp…

    Submitted 17 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

  26. arXiv:1708.01641  [pdf, other]

    cs.CV

    Localizing Moments in Video with Natural Language

    Authors: Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

    Abstract: We consider retrieving a specific temporal segment, or moment, from a video given a natural language text description. Methods designed to retrieve whole video clips with natural language determine what occurs in a video but not when. To address this issue, we propose the Moment Context Network (MCN) which effectively localizes natural language queries in videos by integrating local and global vid…

    Submitted 4 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  27. arXiv:1703.10476  [pdf, other]

    cs.CV cs.AI cs.CL

    Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

    Authors: Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele

    Abstract: While strong progress has been made in image captioning over the last years, machine and human captions are still quite distinct. A closer look reveals that this is due to the deficiencies in the generated word distribution, vocabulary size, and strong bias in the generators towards frequent captions. Furthermore, humans -- rightfully so -- generate multiple, diverse captions, due to the inherent…

    Submitted 6 November, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

    Comments: 16 pages, Published in ICCV 2017

  28. arXiv:1612.04757  [pdf, other]

    cs.CV cs.AI cs.CL

    Attentive Explanations: Justifying Decisions and Pointing to the Evidence

    Authors: Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

    Abstract: Deep models are the de facto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can d…

    Submitted 25 July, 2017; v1 submitted 14 December, 2016; originally announced December 2016.

  29. arXiv:1606.07770  [pdf, other]

    cs.CV cs.CL

    Captioning Images with Diverse Objects

    Authors: Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

    Abstract: Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources -- labeled images from object recognition dat…

    Submitted 20 July, 2017; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: CVPR 2017 Camera ready version. 17 pages (8 + 9 supplement), 12 figures, 8 tables. Includes project page http://vsubhashini.github.io/noc.html

  30. arXiv:1604.01729  [pdf, other]

    cs.CL cs.CV

    Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

    Authors: Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate Saenko

    Abstract: This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of YouTube videos as well as two larg…

    Submitted 29 November, 2016; v1 submitted 6 April, 2016; originally announced April 2016.

    Comments: Accepted at EMNLP 2016. Project page: http://vsubhashini.github.io/language_fusion.html

    Journal ref: Proc.EMNLP (2016) pg.1961-1966

  31. arXiv:1603.08507  [pdf, other]

    cs.CV cs.AI cs.CL

    Generating Visual Explanations

    Authors: Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

    Abstract: Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. We prop…

    Submitted 28 March, 2016; originally announced March 2016.

  32. arXiv:1511.06065  [pdf, other]

    cs.RO cs.CV cs.LG

    Deep Learning for Tactile Understanding From Visual and Haptic Data

    Authors: Yang Gao, Lisa Anne Hendricks, Katherine J. Kuchenbecker, Trevor Darrell

    Abstract: Robots which interact with the physical world will benefit from a fine-grained tactile understanding of objects and surfaces. Additionally, for certain tasks, robots may need to know the haptic properties of an object before touching it. To enable better tactile understanding for robots, we propose a method of classifying surfaces with haptic adjectives (e.g., compressible or smooth) from both vis…

    Submitted 11 April, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Camera ready version for ICRA 2016

  33. arXiv:1511.05284  [pdf, other]

    cs.CV cs.CL

    Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

    Authors: Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

    Abstract: While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we propose the Deep Compositional Captioner (DCC) to address the task of generating descriptions of novel objects which are not present in paired image-sentence da…

    Submitted 27 April, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  34. arXiv:1412.7155  [pdf, other]

    cs.CV cs.LG cs.NE

    Learning Compact Convolutional Neural Networks with Nested Dropout

    Authors: Chelsea Finn, Lisa Anne Hendricks, Trevor Darrell

    Abstract: Recently, nested dropout was proposed as a method for ordering representation units in autoencoders by their information content, without diminishing reconstruction cost. However, it has only been applied to training fully-connected autoencoders in an unsupervised setting. We explore the impact of nested dropout on the convolutional layers in a CNN trained by backpropagation, investigating whether…

    Submitted 10 April, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: 4 pages, 2 figures. Accepted as a workshop contribution at ICLR 2015

  35. arXiv:1411.4389  [pdf, other]

    cs.CV

    Long-term Recurrent Convolutional Networks for Visual Recognition and Description

    Authors: Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

    Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of thes…

    Submitted 31 May, 2016; v1 submitted 17 November, 2014; originally announced November 2014.

    Comments: Originally presented at CVPR 2015 (oral). Updated version (accepted as a TPAMI journal article) includes additional results