Search | arXiv e-print repository

arXiv:2011.05499 [pdf, other]

Unsupervised Learning of Dense Visual Representations

Authors: Pedro O. Pinheiro, Amjad Almahairi, Ryan Y. Benmalek, Florian Golemo, Aaron Courville

Abstract: Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentation) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose… ▽ More Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentation) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describes the same location of the scene on different views) should be close in an embedding space, while non-matching features should be apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) in multiple dense prediction tasks. △ Less

Submitted 7 December, 2020; v1 submitted 10 November, 2020; originally announced November 2020.

arXiv:1906.05275 [pdf, other]

Kee** Notes: Conditional Natural Language Generation with a Scratchpad Mechanism

Authors: Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko

Abstract: We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a "scratchpad" memory to keep… ▽ More We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a "scratchpad" memory to keep track of what has been generated so far and thereby guide future generation. We evaluate Scratchpad in the context of three well-studied natural language generation tasks --- Machine Translation, Question Generation, and Text Summarization --- and obtain state-of-the-art or comparable performance on standard datasets for each task. Qualitative assessments in the form of human judgements (question generation), attention visualization (MT), and sample output (summarization) provide further evidence of the ability of Scratchpad to generate fluent and expressive output. △ Less

Submitted 13 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

Comments: Accepted to ACL 2019

arXiv:1806.06183 [pdf, other]

The Neural Painter: Multi-Turn Image Generation

Authors: Ryan Y. Benmalek, Claire Cardie, Serge Belongie, Xiadong He, Jianfeng Gao

Abstract: In this work we combine two research threads from Vision/ Graphics and Natural Language Processing to formulate an image generation task conditioned on attributes in a multi-turn setting. By multiturn, we mean the image is generated in a series of steps of user-specified conditioning information. Our proposed approach is practically useful and offers insights into neural interpretability. We intro… ▽ More In this work we combine two research threads from Vision/ Graphics and Natural Language Processing to formulate an image generation task conditioned on attributes in a multi-turn setting. By multiturn, we mean the image is generated in a series of steps of user-specified conditioning information. Our proposed approach is practically useful and offers insights into neural interpretability. We introduce a framework that includes a novel training algorithm as well as model improvements built for the multi-turn setting. We demonstrate that this framework generates a sequence of images that match the given conditioning information and that this task is useful for more detailed benchmarking and analysis of conditional image generation methods. △ Less

Submitted 16 June, 2018; originally announced June 2018.

Showing 1–3 of 3 results for author: Benmalek, R Y