-
To Point or Not to Point: Understanding How Abstractive Summarizers Paraphrase Text
Authors:
Matt Wilber,
William Timkey,
Marten Van Schijndel
Abstract:
Abstractive neural summarization models have seen great improvements in recent years, as shown by ROUGE scores of the generated summaries. But despite these improved metrics, there is limited understanding of the strategies different models employ, and how those strategies relate to their understanding of language. To understand this better, we run several experiments to characterize how one popular abstractive model, the pointer-generator model of See et al. (2017), uses its explicit copy/generation switch to control its level of abstraction (generation) vs extraction (copying). On an extractive-biased dataset, the model utilizes syntactic boundaries to truncate sentences that are otherwise often copied verbatim. When we modify the copy/generation switch and force the model to generate, only simple paraphrasing abilities are revealed alongside factual inaccuracies and hallucinations. On an abstractive-biased dataset, the model copies infrequently but shows similarly limited abstractive abilities. In line with previous research, these results suggest that abstractive summarization models lack the semantic understanding necessary to generate paraphrases that are both abstractive and faithful to the source document.
Submitted 3 June, 2021;
originally announced June 2021.
-
Learning from Multi-domain Artistic Images for Arbitrary Style Transfer
Authors:
Zheng Xu,
Michael Wilber,
Chen Fang,
Aaron Hertzmann,
Hailin Jin
Abstract:
We propose a fast feed-forward network for arbitrary style transfer, which can generate stylized images for previously unseen content and style image pairs. Besides the traditional content and style representation based on deep features and statistics for textures, we use adversarial networks to regularize the generation of stylized images. Our adversarial network learns the intrinsic property of image styles from large-scale multi-domain artistic images. The adversarial training is challenging because both the input and output of our generator are diverse multi-domain images. We use a conditional generator that stylizes content by shifting the statistics of deep features, and a conditional discriminator based on the coarse category of styles. Moreover, we propose a mask module to spatially decide the stylization level and stabilize adversarial training by avoiding mode collapse. As a side effect, our trained discriminator can be applied to rank and select representative stylized images. We qualitatively and quantitatively evaluate the proposed method, and compare with recent style transfer methods. We release our code and model at https://github.com/nightldj/behance_release.
Submitted 14 April, 2019; v1 submitted 25 May, 2018;
originally announced May 2018.
-
BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography
Authors:
Michael J. Wilber,
Chen Fang,
Hailin Jin,
Aaron Hertzmann,
John Collomosse,
Serge Belongie
Abstract:
Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize.
This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. First, we collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and commercial artists. We annotate Behance imagery with rich attribute labels for content, emotions, and artistic media. Furthermore, we carry out baseline experiments to show the value of this dataset for artistic style prediction, for improving the generality of existing object classifiers, and for the study of visual domain adaptation. We believe our Behance Artistic Media dataset will be a good starting point for researchers wishing to study artistic imagery and relevant problems.
Submitted 8 July, 2017; v1 submitted 27 April, 2017;
originally announced April 2017.
-
Residual Networks Behave Like Ensembles of Relatively Shallow Networks
Authors:
Andreas Veit,
Michael Wilber,
Serge Belongie
Abstract:
In this work we propose a novel interpretation of residual networks showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior in the sense that they do not strongly depend on each other. Finally, and most surprising, most paths are shorter than one might expect, and only the short paths are needed during training, as longer paths do not contribute any gradient. For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep. Our results reveal one of the key characteristics that seem to enable the training of very deep networks: Residual networks avoid the vanishing gradient problem by introducing short paths which can carry gradient throughout the extent of very deep networks.
Submitted 26 October, 2016; v1 submitted 20 May, 2016;
originally announced May 2016.
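A rough sketch of the "collection of paths" view described in the abstract above. It assumes each residual block offers a binary choice (take the skip connection or pass through the residual module), so a network with `num_blocks` blocks has 2^`num_blocks` paths and the number of modules on a path follows a Binomial(`num_blocks`, 1/2) distribution. The function name and the block count of 54 are illustrative assumptions, not values taken from the abstract.

```python
from math import comb


def path_length_distribution(num_blocks):
    """Fraction of paths that traverse exactly k residual modules, for k = 0..num_blocks.

    Assumption: every block is an independent binary choice (skip or module),
    so there are 2**num_blocks paths and comb(num_blocks, k) of them have length k.
    """
    total_paths = 2 ** num_blocks
    return [comb(num_blocks, k) / total_paths for k in range(num_blocks + 1)]


# Illustrative block count (an assumption for this sketch).
dist = path_length_distribution(54)

# Average number of residual modules on a path chosen uniformly at random.
mean_length = sum(k * p for k, p in enumerate(dist))

# Share of all paths that pass through every residual module.
full_depth_share = dist[-1]
```

Under this assumption, paths through nearly all or nearly none of the modules are rare, which illustrates why path lengths concentrate around a middle value. The sketch only counts paths; it says nothing about how much gradient each path carries.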
-
Can we still avoid automatic face detection?
Authors:
Michael J. Wilber,
Vitaly Shmatikov,
Serge Belongie
Abstract:
After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion techniques be obvious to be effective, or are there still simple measures that users can use to protect themselves?
In this work, we find ways to evade face detection on Facebook, a representative example of a popular social network that uses automatic face detection to enhance their service. We challenge widely-held beliefs about evading face detection: do our old techniques such as blurring the face region or wearing "privacy glasses" still work? We show that in general, state-of-the-art detectors can often find faces even if the subject wears occluding clothing or even if the uploader damages the photo to prevent faces from being detected.
Submitted 27 March, 2020; v1 submitted 14 February, 2016;
originally announced February 2016.
-
On Optimizing Human-Machine Task Assignments
Authors:
Andreas Veit,
Michael Wilber,
Rajan Vaish,
Serge Belongie,
James Davis,
Vishal Anand,
Anshu Aviral,
Prithvijit Chakrabarty,
Yash Chandak,
Sidharth Chaturvedi,
Chinmaya Devaraj,
Ankit Dhall,
Utkarsh Dwivedi,
Sanket Gupte,
Sharath N. Sridhar,
Karthik Paga,
Anuj Pahuja,
Aditya Raisinghani,
Ayush Sharma,
Shweta Sharma,
Darpana Sinha,
Nisarg Thakkar,
K. Bala Vignesh,
Utkarsh Verma,
Kanniganti Abhishek
, et al. (26 additional authors not shown)
Abstract:
When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with "off-the-shelf" machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease cost under this setting. First, we show that reordering tasks presented to the human can create a significant accuracy improvement. Further, we show that greedily choosing parameters to maximize machine accuracy is sub-optimal, and joint optimization of the combined system improves performance.
Submitted 24 September, 2015;
originally announced September 2015.
-
Learning Concept Embeddings with Combined Human-Machine Expertise
Authors:
Michael J. Wilber,
Iljung S. Kwak,
David Kriegman,
Serge Belongie
Abstract:
This paper presents our work on "SNaCK," a low-dimensional concept embedding algorithm that combines human expertise with automatic machine similarity kernels. Both parts are complementary: human insight can capture relationships that are not apparent from the object's visual similarity and the machine can help relieve the human from having to exhaustively specify many constraints. We show that our SNaCK embeddings are useful in several tasks: distinguishing prime and nonprime numbers on MNIST, discovering labeling mistakes in the Caltech UCSD Birds (CUB) dataset with the help of deep-learned features, creating training datasets for bird classifiers, capturing subjective human taste on a new dataset of 10,000 foods, and qualitatively exploring an unstructured set of pictographic characters. Comparisons with the state-of-the-art in these tasks show that SNaCK produces better concept embeddings that require less human supervision than the leading methods.
Submitted 28 September, 2015; v1 submitted 24 September, 2015;
originally announced September 2015.
-
Image Representations and New Domains in Neural Image Captioning
Authors:
Jack Hessel,
Nicolas Savva,
Michael J. Wilber
Abstract:
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer learned captioning domain, consisting of 66K recipe image/title pairs. We also provide some experiments regarding the appropriateness of datasets for automatic captioning, and find that having multiple captions per image is beneficial, but not an absolute requirement.
Submitted 9 August, 2015;
originally announced August 2015.
-
Cost-Effective HITs for Relative Similarity Comparisons
Authors:
Michael J. Wilber,
Iljung S. Kwak,
Serge J. Belongie
Abstract:
Similarity comparisons of the form "Is object a more similar to b than to c?" are useful for computer vision and machine learning applications. Unfortunately, an embedding of $n$ points is specified by $n^3$ triplets, making collecting every triplet an expensive task. In noticing this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study their effectiveness or their potential drawbacks. Although it is important to reduce the number of collected triplets, it is also important to understand how best to display a triplet collection task to a user. In this work we explore an alternative display for collecting triplets and analyze the monetary cost and speed of the display. We propose best practices for creating cost effective human intelligence tasks for collecting triplets. We show that rather than changing the sampling algorithm, simple changes to the crowdsourcing UI can lead to much higher quality embeddings. We also provide a dataset as well as the labels collected from crowd workers.
Submitted 12 April, 2014;
originally announced April 2014.
-
Good Recognition is Non-Metric
Authors:
Walter J. Scheirer,
Michael J. Wilber,
Michael Eckmann,
Terrance E. Boult
Abstract:
Recognition is the fundamental task of visual cognition, yet how to formalize the general recognition problem for computer vision remains an open issue. The problem is sometimes reduced to the simplest case of recognizing matching pairs, often structured to allow for metric constraints. However, visual recognition is broader than just pair matching -- especially when we consider multi-class training data and large sets of features in a learning context. What we learn and how we learn it has important implications for effective algorithms. In this paper, we reconsider the assumption of recognition as a pair matching test, and introduce a new formal definition that captures the broader context of the problem. Through a meta-analysis and an experimental assessment of the top algorithms on popular data sets, we gain a sense of how often metric properties are violated by good recognition algorithms. By studying these violations, useful insights come to light: we make the case that locally metric algorithms should leverage outside information to solve the general recognition problem.
Submitted 19 February, 2013;
originally announced February 2013.
-
Shocklets, SLAMS, and field-aligned ion beams in the terrestrial foreshock
Authors:
L. B. Wilson III,
A. Koval,
D. G. Sibeck,
A. Szabo,
C. A. Cattell,
J. C. Kasper,
B. A. Maruca,
M. Pulupa,
C. S. Salem,
M. Wilber
Abstract:
We present Wind spacecraft observations of ion distributions showing field-aligned beams (FABs) and large-amplitude magnetic fluctuations composed of a series of shocklets and short large-amplitude magnetic structures (SLAMS). We show that the SLAMS are acting like a local quasi-perpendicular shock reflecting ions to produce the FABs. Previous FAB observations reported the source as the quasi-perpendicular bow shock. The SLAMS exhibit a foot-like magnetic enhancement with a leading magnetosonic whistler train, consistent with previous observations. The FABs are found to have T_b ~ 80-850 eV, V_b/V_sw ~ 1-2, T_{b,perp}/T_{b,para} ~ 1-10, and n_b/n_i ~ 0.2-14%. Strong ion and electron heating are observed within the series of shocklets and SLAMS increasing by factors \geq 5 and \geq 3, respectively. Both the core and halo electron components show strong perpendicular heating inside the feature.
Submitted 23 July, 2012;
originally announced July 2012.