-
Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
Authors:
Yu Zhang,
Yunyi Zhang,
Martin Michalski,
Yucheng Jiang,
Yu Meng,
Jiawei Han
Abstract:
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approa…
▽ More
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
△ Less
Submitted 10 January, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Entity Set Co-Expansion in StackOverflow
Authors:
Yu Zhang,
Yunyi Zhang,
Yucheng Jiang,
Martin Michalski,
Yu Deng,
Lucian Popa,
ChengXiang Zhai,
Jiawei Han
Abstract:
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service managemen…
▽ More
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service management. Meanwhile, existing approaches are less concerned with two problems: (1) How to deal with multiple types of seed entities simultaneously? (2) How to leverage the power of pre-trained language models (PLMs)? Being aware of these two problems, in this paper, we study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads. During the co-expansion process, we use PLMs to derive embeddings of candidate entities for calculating similarities between entities. Experimental results show that our proposed SECoExpan framework outperforms previous approaches significantly.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Authors:
Yu Meng,
Martin Michalski,
Jiaxin Huang,
Yu Zhang,
Tarek Abdelzaher,
Jiawei Han
Abstract:
Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underpe…
▽ More
Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach FewGen achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving no-augmentation methods by 5+ average points, and outperforming augmentation methods by 3+ average points.
△ Less
Submitted 12 May, 2023; v1 submitted 6 November, 2022;
originally announced November 2022.
-
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Authors:
Marcin Andrychowicz,
Anton Raichuk,
Piotr Stańczyk,
Manu Orsini,
Sertan Girgin,
Raphael Marinier,
Léonard Hussenot,
Matthieu Geist,
Olivier Pietquin,
Marcin Michalski,
Sylvain Gelly,
Olivier Bachem
Abstract:
In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literatur…
▽ More
In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such ``choices'' in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Authors:
Lasse Espeholt,
Raphaël Marinier,
Piotr Stanczyk,
Ke Wang,
Marcin Michalski
Abstract:
We present a modern scalable reinforcement learning agent called SEED (Scalable, Efficient Deep-RL). By effectively utilizing modern accelerators, we show that it is not only possible to train on millions of frames per second but also to lower the cost of experiments compared to current methods. We achieve this with a simple architecture that features centralized inference and an optimized communi…
▽ More
We present a modern scalable reinforcement learning agent called SEED (Scalable, Efficient Deep-RL). By effectively utilizing modern accelerators, we show that it is not only possible to train on millions of frames per second but also to lower the cost of experiments compared to current methods. We achieve this with a simple architecture that features centralized inference and an optimized communication layer. SEED adopts two state of the art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football. We improve the state of the art on Football and are able to reach state of the art on Atari-57 three times faster in wall-time. For the scenarios we consider, a 40% to 80% cost reduction for running experiments is achieved. The implementation along with experiments is open-sourced so results can be reproduced and novel ideas tried out.
△ Less
Submitted 11 February, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Authors:
Xiaohua Zhai,
Joan Puigcerver,
Alexander Kolesnikov,
Pierre Ruyssen,
Carlos Riquelme,
Mario Lucic,
Josip Djolonga,
Andre Susano Pinto,
Maxim Neumann,
Alexey Dosovitskiy,
Lucas Beyer,
Olivier Bachem,
Michael Tschannen,
Marcin Michalski,
Olivier Bousquet,
Sylvain Gelly,
Neil Houlsby
Abstract:
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, r…
▽ More
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, reconstruction error). We present the Visual Task Adaptation Benchmark (VTAB), which defines good representations as those that adapt to diverse, unseen tasks with few examples. With VTAB, we conduct a large-scale study of many popular publicly-available representation learning algorithms. We carefully control confounders such as architecture and tuning budget. We address questions like: How effective are ImageNet representations beyond standard natural datasets? How do representations trained via generative and discriminative models compare? To what extent can self-supervision replace labels? And, how close are we to general visual representations?
△ Less
Submitted 21 February, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Google Research Football: A Novel Reinforcement Learning Environment
Authors:
Karol Kurach,
Anton Raichuk,
Piotr Stańczyk,
Michał Zając,
Olivier Bachem,
Lasse Espeholt,
Carlos Riquelme,
Damien Vincent,
Marcin Michalski,
Olivier Bousquet,
Sylvain Gelly
Abstract:
Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator…
▽ More
Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.
△ Less
Submitted 14 April, 2020; v1 submitted 25 July, 2019;
originally announced July 2019.
-
Integrating Visualization Literacy into Computer Graphics Education Using the Example of Dear Data
Authors:
Andrey Krekhov,
Michael Michalski,
Jens Krüger
Abstract:
The amount of visual communication we are facing is rapidly increasing, and skills to process, understand, and generate visual representations are in high demand. Especially students focusing on computer graphics and visualization can benefit from a more diverse education on visual literacy, as they often have to work on graphical representations for broad masses after their graduation. Our propos…
▽ More
The amount of visual communication we are facing is rapidly increasing, and skills to process, understand, and generate visual representations are in high demand. Especially students focusing on computer graphics and visualization can benefit from a more diverse education on visual literacy, as they often have to work on graphical representations for broad masses after their graduation. Our proposed teaching approach incorporates basic design thinking principles into traditional visualization and graphics education. Our course was inspired by the book Dear Data that was the subject of a lively discussion at the closing capstone of IEEE VIS 2017. The paper outlines our 12-week teaching experiment and summarizes the results extracted from accompanying questionnaires and interviews. In particular, we provide insights into the creation process and pain points of visualization novices, discuss the observed interplay between visualization tasks and design thinking, and finally draw design implications for visual literacy education in general.
△ Less
Submitted 10 July, 2019;
originally announced July 2019.
-
DeepAAA: clinically applicable and generalizable detection of abdominal aortic aneurysm using deep learning
Authors:
Jen-Tang Lu,
Rupert Brooks,
Stefan Hahn,
** Chen,
Varun Buch,
Gopal Kotecha,
Katherine P. Andriole,
Brian Ghoshhajra,
Joel Pinto,
Paul Vozila,
Mark Michalski,
Neil A. Tenenholtz
Abstract:
We propose a deep learning-based technique for detection and quantification of abdominal aortic aneurysms (AAAs). The condition, which leads to more than 10,000 deaths per year in the United States, is asymptomatic, often detected incidentally, and often missed by radiologists. Our model architecture is a modified 3D U-Net combined with ellipse fitting that performs aorta segmentation and AAA dete…
▽ More
We propose a deep learning-based technique for detection and quantification of abdominal aortic aneurysms (AAAs). The condition, which leads to more than 10,000 deaths per year in the United States, is asymptomatic, often detected incidentally, and often missed by radiologists. Our model architecture is a modified 3D U-Net combined with ellipse fitting that performs aorta segmentation and AAA detection. The study uses 321 abdominal-pelvic CT examinations performed by Massachusetts General Hospital Department of Radiology for training and validation. The model is then further tested for generalizability on a separate set of 57 examinations with differing patient demographics and acquisition characteristics than the original dataset. DeepAAA achieves high performance on both sets of data (sensitivity/specificity 0.91/0.95 and 0.85 / 1.0 respectively), on contrast and non-contrast CT scans and works with image volumes with varying numbers of images. We find that DeepAAA exceeds literature-reported performance of radiologists on incidental AAA detection. It is expected that the model can serve as an effective background detector in routine CT examinations to prevent incidental AAAs from being missed.
△ Less
Submitted 4 July, 2019;
originally announced July 2019.
-
4D CNN for semantic segmentation of cardiac volumetric sequences
Authors:
Andriy Myronenko,
Dong Yang,
Varun Buch,
Daguang Xu,
Alvin Ihsani,
Sean Doyle,
Mark Michalski,
Neil Tenenholtz,
Holger Roth
Abstract:
We propose a 4D convolutional neural network (CNN) for the segmentation of retrospective ECG-gated cardiac CT, a series of single-channel volumetric data over time. While only a small subset of volumes in the temporal sequence is annotated, we define a sparse loss function on available labels to allow the network to leverage unlabeled images during training and generate a fully segmented sequence.…
▽ More
We propose a 4D convolutional neural network (CNN) for the segmentation of retrospective ECG-gated cardiac CT, a series of single-channel volumetric data over time. While only a small subset of volumes in the temporal sequence is annotated, we define a sparse loss function on available labels to allow the network to leverage unlabeled images during training and generate a fully segmented sequence. We investigate the accuracy of the proposed 4D network to predict temporally consistent segmentations and compare with traditional 3D segmentation approaches. We demonstrate the feasibility of the 4D CNN and establish its performance on cardiac 4D CCTA.
△ Less
Submitted 9 October, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Towards Accurate Generative Models of Video: A New Metric & Challenges
Authors:
Thomas Unterthiner,
Sjoerd van Steenkiste,
Karol Kurach,
Raphael Marinier,
Marcin Michalski,
Sylvain Gelly
Abstract:
Recent advances in deep generative models have lead to remarkable progress in synthesizing high quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presen…
▽ More
Recent advances in deep generative models have lead to remarkable progress in synthesizing high quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presentation of objects. While recent attempts at formulating generative models of video have had some success, current progress is hampered by (1) the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples, and (2) the wide gap between purely synthetic video data sets and challenging real-world data sets in terms of complexity. To this extent we propose Fréchet Video Distance (FVD), a new metric for generative models of video, and StarCraft 2 Videos (SCV), a benchmark of game play from custom starcraft 2 scenarios that challenge the current capabilities of generative models of video. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.
△ Less
Submitted 27 March, 2019; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Fully-Automated Analysis of Body Composition from CT in Cancer Patients Using Convolutional Neural Networks
Authors:
Christopher P. Bridge,
Michael Rosenthal,
Bradley Wright,
Gopal Kotecha,
Florian Fintelmann,
Fabian Troschel,
Nityanand Miskin,
Khanant Desai,
William Wrobel,
Ana Babic,
Natalia Khalaf,
Lauren Brais,
Marisa Welch,
Caitlin Zellers,
Neil Tenenholtz,
Mark Michalski,
Brian Wolpin,
Katherine Andriole
Abstract:
The amounts of muscle and fat in a person's body, known as body composition, are correlated with cancer risks, cancer survival, and cardiovascular risk. The current gold standard for measuring body composition requires time-consuming manual segmentation of CT images by an expert reader. In this work, we describe a two-step process to fully automate the analysis of CT body composition using a Dense…
▽ More
The amounts of muscle and fat in a person's body, known as body composition, are correlated with cancer risks, cancer survival, and cardiovascular risk. The current gold standard for measuring body composition requires time-consuming manual segmentation of CT images by an expert reader. In this work, we describe a two-step process to fully automate the analysis of CT body composition using a DenseNet to select the CT slice and U-Net to perform segmentation. We train and test our methods on independent cohorts. Our results show Dice scores (0.95-0.98) and correlation coefficients (R=0.99) that are favorable compared to human readers. These results suggest that fully automated body composition analysis is feasible, which could enable both clinical use and large-scale population studies.
△ Less
Submitted 11 August, 2018;
originally announced August 2018.
-
Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks
Authors:
Hoo-Chang Shin,
Neil A Tenenholtz,
Jameson K Rogers,
Christopher G Schwarz,
Matthew L Senjem,
Jeffrey L Gunter,
Katherine Andriole,
Mark Michalski
Abstract:
Data diversity is critical to success when training deep learning models. Medical imaging data sets are often imbalanced as pathologic findings are generally rare, which introduces significant challenges when training deep learning models. In this work, we propose a method to generate synthetic abnormal MRI images with brain tumors by training a generative adversarial network using two publicly av…
▽ More
Data diversity is critical to success when training deep learning models. Medical imaging data sets are often imbalanced as pathologic findings are generally rare, which introduces significant challenges when training deep learning models. In this work, we propose a method to generate synthetic abnormal MRI images with brain tumors by training a generative adversarial network using two publicly available data sets of brain MRI. We demonstrate two unique benefits that the synthetic images provide. First, we illustrate improved performance on tumor segmentation by leveraging the synthetic images as a form of data augmentation. Second, we demonstrate the value of generative models as an anonymization tool, achieving comparable tumor segmentation results when trained on the synthetic data versus when trained on real subject data. Together, these results offer a potential solution to two of the largest challenges facing machine learning in medical imaging, namely the small incidence of pathological findings, and the restrictions around sharing of patient data.
△ Less
Submitted 13 September, 2018; v1 submitted 26 July, 2018;
originally announced July 2018.
-
DeepSPINE: Automated Lumbar Vertebral Segmentation, Disc-level Designation, and Spinal Stenosis Grading Using Deep Learning
Authors:
Jen-Tang Lu,
Stefano Pedemonte,
Bernardo Bizzo,
Sean Doyle,
Katherine P. Andriole,
Mark H. Michalski,
R. Gilberto Gonzalez,
Stuart R. Pomerantz
Abstract:
The high prevalence of spinal stenosis results in a large volume of MRI imaging, yet interpretation can be time-consuming with high inter-reader variability even among the most specialized radiologists. In this paper, we develop an efficient methodology to leverage the subject-matter-expertise stored in large-scale archival reporting and image data for a deep-learning approach to fully-automated l…
▽ More
The high prevalence of spinal stenosis results in a large volume of MRI imaging, yet interpretation can be time-consuming with high inter-reader variability even among the most specialized radiologists. In this paper, we develop an efficient methodology to leverage the subject-matter-expertise stored in large-scale archival reporting and image data for a deep-learning approach to fully-automated lumbar spinal stenosis grading. Specifically, we introduce three major contributions: (1) a natural-language-processing scheme to extract level-by-level ground-truth labels from free-text radiology reports for the various types and grades of spinal stenosis (2) accurate vertebral segmentation and disc-level localization using a U-Net architecture combined with a spine-curve fitting method, and (3) a multi-input, multi-task, and multi-class convolutional neural network to perform central canal and foraminal stenosis grading on both axial and sagittal imaging series inputs with the extracted report-derived labels applied to corresponding imaging level segments. This study uses a large dataset of 22796 disc-levels extracted from 4075 patients. We achieve state-of-the-art performance on lumbar spinal stenosis classification and expect the technique will increase both radiology workflow efficiency and the perceived value of radiology reports for referring clinicians and patients.
△ Less
Submitted 26 July, 2018;
originally announced July 2018.
-
A Large-Scale Study on Regularization and Normalization in GANs
Authors:
Karol Kurach,
Mario Lucic,
Xiaohua Zhai,
Marcin Michalski,
Sylvain Gelly
Abstract:
Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant number of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of "tricks". The success in many…
▽ More
Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant number of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of "tricks". The success in many practical applications coupled with the lack of a measure to quantify the failure modes of GANs resulted in a plethora of proposed losses, regularization and normalization schemes, as well as neural architectures. In this work we take a sober view of the current state of GANs from a practical perspective. We discuss and evaluate common pitfalls and reproducibility issues, open-source our code on Github, and provide pre-trained models on TensorFlow Hub.
△ Less
Submitted 14 May, 2019; v1 submitted 12 July, 2018;
originally announced July 2018.
-
MemGEN: Memory is All You Need
Authors:
Sylvain Gelly,
Karol Kurach,
Marcin Michalski,
Xiaohua Zhai
Abstract:
We propose a new learning paradigm called Deep Memory. It has the potential to completely revolutionize the Machine Learning field. Surprisingly, this paradigm has not been reinvented yet, unlike Deep Learning. At the core of this approach is the \textit{Learning By Heart} principle, well studied in primary schools all over the world.
Inspired by poem recitation, or by $π$ decimal memorization,…
▽ More
We propose a new learning paradigm called Deep Memory. It has the potential to completely revolutionize the Machine Learning field. Surprisingly, this paradigm has not been reinvented yet, unlike Deep Learning. At the core of this approach is the \textit{Learning By Heart} principle, well studied in primary schools all over the world.
Inspired by poem recitation, or by $π$ decimal memorization, we propose a concrete algorithm that mimics human behavior. We implement this paradigm on the task of generative modeling, and apply to images, natural language and even the $π$ decimals as long as one can print them as text. The proposed algorithm even generated this paper, in a one-shot learning setting. In carefully designed experiments, we show that the generated samples are indistinguishable from the training examples, as measured by any statistical tests or metrics.
△ Less
Submitted 29 March, 2018;
originally announced March 2018.
-
Are GANs Created Equal? A Large-Scale Study
Authors:
Mario Lucic,
Karol Kurach,
Marcin Michalski,
Sylvain Gelly,
Olivier Bousquet
Abstract:
Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the art models and evaluation measures. We find that most models can reach…
▽ More
Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the non-saturating GAN introduced in \cite{goodfellow2014generative}.
△ Less
Submitted 29 October, 2018; v1 submitted 28 November, 2017;
originally announced November 2017.