Showing 1–50 of 56 results for author: Shlens, J

  1. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2306.07952  [pdf, other]

    cs.CV cs.CL cs.LG

    MOFI: Learning Image Representations from Noisy Entity Annotated Images

    Authors: Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

    Abstract: We present MOFI, Manifold OF Images, a new vision foundation model designed to learn image representations from noisy entity annotated images. MOFI differs from previous work in two key aspects: (i) pre-training data, and (ii) training recipe. Regarding data, we introduce a new approach to automatically assign entity labels to images from noisy image-text pairs. Our approach involves employing a n…

    Submitted 17 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to ICLR 2024

  3. arXiv:2304.04385  [pdf, other]

    cs.LG

    On Robustness in Multimodal Learning

    Authors: Brandon McKinzie, Joseph Cheng, Vaishaal Shankar, Yinfei Yang, Jonathon Shlens, Alexander Toshev

    Abstract: Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the type of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. We present a multimodal robustness framework…

    Submitted 10 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  4. arXiv:2301.13081  [pdf, other]

    cs.CV

    STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

    Authors: Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

    Abstract: Image and text retrieval is one of the foundational tasks in the vision and language domain with multiple real-world applications. State-of-the-art approaches, e.g. CLIP, ALIGN, represent images and texts as dense embeddings and calculate the similarity in the dense embedding space as the matching score. On the other hand, sparse semantic features like bag-of-words models are more interpretable, b…

    Submitted 7 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  5. PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds

    Authors: Zhaoqi Leng, Shuyang Cheng, Benjamin Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov

    Abstract: Data augmentation is an important technique to improve data efficiency and save labeling cost for 3D detection in point clouds. Yet, existing augmentation policies have so far been designed to only utilize labeled data, which limits the data diversity. In this paper, we recognize that pseudo labeling and data augmentation are complementary, thus propose to leverage unlabeled data for data augmenta…

    Submitted 24 October, 2022; originally announced October 2022.

    Journal ref: ECCV 2022 (pp. 555-572). Springer, Cham

  6. arXiv:2210.09996  [pdf, other]

    cs.CV cs.LG

    Perceptual Grouping in Contrastive Vision-Language Models

    Authors: Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens

    Abstract: Recent advances in zero-shot image recognition suggest that vision-language models learn generic visual representations with a high degree of semantic information that may be arbitrarily probed with natural language phrases. Understanding an image, however, is not just about understanding what content resides within an image, but importantly, where that content resides. In this work we examine how…

    Submitted 21 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted and presented at ICCV 2023

  7. arXiv:2108.00106  [pdf, other]

    cs.LG cs.AI

    Soft Calibration Objectives for Neural Networks

    Authors: Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

    Abstract: Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty both during training and post-hoc. In this work, we propose differentiable losses to impro…

    Submitted 7 December, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: 17 pages total, 10 page main paper, 5 page appendix, 10 figures total, 8 figures in main paper, 2 figures in appendix

  8. arXiv:2106.08417  [pdf, other]

    cs.CV cs.LG cs.RO

    Scene Transformer: A unified architecture for predicting multiple agent trajectories

    Authors: Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, Jonathon Shlens

    Abstract: Predicting the motion of multiple agents is necessary for planning in dynamic environments. This task is challenging for autonomous driving since agents (e.g. vehicles and pedestrians) and their associated behaviors may be diverse and influence one another. Most prior work have focused on predicting independent futures for each agent based on all past motion, and planning against these independent…

    Submitted 4 March, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: ICLR 2022

  9. arXiv:2104.10133  [pdf, other]

    cs.CV cs.LG cs.RO

    Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset

    Authors: Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov

    Abstract: As autonomous driving systems mature, motion forecasting has received increasing attention as a critical requirement for planning. Of particular importance are interactive situations such as merges, unprotected turns, etc., where predicting individual object motion is not sufficient. Joint predictions of multiple objects are required for effective route planning. There has been a critical need for…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 15 pages, 10 figures

  10. arXiv:2103.12731  [pdf, other]

    cs.CV

    Scaling Local Self-Attention for Parameter Efficient Visual Backbones

    Authors: Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake Hechtman, Jonathon Shlens

    Abstract: Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions of convolutions. Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolut…

    Submitted 7 June, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: CVPR 2021 Oral

  11. arXiv:2103.07579  [pdf, other]

    cs.CV

    Revisiting ResNets: Improved Training and Scaling Strategies

    Authors: Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

    Abstract: Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter mor…

    Submitted 12 March, 2021; originally announced March 2021.

  12. arXiv:2103.02093  [pdf, other]

    cs.CV cs.LG

    Pseudo-labeling for Scalable 3D Object Detection

    Authors: Benjamin Caine, Rebecca Roelofs, Vijay Vasudevan, Jiquan Ngiam, Yuning Chai, Zhifeng Chen, Jonathon Shlens

    Abstract: To safely deploy autonomous vehicles, onboard perception systems must work reliably at high accuracy across a diverse set of environments and geographies. One of the most common techniques to improve the efficacy of such systems in new domains involves collecting large labeled datasets, but such datasets can be extremely costly to obtain, especially if each new deployment geography requires additi…

    Submitted 2 March, 2021; originally announced March 2021.

  13. arXiv:2103.01306  [pdf, other]

    cs.CV cs.LG

    Scalable Scene Flow from Point Clouds in the Real World

    Authors: Philipp Jund, Chris Sweeney, Nichola Abdo, Zhifeng Chen, Jonathon Shlens

    Abstract: Autonomous vehicles operate in highly dynamic environments necessitating an accurate assessment of which aspects of a scene are moving and where they are moving to. A popular approach to 3D motion estimation, termed scene flow, is to employ 3D point cloud data from consecutive LiDAR scans, although such approaches have been limited by the small size of real-world, annotated LiDAR data. In this wor…

    Submitted 25 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  14. arXiv:2101.11605  [pdf, other]

    cs.CV cs.AI cs.LG

    Bottleneck Transformers for Visual Recognition

    Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

    Abstract: We present BoTNet, a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks including image classification, object detection and instance segmentation. By just replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet and no other changes, our approach improves upon the baseline…

    Submitted 2 August, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: Technical Report, 20 pages, 13 figures, 19 tables

  15. arXiv:2012.08668  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Mitigating Bias in Calibration Error Estimation

    Authors: Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

    Abstract: For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared. Most research in calibration focuses on techniques to reduce this empirical measure of calibration error, ECE_bin. We instead focus on assessing statistical bias i…

    Submitted 10 February, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: To be published in AISTATS 2022. Code is available https://github.com/google-research/google-research/tree/master/caltrain

  16. arXiv:2005.10266  [pdf, other]

    cs.CV

    Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation

    Authors: Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens

    Abstract: Supervised learning in large discriminative models is a mainstay for modern computer vision. Such an approach necessitates investing in large-scale human-annotated datasets for achieving state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human annotated dataset. This limitation is particularly notable for image segmentation tasks, where the exp…

    Submitted 19 July, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Accepted to ECCV 2020

  17. arXiv:2005.01864  [pdf, other]

    cs.CV

    Streaming Object Detection for 3-D Point Clouds

    Authors: Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

    Abstract: Autonomous vehicles operate in a dynamic environment, where the speed with which a vehicle can perceive and react impacts the safety and efficacy of the system. LiDAR provides a prominent sensory modality that informs many existing perceptual systems including object detection, segmentation, motion estimation, and action recognition. The latency for perceptual systems based on point cloud data can…

    Submitted 4 May, 2020; originally announced May 2020.

  18. arXiv:2004.00831  [pdf, other]

    cs.CV

    Improving 3D Object Detection through Progressive Population Based Augmentation

    Authors: Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang Song, Benjamin Caine, Vijay Vasudevan, Congcong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov

    Abstract: Data augmentation has been widely adopted for object detection in 3D point clouds. However, all previous related efforts have focused on manually designing specific data augmentation methods for individual architectures. In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation…

    Submitted 16 July, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: Accepted at ECCV 2020

  19. arXiv:2002.02959  [pdf, other]

    cs.CV cs.LG stat.ML

    Revisiting Spatial Invariance with Low-Rank Local Connectivity

    Authors: Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith

    Abstract: Convolutional neural networks are among the most successful architectures in deep learning with this success at least partially attributable to the efficacy of spatial invariance as an inductive bias. Locally connected layers, which differ from convolutional layers only in their lack of spatial invariance, usually perform poorly in practice. However, these observations still leave open the possibi…

    Submitted 14 August, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Journal ref: International Conference on Machine Learning, 2020

  20. arXiv:1912.04838  [pdf, other]

    cs.CV cs.LG stat.ML

    Scalability in Perception for Autonomous Driving: Waymo Open Dataset

    Authors: Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov

    Abstract: The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help a…

    Submitted 12 May, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  21. arXiv:1909.13719  [pdf, other]

    cs.CV

    RandAugment: Practical automated data augmentation with a reduced search space

    Authors: Ekin D. Cubuk, Barret Zoph, Jonathon Shlens, Quoc V. Le

    Abstract: Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learnin…

    Submitted 13 November, 2019; v1 submitted 30 September, 2019; originally announced September 2019.

    Comments: Added ablation experiments

  22. arXiv:1908.11069  [pdf, other]

    cs.CV

    StarNet: Targeted Computation for Object Detection in Point Clouds

    Authors: Jiquan Ngiam, Benjamin Caine, Wei Han, Brandon Yang, Yuning Chai, Pei Sun, Yin Zhou, Xi Yi, Ouais Alsharif, Patrick Nguyen, Zhifeng Chen, Jonathon Shlens, Vijay Vasudevan

    Abstract: Detecting objects from LiDAR point clouds is an important component of self-driving car technology as LiDAR provides high resolution spatial information. Previous work on point-cloud 3D object detection has re-purposed convolutional approaches from traditional camera imagery. In this work, we present an object detection system called StarNet designed specifically to take advantage of the sparse an…

    Submitted 2 December, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

  23. arXiv:1906.11172  [pdf, other]

    cs.CV cs.LG

    Learning Data Augmentation Strategies for Object Detection

    Authors: Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

    Abstract: Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this w…

    Submitted 26 June, 2019; originally announced June 2019.

  24. arXiv:1906.08988  [pdf, other]

    cs.LG cs.CV stat.ML

    A Fourier Perspective on Model Robustness in Computer Vision

    Authors: Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, Justin Gilmer

    Abstract: Achieving robustness to distributional shift is a longstanding and challenging goal of computer vision. Data augmentation is a commonly used approach for improving robustness, however robustness gains are typically not uniform across corruption types. Indeed increasing performance in the presence of random noise is often met with reduced performance on other corruptions such as contrast change. Un…

    Submitted 16 September, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019

  25. arXiv:1906.05909  [pdf, other]

    cs.CV

    Stand-Alone Self-Attention in Vision Models

    Authors: Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens

    Abstract: Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is…

    Submitted 13 June, 2019; originally announced June 2019.

  26. arXiv:1906.05721  [pdf, other]

    cs.CV eess.IV

    Visual Wake Words Dataset

    Authors: Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes

    Abstract: The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: 10 pages, 4 figures

    ACM Class: I.2.10; B.7.1; I.5.2

  27. arXiv:1906.03367  [pdf, other]

    cs.LG stat.ML

    Using learned optimizers to make models robust to input noise

    Authors: Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk

    Abstract: State-of-the-art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution. However, when models are tested on corrupted images (e.g. due to scale changes, translations, or shifts in brightness or contrast), performance degrades significantly. Here, we explore the possibility of meta-training a learned optimizer th…

    Submitted 7 June, 2019; originally announced June 2019.

  28. arXiv:1904.10076  [pdf, other]

    cs.CV cs.LG

    Using Videos to Evaluate Image Model Robustness

    Authors: Keren Gu, Brandon Yang, Jiquan Ngiam, Quoc Le, Jonathon Shlens

    Abstract: Human visual systems are robust to a wide range of image transformations that are challenging for artificial networks. We present the first study of image model robustness to the minute transformations found across video frames, which we term "natural robustness". Compared to previous studies on adversarial examples and synthetic distortions, natural robustness captures a more diverse set of commo…

    Submitted 29 August, 2019; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: Video Robustness Dataset included in directory

  29. arXiv:1904.09925  [pdf, other]

    cs.CV

    Attention Augmented Convolutional Networks

    Authors: Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le

    Abstract: Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long range interactions, but has mostly been applied to sequence modeling and genera…

    Submitted 9 September, 2020; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  30. arXiv:1904.02632  [pdf, other]

    cs.CV cs.LG stat.ML

    A Learned Representation for Scalable Vector Graphics

    Authors: Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens

    Abstract: Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world. In spite of such advances, a higher level understanding of vision and imagery does not arise from exhaustively modeling an object, but instead identifying higher-level attributes that best summarize the aspects of an object. In this work…

    Submitted 4 April, 2019; originally announced April 2019.

  31. arXiv:1903.00925  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Accelerating Training of Deep Neural Networks with a Standardization Loss

    Authors: Jasmine Collins, Johannes Balle, Jonathon Shlens

    Abstract: A significant advance in accelerating neural network training has been the development of normalization methods, permitting the training of deep models both faster and with better accuracy. These advances come with practical challenges: for instance, batch normalization ties the prediction of individual examples with other examples within a batch, resulting in a network that is heavily dependent o…

    Submitted 3 March, 2019; originally announced March 2019.

    Comments: Technical report. Results presented at WiML 2018

  32. arXiv:1809.04184  [pdf, other]

    cs.CV cs.LG stat.ML

    Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

    Authors: Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens

    Abstract: The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: Accepted by NIPS 2018

  33. arXiv:1805.08974  [pdf, other]

    cs.CV cs.LG stat.ML

    Do Better ImageNet Models Transfer Better?

    Authors: Simon Kornblith, Jonathon Shlens, Quoc V. Le

    Abstract: Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance…

    Submitted 17 June, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: CVPR 2019 Oral

  34. arXiv:1803.06092  [pdf, other]

    cs.AI cs.CV cs.LG

    A Dataset and Architecture for Visual Reasoning with a Working Memory

    Authors: Guangyu Robert Yang, Igor Ganichev, Xiao-Jing Wang, Jonathon Shlens, David Sussillo

    Abstract: A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli such as in video analysis or game play. Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question and answer dataset (COG) to parallel experiments in humans and animals. COG is much…

    Submitted 20 July, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

  35. arXiv:1712.00559  [pdf, other]

    cs.CV cs.LG stat.ML

    Progressive Neural Architecture Search

    Authors: Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

    Abstract: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate mode…

    Submitted 26 July, 2018; v1 submitted 2 December, 2017; originally announced December 2017.

    Comments: To appear in ECCV 2018 as oral. The code and checkpoint for PNASNet-5 trained on ImageNet (both Mobile and Large) can now be downloaded from https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. Also see https://github.com/chenxi116/PNASNet.TF for refactored and simplified TensorFlow code; see https://github.com/chenxi116/PNASNet.pytorch for exact conversion to PyTorch

  36. arXiv:1711.10151  [pdf, other]

    cs.CV

    Recurrent Segmentation for Variable Computational Budgets

    Authors: Lane McIntosh, Niru Maheswaranathan, David Sussillo, Jonathon Shlens

    Abstract: State-of-the-art systems for semantic image segmentation use feed-forward pipelines with fixed computational costs. Building an image segmentation system that works across a range of computational budgets is challenging and time-intensive as new architectures must be designed and trained for every computational setting. To address this problem we develop a recurrent neural network that successivel…

    Submitted 14 March, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

  37. arXiv:1707.07012  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning Transferable Architectures for Scalable Image Recognition

    Authors: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le

    Abstract: Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. The key…

    Submitted 11 April, 2018; v1 submitted 21 July, 2017; originally announced July 2017.

  38. arXiv:1705.07208  [pdf, other]

    cs.CV cs.LG

    PixColor: Pixel Recursive Colorization

    Authors: Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy

    Abstract: We propose a novel approach to automatically produce multiple colorized versions of a grayscale image. Our method results from the observation that the task of automated colorization is relatively easy given a low-resolution version of the color image. We first train a conditional PixelCNN to generate a low resolution color for a given grayscale image. Then, given the generated low-resolution colo…

    Submitted 5 June, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

  39. arXiv:1705.06830  [pdf, other]

    cs.CV

    Exploring the structure of a real-time, arbitrary neural artistic stylization network

    Authors: Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

    Abstract: In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters…

    Submitted 24 August, 2017; v1 submitted 18 May, 2017; originally announced May 2017.

    Comments: Accepted as an oral presentation at British Machine Vision Conference (BMVC) 2017

  40. arXiv:1702.00824  [pdf, other]

    cs.CV

    YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

    Authors: Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, Vincent Vanhoucke

    Abstract: We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. T…

    Submitted 24 March, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

    Comments: Accepted at the Conference on Computer Vision and Pattern Recognition (CVPR) 2017

    ACM Class: I.4.8; I.2.10; I.5.2; I.5.1

  41. arXiv:1702.00783  [pdf, other]

    cs.CV cs.LG

    Pixel Recursive Super Resolution

    Authors: Ryan Dahl, Mohammad Norouzi, Jonathon Shlens

    Abstract: We present a pixel recursive super resolution model that synthesizes realistic details into images while enhancing their resolution. A low resolution image may correspond to multiple plausible high resolution images, thus modeling the super resolution process with a pixel independent conditional model often results in averaging different details--hence blurry edges. By contrast, our model is able…

    Submitted 22 March, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

  42. arXiv:1610.09585  [pdf, other]

    stat.ML cs.CV

    Conditional Image Synthesis With Auxiliary Classifier GANs

    Authors: Augustus Odena, Christopher Olah, Jonathon Shlens

    Abstract: Synthesizing high resolution photorealistic images has been a long-standing challenge in machine learning. In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. We construct a variant of GANs employing label conditioning that results in 128x128 resolution image samples exhibiting global coherence. We expand on previous work…

    Submitted 20 July, 2017; v1 submitted 29 October, 2016; originally announced October 2016.

  43. arXiv:1610.07629  [pdf, other]

    cs.CV cs.LG

    A Learned Representation For Artistic Style

    Authors: Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur

    Abstract: The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the a…

    Submitted 9 February, 2017; v1 submitted 24 October, 2016; originally announced October 2016.

    Comments: 9 pages. 15 pages of Appendix, International Conference on Learning Representations (ICLR) 2017

  44. arXiv:1603.04467  [pdf, other]

    cs.DC cs.LG

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Authors: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, et al. (15 additional authors not shown)

    Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational de…

    Submitted 16 March, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

    Comments: Version 2 updates only the metadata, to correct the formatting of Martín Abadi's name

  45. arXiv:1512.00567  [pdf, other]

    cs.CV

    Rethinking the Inception Architecture for Computer Vision

    Authors: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

    Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided…

    Submitted 11 December, 2015; v1 submitted 1 December, 2015; originally announced December 2015.

  46. arXiv:1511.05644  [pdf, other]

    cs.LG

    Adversarial Autoencoders

    Authors: Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan Frey

    Abstract: In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from a…

    Submitted 24 May, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  47. arXiv:1511.05641  [pdf, other]

    cs.LG

    Net2Net: Accelerating Learning via Knowledge Transfer

    Authors: Tianqi Chen, Ian Goodfellow, Jonathon Shlens

    Abstract: We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scra…

    Submitted 23 April, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

    Comments: ICLR 2016 submission

  48. arXiv:1412.7479  [pdf, ps, other]

    cs.NE cs.LG

    Deep Networks With Large Output Spaces

    Authors: Sudheendra Vijayanarasimhan, Jonathon Shlens, Rajat Monga, Jay Yagnik

    Abstract: Deep neural networks have been extremely successful at various image, speech, video recognition tasks because of their ability to model deep structures within the data. However, they are still prohibitively expensive to train and apply for problems containing millions of classes in the output layer. Based on the observation that the key computation common to most neural network layers is a vector/…

    Submitted 10 April, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

  49. arXiv:1412.6572  [pdf, other]

    stat.ML cs.LG

    Explaining and Harnessing Adversarial Examples

    Authors: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

    Abstract: Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfittin…

    Submitted 20 March, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

  50. arXiv:1404.2986  [pdf, other]

    cs.LG stat.ML

    A Tutorial on Independent Component Analysis

    Authors: Jonathon Shlens

    Abstract: Independent component analysis (ICA) has become a standard data analysis technique applied to an array of problems in signal processing and machine learning. This tutorial provides an introduction to ICA based on linear algebra formulating an intuition for ICA from first principles. The goal of this tutorial is to provide a solid foundation on this advanced topic so that one might learn the motiva…

    Submitted 10 April, 2014; originally announced April 2014.