-
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking
Authors:
Arpit Bansal,
Avi Schwarzschild,
Eitan Borgnia,
Zeyad Emam,
Furong Huang,
Micah Goldblum,
Tom Goldstein
Abstract:
Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.
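As a rough illustration of the recall idea described in this abstract, the sketch below re-supplies the original problem instance to a recurrent block at every iteration by concatenating it with the current state. This is a toy example, not the authors' implementation: the names `iterate_with_recall` and `toy_block` are hypothetical, and the averaging block stands in for a learned network purely to show a behavior that can be repeated indefinitely without degenerating.

```python
from typing import Callable, List

Vector = List[float]


def iterate_with_recall(block: Callable[[Vector], Vector], x: Vector, n_iters: int) -> Vector:
    """Apply the same block n_iters times, re-feeding the input x each time."""
    state = [0.0] * len(x)  # initial state; a stand-in for learned features
    for _ in range(n_iters):
        # Recall: the block always sees [state, x], so x cannot be forgotten.
        state = block(state + x)
    return state


def toy_block(features: Vector) -> Vector:
    """Hypothetical stand-in for a learned recurrent block.

    Splits its input back into (state, x) and moves the state halfway
    toward x, so x is a fixed point of the iteration.
    """
    half = len(features) // 2
    state, x = features[:half], features[half:]
    return [0.5 * s + 0.5 * xi for s, xi in zip(state, x)]


problem = [1.0, -2.0, 3.0]
few = iterate_with_recall(toy_block, problem, 50)
many = iterate_with_recall(toy_block, problem, 500)
```

In this toy setting, running ten times as many iterations leaves the answer unchanged, which is the property the abstract attributes to the combination of recall and progressive training; the progressive training routine itself is not sketched here.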
Submitted 14 October, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Active Learning at the ImageNet Scale
Authors:
Zeyad Ali Sami Emam,
Hong-Min Chu,
Ping-Yeh Chiang,
Wojciech Czaja,
Richard Leapman,
Micah Goldblum,
Tom Goldstein
Abstract:
Active learning (AL) algorithms aim to identify an optimal subset of data for annotation, such that deep neural networks (DNN) can achieve better performance when trained on this labeled subset. AL is especially impactful in industrial scale settings where data labeling costs are high and practitioners use every tool at their disposal to improve model performance. The recent success of self-supervised pretraining (SSP) highlights the importance of harnessing abundant unlabeled data to boost model performance. By combining AL with SSP, we can make use of unlabeled data while simultaneously labeling and training on particularly informative samples.
In this work, we study a combination of AL and SSP on ImageNet. We find that performance on small toy datasets -- the typical benchmark setting in the literature -- is not representative of performance on ImageNet due to the class imbalanced samples selected by an active learner. Among the existing baselines we test, popular AL algorithms across a variety of small and large scale settings fail to outperform random sampling. To remedy the class-imbalance problem, we propose Balanced Selection (BASE), a simple, scalable AL algorithm that outperforms random sampling consistently by selecting more balanced samples for annotation than existing methods. Our code is available at: https://github.com/zeyademam/active_learning .
Submitted 24 November, 2021;
originally announced November 2021.
-
Datasets for Studying Generalization from Easy to Hard Examples
Authors:
Avi Schwarzschild,
Eitan Borgnia,
Arjun Gupta,
Arpit Bansal,
Zeyad Emam,
Furong Huang,
Micah Goldblum,
Tom Goldstein
Abstract:
We describe new datasets for studying generalization from easy to hard examples.
Submitted 25 September, 2021; v1 submitted 12 August, 2021;
originally announced August 2021.
-
On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models
Authors:
Zeyad Emam,
Andrew Kondrich,
Sasha Harrison,
Felix Lau,
Yushi Wang,
Aerin Kim,
Elliot Branson
Abstract:
High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML), and in particular the development of deep learning (DL). However, since the emergence of the ImageNet dataset and the AlexNet model in 2012, the size of new open-source labeled vision datasets has remained roughly constant. Consequently, only a minority of publications in the computer vision community tackle supervised learning on datasets that are orders of magnitude larger than ImageNet. In this paper, we survey computer vision research domains that study the effects of such large datasets on model performance across different vision tasks. We summarize the community's current understanding of those effects, and highlight some open questions related to training with massive datasets. In particular, we tackle: (a) The largest datasets currently used in computer vision research and the interesting takeaways from training on such datasets; (b) The effectiveness of pre-training on large datasets; (c) Recent advancements and hurdles facing synthetic datasets; (d) An overview of double descent and sample non-monotonicity phenomena; and finally, (e) A brief discussion of lifelong/continual learning and how it fares compared to learning from huge labeled datasets in an offline setting. Overall, our findings are that research on optimization for deep learning focuses on perfecting the training routine and thus making DL models less data hungry, while research on synthetic datasets aims to offset the cost of data labeling. However, for the time being, acquiring non-synthetic labeled data remains indispensable to boost performance.
Submitted 30 July, 2021;
originally announced August 2021.
-
Understanding Generalization through Visualizations
Authors:
W. Ronny Huang,
Zeyad Emam,
Micah Goldblum,
Liam Fowl,
J. K. Terry,
Furong Huang,
Tom Goldstein
Abstract:
The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.
Submitted 14 November, 2020; v1 submitted 7 June, 2019;
originally announced June 2019.