-
Adaptive Crowdsourcing Via Self-Supervised Learning
Authors:
Anmol Kagrecha,
Henrik Marklund,
Benjamin Van Roy,
Hong Jun Jeon,
Richard Zeckhauser
Abstract:
Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.
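The claim that a weighted sum beats the plain average when skills vary can be illustrated with a small simulation. This is a minimal sketch, not predict-each-worker itself: it swaps in a simple method-of-moments estimator that assumes unbiased workers with independent Gaussian noise (so correlated workers, which the abstract also covers, are out of scope). It infers each worker's noise variance from their past estimates alone, with no ground truth, and then applies inverse-variance weights. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def estimate_noise_vars(history):
    """Estimate each worker's noise variance from past estimates alone (no labels).

    history: one row per past item, one estimate per worker in each row.
    Assumes x_i = truth + noise_i with independent, zero-mean noise, so
    Cov(x_i, x_j) = Var(truth) for i != j and Var(x_i) = Var(truth) + Var(noise_i).
    """
    n = len(history[0])
    cols = [[row[i] for row in history] for i in range(n)]
    means = [mean(c) for c in cols]

    def cov(i, j):
        return mean((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))

    # Off-diagonal covariances all estimate the variance of the latent quantity.
    signal_var = mean(cov(i, j) for i in range(n) for j in range(n) if i != j)
    # Clip so every weight stays positive and finite.
    return [max(cov(i, i) - signal_var, 1e-6) for i in range(n)]


def inverse_variance_weights(noise_vars):
    """Weights proportional to 1 / noise variance, normalized to sum to 1."""
    inv = [1.0 / v for v in noise_vars]
    total = sum(inv)
    return [w / total for w in inv]


def simulate(noise_vars=(0.1, 1.0, 4.0), num_items=2000, seed=0):
    """Return (MSE of plain average, MSE of weighted sum, learned weights)."""
    rng = random.Random(seed)

    def draw():
        truth = rng.gauss(0.0, 1.0)
        return truth, [truth + rng.gauss(0.0, v ** 0.5) for v in noise_vars]

    # Fit weights on past items, using only the workers' estimates.
    history = [draw()[1] for _ in range(num_items)]
    weights = inverse_variance_weights(estimate_noise_vars(history))

    # Evaluate both aggregation rules on fresh items.
    se_avg = se_wtd = 0.0
    for _ in range(num_items):
        truth, xs = draw()
        se_avg += (mean(xs) - truth) ** 2
        se_wtd += (sum(w * x for w, x in zip(weights, xs)) - truth) ** 2
    return se_avg / num_items, se_wtd / num_items, weights


mse_avg, mse_weighted, weights = simulate()
print(f"average MSE={mse_avg:.3f}  weighted MSE={mse_weighted:.3f}  weights={weights}")
```

With these settings the plain average has expected squared error (0.1 + 1 + 4) / 9 ≈ 0.567, while inverse-variance weighting with the true variances would achieve 1 / (10 + 1 + 0.25) ≈ 0.089, so the learned weights should land near the latter.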
Submitted 1 February, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Maintaining Plasticity in Continual Learning via Regenerative Regularization
Authors:
Saurabh Kumar,
Henrik Marklund,
Benjamin Van Roy
Abstract:
In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a simple approach for maintaining plasticity by incorporating into the loss function L2 regularization toward the initial parameters. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters should drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On problems representative of different types of nonstationarity in continual supervised learning, we demonstrate that L2 Init most consistently mitigates plasticity loss compared to previously proposed approaches.
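The only difference between L2 and L2 Init is the anchor of the penalty. A minimal sketch, assuming a penalty of the form λ·||θ − θ₀||² added to the task loss and plain gradient descent on a flat list of parameters (`l2_penalty`, `sgd_step`, `lr`, and `reg` are hypothetical names): when the task gradient for a parameter is zero, L2 Init pulls it back toward its initial value, while standard L2 pulls it toward zero.

```python
def l2_penalty(params, anchor, reg):
    """reg * ||params - anchor||^2; anchor = initial params (L2 Init) or zeros (L2)."""
    return reg * sum((p - a) ** 2 for p, a in zip(params, anchor))


def sgd_step(params, task_grads, anchor, lr=0.1, reg=0.5):
    """One gradient-descent step on task loss + l2_penalty(params, anchor, reg)."""
    # d/dp [reg * (p - a)^2] = 2 * reg * (p - a)
    return [p - lr * (g + 2.0 * reg * (p - a))
            for p, g, a in zip(params, task_grads, anchor)]


init_params = [0.5, -0.3]   # parameters at initialization (theta_0)
drifted = [3.0, 2.0]        # parameters after training on earlier tasks
zero_grads = [0.0, 0.0]     # recent losses insensitive to these parameters

l2_init, l2 = drifted, drifted
for _ in range(200):
    l2_init = sgd_step(l2_init, zero_grads, anchor=init_params)
    l2 = sgd_step(l2, zero_grads, anchor=[0.0, 0.0])

print(l2_init)  # close to init_params
print(l2)       # close to [0.0, 0.0]
```

With `lr=0.1` and `reg=0.5`, each step shrinks the distance to the anchor by a factor of 0.9, so 200 steps leave a residual on the order of 1e-9. The single hyper-parameter mentioned in the abstract corresponds to `reg` here.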
Submitted 3 October, 2023; v1 submitted 23 August, 2023;
originally announced August 2023.
-
Continual Learning as Computationally Constrained Reinforcement Learning
Authors:
Saurabh Kumar,
Henrik Marklund,
Ashish Rao,
Yifan Zhu,
Hong Jun Jeon,
Yueyang Liu,
Benjamin Van Roy
Abstract:
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
Submitted 20 August, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
Extending the WILDS Benchmark for Unsupervised Adaptation
Authors:
Shiori Sagawa,
Pang Wei Koh,
Tony Lee,
Irena Gao,
Sang Michael Xie,
Kendrick Shen,
Ananya Kumar,
Weihua Hu,
Michihiro Yasunaga,
Henrik Marklund,
Sara Beery,
Etienne David,
Ian Stavness,
Wei Guo,
Jure Leskovec,
Kate Saenko,
Tatsunori Hashimoto,
Sergey Levine,
Chelsea Finn,
Percy Liang
Abstract:
Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribution shift benchmarks with unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). The update maintains consistency with the original WILDS benchmark by using identical labeled training, validation, and test sets, as well as the evaluation metrics. On these datasets, we systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
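Self-training, one of the method families benchmarked above, uses a model's own confident predictions on unlabeled data as extra training targets. A minimal sketch of the pseudo-label selection step, assuming the model outputs class probabilities and using a hypothetical confidence threshold of 0.9; this is a generic illustration, not the WILDS package API or any specific method evaluated in the paper.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled examples whose top predicted probability >= threshold.

    probs: one list of class probabilities per unlabeled example.
    Returns (example index, pseudo-label) pairs to add to the training set.
    """
    selected = []
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[label] >= threshold:
            selected.append((idx, label))
    return selected


unlabeled_probs = [
    [0.95, 0.03, 0.02],  # confident -> pseudo-label 0
    [0.40, 0.35, 0.25],  # uncertain -> skipped
    [0.05, 0.05, 0.90],  # confident -> pseudo-label 2
]
print(select_pseudo_labels(unlabeled_probs))  # [(0, 0), (2, 2)]
```

Raising the threshold trades coverage of the unlabeled pool for fewer incorrect pseudo-labels, which matters under distribution shift, where confidence on target data can be poorly calibrated.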
Submitted 23 April, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
WILDS: A Benchmark of in-the-Wild Distribution Shifts
Authors:
Pang Wei Koh,
Shiori Sagawa,
Henrik Marklund,
Sang Michael Xie,
Marvin Zhang,
Akshay Balsubramani,
Weihua Hu,
Michihiro Yasunaga,
Richard Lanas Phillips,
Irena Gao,
Tony Lee,
Etienne David,
Ian Stavness,
Wei Guo,
Berton A. Earnshaw,
Imran S. Haque,
Sara Beery,
Jure Leskovec,
Anshul Kundaje,
Emma Pierson,
Sergey Levine,
Chelsea Finn,
Percy Liang
Abstract:
Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
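The in-distribution versus out-of-distribution gap, and a per-domain breakdown for shifts such as hospitals or camera traps, can be computed from predictions tagged with a domain. A minimal sketch with hypothetical function names and toy data; it is not the evaluation code shipped in the WILDS package, whose metrics vary by dataset.

```python
from collections import defaultdict


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def per_domain_accuracy(preds, labels, domains):
    """Accuracy for each domain (e.g. hospital or camera trap) separately."""
    hits, counts = defaultdict(int), defaultdict(int)
    for p, y, d in zip(preds, labels, domains):
        hits[d] += int(p == y)
        counts[d] += 1
    return {d: hits[d] / counts[d] for d in counts}


# Toy predictions: ID = held-out data from training hospitals, OOD = a new hospital.
id_acc = accuracy([1, 0, 1, 1], [1, 0, 1, 0])     # 0.75
ood_acc = accuracy([1, 1, 0, 0], [1, 0, 1, 0])    # 0.5
print(f"ID-OOD gap: {id_acc - ood_acc:.2f}")      # 0.25

by_domain = per_domain_accuracy(
    preds=[1, 0, 1, 1, 0, 0],
    labels=[1, 0, 0, 1, 1, 1],
    domains=["A", "A", "B", "B", "C", "C"],
)
print(by_domain, "worst:", min(by_domain.values()))
```

Reporting the worst domain alongside the average makes it visible when a model does well overall but fails on a specific hospital, camera trap, or region.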
Submitted 16 July, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Adaptive Risk Minimization: Learning to Adapt to Domain Shift
Authors:
Marvin Zhang,
Henrik Marklund,
Nikita Dhawan,
Abhishek Gupta,
Sergey Levine,
Chelsea Finn
Abstract:
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested under distribution shift, due to changing temporal correlations, atypical end users, or other factors. In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts, corresponding to new domains or domain distributions. Most prior methods aim to learn a single robust model or invariant feature space that performs well on all domains. In contrast, we aim to learn models that adapt at test time to domain shift using unlabeled test points. Our primary contribution is to introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains. Compared to prior methods for robustness, invariance, and adaptation, ARM methods provide performance gains of 1-4% test accuracy on a number of image classification problems exhibiting domain shift.
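ARM's central idea is that a model can use a batch of unlabeled test points from one domain to adjust itself before predicting. One simple mechanism of this kind, shown here as an assumption rather than the paper's exact procedure, standardizes features using statistics computed from the unlabeled test batch instead of the training data, which undoes a per-domain offset and scale. Names and data are hypothetical, and the meta-training over training domains that distinguishes ARM is not shown.

```python
from statistics import mean, pstdev


def standardize(batch, mu, sigma, eps=1e-5):
    return [(x - mu) / (sigma + eps) for x in batch]


def predict(features, threshold=0.0):
    """Toy classifier trained on standardized features: positive if above threshold."""
    return [int(z > threshold) for z in features]


# Training domain: feature values centered at 0.
train = [-2.0, -1.0, 1.0, 2.0]
train_mu, train_sigma = mean(train), pstdev(train)

# New test domain: same underlying pattern, shifted by +10 and scaled by 3.
test_batch = [10 + 3 * x for x in [-2.0, -1.0, 1.0, 2.0]]

# Without adaptation: training statistics are applied to the shifted batch.
no_adapt = predict(standardize(test_batch, train_mu, train_sigma))
# With adaptation: statistics are recomputed from the unlabeled test batch itself.
adapted = predict(standardize(test_batch, mean(test_batch), pstdev(test_batch)))
print(no_adapt)  # [1, 1, 1, 1]
print(adapted)   # [0, 0, 1, 1]
```

Because no test labels are used, this adaptation is available at test time; its usefulness depends on the test batch coming from a single domain, which matches the setting described in the abstract.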
Submitted 1 December, 2021; v1 submitted 6 July, 2020;
originally announced July 2020.
-
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Authors:
Jeremy Irvin,
Pranav Rajpurkar,
Michael Ko,
Yifan Yu,
Silviana Ciurea-Ilcus,
Chris Chute,
Henrik Marklund,
Behzad Haghgoo,
Robyn Ball,
Katie Shpanskaya,
Jayne Seekins,
David A. Mong,
Safwan S. Halabi,
Jesse K. Sandberg,
Ricky Jones,
David B. Larson,
Curtis P. Langlotz,
Bhavik N. Patel,
Matthew P. Lungren,
Andrew Y. Ng
Abstract:
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.
The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert.
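The abstract mentions investigating different approaches to using the uncertainty labels during training. A minimal sketch of three simple policies, assuming labels encoded as 1 (positive), 0 (negative), -1 (uncertain), and None (not mentioned, treated as negative here); the encoding, the policy names "ignore", "zeros", and "ones", and the handling of unmentioned observations are assumptions for illustration. Ignored entries are masked out of a binary cross-entropy loss so they contribute nothing.

```python
import math

UNCERTAIN = -1  # assumed encoding of an uncertain mention
POLICIES = ("ignore", "zeros", "ones")


def to_targets(labels, policy):
    """Map raw labels to (target, mask) pairs under an uncertainty policy.

    "ignore" drops uncertain labels from the loss (mask 0),
    "zeros" treats them as negative, "ones" as positive.
    None (observation not mentioned) is treated as negative here.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    out = []
    for y in labels:
        if y is None:
            out.append((0.0, 1.0))
        elif y == UNCERTAIN:
            out.append({"ignore": (0.0, 0.0), "zeros": (0.0, 1.0), "ones": (1.0, 1.0)}[policy])
        else:
            out.append((float(y), 1.0))
    return out


def masked_bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over unmasked observations."""
    total, count = 0.0, 0.0
    for p, (t, m) in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += m * -(t * math.log(p) + (1 - t) * math.log(1 - p))
        count += m
    return total / count if count else 0.0


# One study, four observations: positive, uncertain, negative, not mentioned.
raw = [1, UNCERTAIN, 0, None]
probs = [0.9, 0.6, 0.2, 0.1]  # model output probabilities
for policy in POLICIES:
    print(policy, round(masked_bce(probs, to_targets(raw, policy)), 4))
```

The same model output for the uncertain observation (0.6 here) is rewarded under "ones", penalized under "zeros", and ignored under "ignore", which is one way different policies can suit different pathologies.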
Submitted 21 January, 2019;
originally announced January 2019.