-
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Authors:
Jifan Zhang,
Lalit Jain,
Yang Guo,
Jiayi Chen,
Kuan Lok Zhou,
Siddharth Suresh,
Andrew Wagenmaker,
Scott Sievert,
Timothy Rogers,
Kevin Jamieson,
Robert Mankoff,
Robert Nowak
Abstract:
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.
Submitted 15 June, 2024;
originally announced June 2024.
-
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Qiaomin Xie,
Robert Nowak
Abstract:
In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure and the algorithm exploits the shared structure to minimize the cumulative regret for an unseen but related test task. We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task. Prior work on pretrained decision transformers, such as DPT, requires access to the optimal action during training, which may be hard to obtain in several scenarios. Diverging from these works, our learning algorithm does not need knowledge of the optimal action per task during training but predicts a reward vector for each of the actions using only the observed offline data from the diverse training tasks. Finally, at inference time, it selects actions using the reward predictions, employing various exploration strategies in-context for an unseen test task. Our model outperforms other SOTA methods like DPT and Algorithmic Distillation over a series of experiments on several structured bandit problems (linear, bilinear, latent, non-linear). Interestingly, we show that our algorithm, without knowledge of the underlying problem structure, can learn a near-optimal policy in-context by leveraging the shared structure across diverse tasks. We further extend the field of pre-trained decision transformers by showing that they can leverage unseen tasks with new actions and still learn the underlying latent structure to derive a near-optimal policy. We validate this over several experiments to show that our proposed solution is very general and has wide applications to potentially emergent online and offline strategies at test time. Finally, we theoretically analyze the performance of our algorithm and obtain generalization bounds in the in-context multi-task learning setting.
Submitted 7 June, 2024;
originally announced June 2024.
-
ReLUs Are Sufficient for Learning Implicit Neural Representations
Authors:
Joseph Shenouda,
Yamin Zhou,
Robert D. Nowak
Abstract:
Motivated by the growing theoretical understanding of neural networks that employ the Rectified Linear Unit (ReLU) as their activation function, we revisit the use of ReLU activation functions for learning implicit neural representations (INRs). Inspired by second order B-spline wavelets, we incorporate a set of simple constraints to the ReLU neurons in each layer of a deep neural network (DNN) to remedy the spectral bias. This in turn enables its use for various INR tasks. Empirically, we demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons. Next, by leveraging recent theoretical works which characterize the kinds of functions ReLU neural networks learn, we provide a way to quantify the regularity of the learned function. This offers a principled approach to selecting the hyperparameters in INR architectures. We substantiate our claims through experiments in signal representation, super resolution, and computed tomography, demonstrating the versatility and effectiveness of our method. The code for all experiments can be found at https://github.com/joeshenouda/relu-inrs.
Submitted 4 June, 2024;
originally announced June 2024.
-
SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Robert Nowak
Abstract:
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data and we are interested in the question of what \textit{behavior} policy should collect the data for the most accurate evaluation of the target policy. While prior work has considered behavior policy selection, in this paper, we additionally consider a safety constraint on the behavior policy. Namely, we assume there exists a known default policy that incurs a particular expected cost when run and we enforce that the cumulative cost of all behavior policies run is better than a constant factor of the cost that would be incurred had we always run the default policy. We first show that there exists a class of intractable MDPs where no safe oracle algorithm with knowledge about problem parameters can efficiently collect data and satisfy the safety constraints. We then define the tractability condition for an MDP such that a safe oracle algorithm can efficiently collect data and using that we prove the first lower bound for this setting. We then introduce an algorithm SaVeR for this problem that approximates the safe oracle algorithm and bound the finite-sample mean squared error of the algorithm while ensuring it satisfies the safety constraint. Finally, we show in simulations that SaVeR produces low MSE policy evaluation while satisfying the safety constraint.
Submitted 4 June, 2024;
originally announced June 2024.
-
Graph Vertex Embeddings: Distance, Regularization and Community Detection
Authors:
Radosław Nowak,
Adam Małkowski,
Daniel Cieślak,
Piotr Sokół,
Paweł Wawrzyński
Abstract:
Graph embeddings have emerged as a powerful tool for representing complex network structures in a low-dimensional space, enabling the use of efficient methods that employ the metric structure in the embedding space as a proxy for the topological structure of the data. In this paper, we explore several aspects that affect the quality of a vertex embedding of graph-structured data. To this effect, we first present a family of flexible distance functions that faithfully capture the topological distance between different vertices. Secondly, we analyze vertex embeddings as resulting from a fitted transformation of the distance matrix rather than as a direct result of optimization. Finally, we evaluate the effectiveness of our proposed embedding constructions by performing community detection on a host of benchmark datasets. The reported results are competitive with classical algorithms that operate on the entire graph while benefitting from a substantially reduced computational complexity due to the reduced dimensionality of the representations.
Submitted 9 April, 2024;
originally announced April 2024.
-
Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable Environments
Authors:
Jeongyeol Kwon,
Liu Yang,
Robert Nowak,
Josiah Hanna
Abstract:
Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. However, the effectiveness of such auxiliary tasks has not been fully convincing, especially in partially observable environments that require long-term memorization and inference. In this empirical study, we investigate the effectiveness of future prediction for learning the representations of histories, possibly of extensive length, in partially observable environments. We first introduce an approach that decouples the task of learning history representations from policy optimization via future prediction. Then, our main contributions are two-fold: (a) we demonstrate that the performance of reinforcement learning is strongly correlated with the prediction accuracy of future observations in partially observable environments, and (b) our approach can significantly improve the overall end-to-end approach by preventing high-variance noisy signals from reinforcement learning objectives from influencing representation learning. We illustrate our claims on three types of benchmarks that necessitate the ability to process long histories for high returns.
Submitted 10 February, 2024;
originally announced February 2024.
-
Learning from the Best: Active Learning for Wireless Communications
Authors:
Nasim Soltani,
Jifan Zhang,
Batool Salehi,
Debashri Roy,
Robert Nowak,
Kaushik Chowdhury
Abstract:
Collecting an over-the-air wireless communications training dataset for deep learning-based communication tasks is relatively simple. However, labeling the dataset requires expert involvement and domain knowledge, may involve private intellectual properties, and is often computationally and financially expensive. Active learning is an emerging area of research in machine learning that aims to reduce the labeling overhead without accuracy degradation. Active learning algorithms identify the most critical and informative samples in an unlabeled dataset and label only those samples, instead of the complete set. In this paper, we introduce active learning for deep learning applications in wireless communications, and present its different categories. We present a case study of deep learning-based mmWave beam selection, where labeling is performed by a compute-intensive algorithm based on exhaustive search. We evaluate the performance of different active learning algorithms on a publicly available multi-modal dataset with different modalities including image and LiDAR. Our results show that using an active learning algorithm for class-imbalanced datasets can reduce labeling overhead by up to 50% for this dataset while maintaining the same accuracy as classical training.
Submitted 23 January, 2024;
originally announced February 2024.
-
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Authors:
Gantavya Bhatt,
Yifang Chen,
Arnav M. Das,
Jifan Zhang,
Sang T. Truong,
Stephen Mussmann,
Yinglun Zhu,
Jeffrey Bilmes,
Simon S. Du,
Kevin Jamieson,
Jordan T. Ash,
Robert D. Nowak
Abstract:
Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of annotation cost required by random sampling.
Submitted 6 May, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Lock-free de Bruijn graph
Authors:
Daniel Górniak,
Robert Nowak
Abstract:
The de Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms due to the increasing number of cores in modern computers. The assembly task is an indispensable step in sequencing genomes of new organisms and studying structural genomic changes. In recent years, the dynamic development of next-generation sequencing (NGS) methods has raised hopes for making whole-genome sequencing a fast and reliable tool used, for example, in medical diagnostics. However, this is hampered by the slowness and computational requirements of the current processing algorithms, which raises the need to develop more efficient algorithms. One possible approach, still little explored, is the use of quantum computing.
We created a lock-free version of the de Bruijn graph, as well as a lock-free algorithm to build such a graph from reads. Our algorithm and data structures are designed for parallel threads of execution and do not use mutexes or other locking mechanisms; instead, we use only the compare-and-swap instruction and other atomic operations. This makes our algorithm fast and scalable.
This article presents the new lock-free de Bruijn graph data structure together with a graph build algorithm. We developed a C++ library and tested its performance to demonstrate its high speed and scalability compared to other available tools.
Submitted 5 January, 2024;
originally announced January 2024.
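As a rough illustration of the data structure named in this abstract, the sketch below builds a de Bruijn graph from reads in plain, single-threaded Python. It is not the authors' lock-free C++ library; the function name `build_de_bruijn` and the choice to use (k-1)-mers as nodes with edge multiplicities are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return an adjacency map: (k-1)-mer prefix -> {(k-1)-mer suffix: count}.

    Hypothetical helper: every k-mer in every read contributes one edge
    from its first k-1 characters to its last k-1 characters.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Sequential counter update; a concurrent builder would need an
            # atomic operation (for example compare-and-swap) at this step.
            graph[kmer[:-1]][kmer[1:]] += 1
    return {node: dict(edges) for node, edges in graph.items()}


graph = build_de_bruijn(["ACGTA", "CGTAC"], k=3)
```

The counter increment is the step that a parallel build has to protect, which is where the abstract's atomic operations would come in.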
-
DIRECT: Deep Active Learning under Imbalance and Label Noise
Authors:
Shyam Nuggehalli,
Jifan Zhang,
Lalit Jain,
Robert Nowak
Abstract:
Class imbalance is a prevalent issue in real world machine learning applications, often leading to poor performance in rare and minority classes. With an abundance of wild unlabeled data, active learning is perhaps the most effective technique in solving the problem at its root -- collecting a more balanced and informative set of labeled examples during annotation. Label noise is another common issue in data annotation jobs, which is especially challenging for active learning methods. In this work, we conduct the first study of active learning under both class imbalance and label noise. We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples that are closest to it. Through a novel reduction to one-dimensional active learning, our algorithm DIRECT is able to leverage the classic active learning literature to address issues such as batch labeling and tolerance towards label noise. We present extensive experiments on imbalanced datasets with and without label noise. Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms and more than 80% of the annotation budget compared to random sampling.
Submitted 20 May, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
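The abstract mentions a reduction to one-dimensional active learning. In its classic noiseless form, that problem is solved by binary search over sorted scores, which the sketch below shows. It is only the textbook idea, not DIRECT itself, which must also cope with label noise and batch labeling; `find_threshold` and the `oracle` callback are hypothetical names.

```python
def find_threshold(scores, oracle):
    """Locate the first index labeled 1 in ascending `scores`.

    Assumes noiseless labels: 0 below an unknown threshold, 1 at or above it.
    Returns (index, number_of_label_queries).
    """
    lo, hi = 0, len(scores)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # each oracle call costs one annotation
        if oracle(scores[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


scores = [i / 100 for i in range(100)]
true_threshold = 0.37  # hidden from the learner, used only by the oracle
index, queries = find_threshold(scores, lambda s: int(s >= true_threshold))
```

With 100 candidate points the search needs only a logarithmic number of labels, compared with labeling all 100 points in a passive scheme.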
-
Looped Transformers are Better at Learning Learning Algorithms
Authors:
Liu Yang,
Kangwook Lee,
Robert Nowak,
Dimitris Papailiopoulos
Abstract:
Transformers have demonstrated effectiveness in solving data-fitting problems in-context from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating iterative algorithms, which are commonly employed in traditional machine learning methods. To address this, we propose a looped transformer architecture and an associated training methodology, with the aim of incorporating iterative characteristics into transformer architectures. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while utilizing less than 10% of the parameter count.
Submitted 16 March, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
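The parameter saving described in this abstract comes from reusing one block across iterations instead of stacking distinct blocks. The toy sketch below illustrates only that weight-sharing idea: a scalar `tanh` map stands in for a transformer block, and all names and constants are assumptions rather than the paper's architecture or training method.

```python
import math


def make_block(weight, bias):
    """Toy stand-in for a transformer block: x -> tanh(weight * x + bias)."""
    def block(x):
        return math.tanh(weight * x + bias)
    return block


def run_stacked(blocks, x):
    """Standard depth: each layer has its own parameters."""
    for block in blocks:
        x = block(x)
    return x


def run_looped(block, x, n_loops):
    """Looped depth: the same block (same parameters) is applied repeatedly."""
    for _ in range(n_loops):
        x = block(x)
    return x


N_LAYERS = 12
PARAMS_PER_BLOCK = 2  # one weight and one bias in this toy block

stacked = [make_block(0.5 + 0.01 * i, 0.1) for i in range(N_LAYERS)]
looped = make_block(0.5, 0.1)

stacked_params = N_LAYERS * PARAMS_PER_BLOCK
looped_params = PARAMS_PER_BLOCK
y_stacked = run_stacked(stacked, 1.0)
y_looped = run_looped(looped, 1.0, N_LAYERS)
```

Both models apply twelve transformations to the input, but the looped one stores a single block's parameters, so its parameter count is one twelfth of the stacked model in this toy setting.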
-
Multi-task Representation Learning for Pure Exploration in Bilinear Bandits
Authors:
Subhojyoti Mukherjee,
Qiaomin Xie,
Josiah P. Hanna,
Robert Nowak
Abstract:
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the \textit{multi-task bilinear bandit problem}, we aim to find optimal actions for multiple tasks that share a common low-dimensional linear representation. The objective is to leverage this characteristic to expedite the process of identifying the best pair of arms for all tasks. We propose the algorithm GOBLIN that uses an experimental design approach to optimize sample allocations for learning the global representation as well as minimize the number of samples needed to identify the optimal pair of arms in individual tasks. To the best of our knowledge, this is the first study to give sample complexity analysis for pure exploration in bilinear bandits with shared representation. Our results demonstrate that by learning the shared representation across tasks, we achieve significantly improved sample complexity compared to the traditional approach of solving tasks independently.
Submitted 1 November, 2023;
originally announced November 2023.
-
Superconductivity in Compositionally-Complex Cuprates with the YBa$_2$Cu$_3$O$_{7-x}$ Structure
Authors:
Aditya Raghavan,
Nathan Arndt,
Nayelie Morales-Colón,
Eli Wennen,
Megan Wolfe,
Carolina Oliveira Gandin,
Kade Nelson,
Robert Nowak,
Sam Dillon,
Keon Sahebkar,
Ryan F. Need
Abstract:
High-temperature superconductivity is reported in a series of compositionally-complex cuprates with varying degrees of size and spin disorder. Three compositions of Y-site alloyed YBa$_2$Cu$_3$O$_{7-x}$, i.e., (5Y)BCO, were prepared using solid-state methods with different sets of rare earth ions on the Y-site. Synchrotron X-ray diffraction and energy-dispersive X-ray spectroscopy confirm these samples have high phase-purity and homogeneous mixing of the Y-site elements. The superconducting phase transition was probed using electrical resistivity and AC magnetometry measurements, which reveal the transition temperature, T$_C$, is greater than 91 K for all series when near optimal oxygen doping. Importantly, these T$_C$ values are only $\approx$1$\%$ suppressed relative to pure YBCO (T$_C$ = 93 K). This result highlights the robustness of pairing in the YBCO structure to specific types of disorder. In addition, the chemical flexibility of compositionally-complex cuprates allows spin and lattice disorder to be decoupled to a degree not previously possible in high-temperature superconductors. This feature makes compositionally-complex cuprates a uniquely well-suited materials platform for studying proposed pairing interactions in cuprates.
Submitted 21 September, 2023;
originally announced September 2023.
-
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
Authors:
Jeongyeol Kwon,
Dohyun Kwon,
Stephen Wright,
Robert Nowak
Abstract:
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter $σ > 0$. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be $O(σ)$-close. A by-product of our analysis is the explicit formula for the gradient of the hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as an $O(σ)$-approximation of the original BO, we propose first-order algorithms that find an $ε$-stationary solution by optimizing the penalty formulation with $σ = O(ε)$. When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $ε$-stationary point of the penalty function, using in total $O(ε^{-3})$ and $O(ε^{-7})$ accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and oracles are noisy, respectively. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with $O(1)$ samples per iteration, and achieves the improved oracle-complexity of $O(ε^{-3})$ and $O(ε^{-5})$, respectively.
Submitted 11 February, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
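For orientation, a bilevel problem and one common penalty reformulation of it can be written as below. The symbols $f$, $g$, $\mathcal{X}$, $\mathcal{Y}$, $F$ and the exact form of $\psi_\sigma$ are assumptions chosen to match the abstract's description of a weighted sum with penalty parameter $\sigma$; the paper's own notation and definitions may differ.

```latex
% Bilevel problem: upper-level objective f, lower-level objective g.
\min_{x \in \mathcal{X}} \; F(x) := \min_{y \in S(x)} f(x, y),
\qquad
S(x) := \operatorname*{arg\,min}_{y \in \mathcal{Y}} g(x, y).

% Penalty function with parameter \sigma > 0 (assumed form):
\psi_\sigma(x, y) := \sigma\, f(x, y)
  + \Bigl( g(x, y) - \min_{z \in \mathcal{Y}} g(x, z) \Bigr).
```

The bracketed term is nonnegative and vanishes exactly when $y$ solves the lower-level problem, so a small $\sigma$ places most of the weight on lower-level optimality.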
-
Weighted variation spaces and approximation by shallow ReLU networks
Authors:
Ronald DeVore,
Robert D. Nowak,
Rahul Parhi,
Jonathan W. Siegel
Abstract:
We investigate the approximation of functions $f$ on a bounded domain $Ω\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $Ω$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes.
The present paper is concerned with the definition of these novel model classes on domains $Ω$. The current definition of these model classes does not depend on the domain $Ω$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
Submitted 28 July, 2023;
originally announced July 2023.
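The object studied in this abstract, a single-hidden-layer ReLU network of width $n$, has the form $f(x) = \sum_{i=1}^{n} a_i \,\mathrm{ReLU}(\langle w_i, x \rangle + b_i)$. The sketch below simply evaluates such a network; the function and parameter names are illustrative assumptions, and the example weights are picked by hand rather than learned.

```python
def relu(t):
    """Rectified linear unit."""
    return max(0.0, t)


def shallow_relu_net(x, weights, biases, outer):
    """Evaluate f(x) = sum_i outer[i] * relu(<weights[i], x> + biases[i]).

    The width n of the network is len(weights).
    """
    total = 0.0
    for w, b, a in zip(weights, biases, outer):
        pre_activation = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += a * relu(pre_activation)
    return total


# Width-2 network on R^1 representing |x| = relu(x) + relu(-x).
weights = [[1.0], [-1.0]]
biases = [0.0, 0.0]
outer = [1.0, 1.0]
value = shallow_relu_net([-3.0], weights, biases, outer)
```

Approximation results of the kind described above ask how fast the error of the best width-$n$ network of this form decays as $n$ grows.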
-
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Authors:
Jifan Zhang,
Yifang Chen,
Gregory Canal,
Stephen Mussmann,
Arnav M. Das,
Gantavya Bhatt,
Yinglun Zhu,
Jeffrey Bilmes,
Simon Shaolei Du,
Kevin Jamieson,
Robert D Nowak
Abstract:
Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.
Submitted 1 March, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection
Authors:
Haoyue Bai,
Gregory Canal,
Xuefeng Du,
Jeongyeol Kwon,
Robert Nowak,
Yixuan Li
Abstract:
Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detection respectively. While both problems have received significant research attention lately, they have been pursued independently. This may not be surprising, since the two tasks have seemingly conflicting goals. This paper provides a new unified approach that is capable of simultaneously generalizing to covariate shifts while robustly detecting semantic shifts. We propose a margin-based learning framework that exploits freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. We show both empirically and theoretically that the proposed margin constraint is the key to achieving both OOD generalization and detection. Extensive experiments show the superiority of our framework, outperforming competitive baselines that specialize in either OOD generalization or OOD detection. Code is publicly available at https://github.com/deeplearning-wisc/scone.
Submitted 15 June, 2023;
originally announced June 2023.
-
Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression
Authors:
Joseph Shenouda,
Rahul Parhi,
Kangwook Lee,
Robert D. Nowak
Abstract:
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay in training networks with activations like the rectified linear unit (ReLU). This framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are the solutions to data-fitting problems over these infinite-dimensional spaces, where the network widths are bounded by the square of the number of training data. This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks, shedding new light on multi-task learning with neural networks. Finally, this paper develops a connection between weight-decay regularization and the multi-task lasso problem. This connection leads to novel bounds for layer widths in deep networks that depend on the intrinsic dimensions of the training data representations. This insight not only deepens the understanding of the deep network architectural requirements, but also yields a simple convex optimization method for deep neural network compression. The performance of this compression procedure is evaluated on various architectures.
Submitted 9 March, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Filtered Iterative Denoising for Linear Inverse Problems
Authors:
Danica Fliss,
Willem Marais,
Robert D. Nowak
Abstract:
Iterative denoising algorithms (IDAs) have been tremendously successful in a range of linear inverse problems arising in signal and image processing. The classic instance of this is the famous Iterative Soft-Thresholding Algorithm (ISTA), based on soft-thresholding of wavelet coefficients. More modern approaches to IDAs replace soft-thresholding with a black-box denoiser, such as BM3D or a learned deep neural network denoiser. These are often referred to as ``plug-and-play'' (PnP) methods because, in principle, an off-the-shelf denoiser can be used for a variety of different inverse problems. The problem with PnP methods is that they may not provide the best solutions to a specific linear inverse problem; better solutions can often be obtained by a denoiser that is customized to the problem domain. A problem-specific denoiser, however, requires expensive re-engineering or re-learning, which eliminates the simplicity and ease that make PnP methods attractive in the first place. This paper proposes a new IDA that allows one to use a general, black-box denoiser more effectively via a simple linear filtering modification to the usual gradient update steps that accounts for the specific linear inverse problem. The proposed Filtered IDA (FIDA) is mathematically derived from the classical ISTA and wavelet denoising viewpoint. We show experimentally that FIDA can produce superior results compared to existing IDA methods with BM3D.
Submitted 15 February, 2023;
originally announced February 2023.
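As a point of reference for the abstract above, the classical ISTA iteration it mentions can be sketched as follows. This is a minimal illustration of textbook ISTA for minimizing 0.5*||Ax - y||^2 + lam*||x||_1, not the FIDA method proposed in the paper. It assumes the signal is sparse in its own domain (no wavelet transform), and the names `soft_threshold`, `ista`, `lam`, and `n_iter` are hypothetical choices made for this sketch.

```python
import numpy as np


def soft_threshold(v, tau):
    """Shrink each entry of v toward zero by tau (proximal map of tau*||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(A, y, lam, n_iter=500):
    """Classical ISTA for 0.5*||A x - y||^2 + lam*||x||_1.

    Each iteration takes a gradient step on the data-fit term and then
    applies soft-thresholding, which plays the role of the denoiser.
    """
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                      # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)     # denoising (shrinkage) step
    return x
```

Plug-and-play variants, as described in the abstract, would replace the `soft_threshold` call with a black-box denoiser; how FIDA modifies the gradient step is not specified here and is left to the paper.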
-
Algorithm Selection for Deep Active Learning with Imbalanced Datasets
Authors:
Jifan Zhang,
Shuai Shao,
Saurabh Verma,
Robert Nowak
Abstract:
Label efficiency has become an increasingly important objective in deep learning applications. Active learning aims to reduce the number of labeled examples needed to train deep networks, but the empirical performance of active learning algorithms can vary dramatically across datasets and applications. It is difficult to know in advance which active learning strategy will perform well or best in a given application. To address this, we propose the first adaptive algorithm selection strategy for deep active learning. For any unlabeled dataset, our (meta) algorithm TAILOR (Thompson ActIve Learning algORithm selection) iteratively and adaptively chooses among a set of candidate active learning algorithms. TAILOR uses novel reward functions aimed at gathering class-balanced examples. Extensive experiments in multi-class and multi-label applications demonstrate TAILOR's effectiveness in achieving accuracy comparable to or better than that of the best of the candidate algorithms. Our implementation of TAILOR is open-sourced at https://github.com/jifanz/TAILOR.
Submitted 2 November, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits
Authors:
Subhojyoti Mukherjee,
Qiaomin Xie,
Josiah Hanna,
Robert Nowak
Abstract:
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first to focus on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting that reduces the MSE of the value of the target policy. We then use this formulation to derive the optimal allocation of samples per action during data collection. We then introduce a novel algorithm SPEED (Structured Policy Evaluation Experimental Design) that tracks the optimal design and derive its regret with respect to the optimal design. Finally, we empirically validate that SPEED leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
Submitted 29 February, 2024; v1 submitted 28 January, 2023;
originally announced January 2023.
-
A Fully First-Order Method for Stochastic Bilevel Optimization
Authors:
Jeongyeol Kwon,
Dohyun Kwon,
Stephen Wright,
Robert Nowak
Abstract:
We consider stochastic unconstrained bilevel optimization problems when only the first-order gradient oracles are available. While numerous optimization methods have been proposed for tackling bilevel problems, existing methods either tend to require possibly expensive calculations regarding Hessians of lower-level objectives, or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method, and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $ε$-stationary solution of the bilevel problem after $ε^{-7/2}, ε^{-5/2}$, and $ε^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noises are in both level objectives, only in the upper-level objective, and not present (deterministic settings), respectively. We further show that if we employ momentum-assisted gradient estimators, the iteration complexities can be improved to $ε^{-5/2}, ε^{-4/2}$, and $ε^{-3/2}$, respectively. We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
Submitted 26 January, 2023;
originally announced January 2023.
-
Deep Learning Meets Sparse Regularization: A Signal Processing Perspective
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
Deep learning has been wildly successful in practice and most state-of-the-art machine learning methods are based on neural networks. Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep neural networks. In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of deep learning. This framework precisely characterizes the functional properties of neural networks that are trained to fit to data. The key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, which are all techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and explains why neural networks can perform well in high-dimensional problems.
Submitted 8 June, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Spin-polarized transport in magnetic tunnel junctions with ZnTe barriers
Authors:
W. G. Wang,
C. Ni,
A. Ozbay,
L. R. Shah,
X. Fan,
X. M. Kou,
E. R. Nowak,
J. Q. Xiao
Abstract:
Magnetic tunnel junctions with a wide band gap semiconductor ZnTe barrier were fabricated. A very low barrier height and sizable magnetoresistance were observed in the Fe/ZnTe/Fe junctions at room temperature. The nonlinear I-V characteristic curve confirmed that the observed magnetoresistance is due to the spin-dependent tunneling effect. A temperature-dependent study indicated that the total conductance of the junction is dominated by direct tunneling, with only a small portion from the hopping conduction through the defect states inside the barrier.
Submitted 29 October, 2022;
originally announced October 2022.
-
OpenStack and Google Cloud performance comparison in Infrastructure as a Service model
Authors:
Michał Łątkowski,
Robert Nowak
Abstract:
Cloud computing is becoming common, and the choice of proper infrastructure is essential. One of the main issues is choosing between private and public cloud, between commercial and non-commercial solutions. This paper aims to compare the parameters of OpenStack and Google Cloud systems. Both systems deliver a computing cloud service, enabling the user to use the infrastructure as a service (IaaS) model. We developed the pipeline using the Python programming language and its libraries, which enable communication with the aforementioned clouds. We measured various parameters of instances and task execution: instance launch and deletion times, and their dependence on the number of launched instances. Moreover, we used benchmark algorithms to check the instance performance. We analysed the results and the factors that contributed to them and provided conclusions, recommendations, and suggestions for further research based on the gathered data.
Submitted 18 October, 2022;
originally announced October 2022.
-
Active Learning with Neural Networks: Insights from Nonparametric Statistics
Authors:
Yinglun Zhu,
Robert Nowak
Abstract:
Deep neural networks have great representation power, but typically require large numbers of training examples. This motivates deep active learning methods that can significantly reduce the amount of labeled training data. Empirical successes of deep active learning have been recently reported in the literature; however, rigorous label complexity guarantees of deep active learning have remained elusive. This constitutes a significant gap between theory and practice. This paper tackles this gap by providing the first near-optimal label complexity guarantees for deep active learning. The key insight is to study deep active learning from the nonparametric classification perspective. Under standard low noise conditions, we show that active learning with neural networks can provably achieve the minimax label complexity, up to disagreement coefficient and other logarithmic terms. When equipped with an abstention option, we further develop an efficient deep active learning algorithm that achieves $\mathsf{polylog}(\frac{1}ε)$ label complexity, without any low noise assumptions. We also provide extensions of our results beyond the commonly studied Sobolev/Hölder spaces and develop label complexity guarantees for learning in Radon $\mathsf{BV}^2$ spaces, which have recently been proposed as natural function spaces associated with neural networks.
Submitted 15 October, 2022;
originally announced October 2022.
-
Fast genomic optical map assembly algorithm using binary representation
Authors:
Przemysław Stawczyk,
Robert Nowak
Abstract:
Reducing the cost of sequencing genomes provided by next-generation sequencing technologies has greatly increased the number of genomic projects. As a result, there is a growing need for better assembly and assembly validation methods. One promising idea is to use heterogeneous data in assembly projects. Optical Mapping (OM) is beneficial in validating genomic assemblies, correction and scaffolding. A single raw OM read describes a DNA molecule's long fragment, up to 1Mbp. Raw OM data from the same genome could be assembled to create consensus maps that span an entire chromosome.
The assembly process is computationally hard because of the large number of errors in input data.
This work describes a new algorithm and computer program to assemble OM reads without a reference genome. In our algorithm, we explored binary representation for genome maps. We focused on the efficiency of data structures and algorithms and scale on parallel platforms. The algorithm consists of several steps, of which the most important are: (1) conversion of the restriction maps into binary strings, (2) detection of overlaps between restriction maps, (3) determining the layout of restriction maps set, (4) creation of consensus genomic maps. Our algorithm deals with optical mapping data with low error levels but fails with high-level error reads.
We developed a software library, console application and module for Python language. The approach presented in this paper proved to be faster than a dynamic programming approach and performed well on error-free data. It could be used as a step of \textit{de~novo} assembly pipelines or to detect misassemblies. The software is freely available in a public repository under GNU LGPL v3 license (https://sourceforge.net/p/binary-genome-maps/code).
Submitted 13 October, 2022;
originally announced October 2022.
-
PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks
Authors:
Liu Yang,
Jifan Zhang,
Joseph Shenouda,
Dimitris Papailiopoulos,
Kangwook Lee,
Robert D. Nowak
Abstract:
Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of $\ell_2$ (not squared) norms of the input and output weights associated with each ReLU neuron. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
Submitted 5 July, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
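The equivalence stated in the PathProx abstract above rests on a rescaling argument that can be checked numerically for a single ReLU neuron. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it uses the convention that weight decay charges half the sum of squared weights, and the names `neuron`, `half_squared_norms`, `w_bal`, and `v_bal` are hypothetical. Scaling the input weights by alpha > 0 and the output weights by 1/alpha leaves the neuron's function unchanged, and the cheapest such rescaling costs exactly the product of the two (unsquared) norms.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)   # input weights of one ReLU neuron
v = rng.normal(size=3)   # output weights of the same neuron
x = rng.normal(size=5)   # an arbitrary input


def neuron(x, w, v):
    """Output contribution of a single ReLU neuron: v * relu(w . x)."""
    return v * np.maximum(w @ x, 0.0)


def half_squared_norms(w, v):
    """Weight decay cost of the neuron under the 1/2 convention (an assumption)."""
    return 0.5 * (w @ w + v @ v)


# Balancing factor that minimizes the weight decay cost over all rescalings.
alpha = np.sqrt(np.linalg.norm(v) / np.linalg.norm(w))
w_bal, v_bal = alpha * w, v / alpha

product_of_norms = np.linalg.norm(w) * np.linalg.norm(v)
print(half_squared_norms(w, v), half_squared_norms(w_bal, v_bal), product_of_norms)
```

The balanced cost equals the product of norms, and by the AM-GM inequality no rescaling can do better, which is why the squared penalty and the product penalty share the same minimizers.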
-
Comprehensive structural changes in nanoscale-deformed silicon modelled with an integrated atomic potential
Authors:
Rafał Abram,
Dariusz Chrobak,
Jesper Byggmästar,
Kai H. Nordlund,
Roman Nowak
Abstract:
In spite of remarkable developments in the field of advanced materials, silicon remains one of the foremost semiconductors of the day. Of enduring relevance to science and technology is silicon's nanomechanical behaviour including phase transformation, amorphization and dislocation generation, particularly in the context of molecular dynamics and materials research. So far, comprehensive modelling of the whole cycle of events in silicon during nanoscale deformation has not been possible, however, due to the limitations inherent in the existing interatomic potentials. This paper examines how well an unconventional combination of two well-known potentials - the Tersoff and Stillinger-Weber - can perform in simulating that complexity. Our model indicates that an irreversible deformation of silicon (Si-I) is set in motion by a transformation to a non-diamond structure (Si-nd), and followed by a subsequent transition to the Si-II and Si-XII' phases (Si-I->Si-nd->Si-II->Si-XII'). This leads to the generation of dislocations spreading outwards from the incubation zone. In effect, our simulations parallel each and every one of the structural changes detected experimentally in the deformed material. This includes both the sequence of phase transitions and dislocation activity, which - taken together - neither the Tersoff nor Stillinger-Weber, or indeed any other available Si interatomic potential, is able to achieve in its own right. We have sought to additionally validate our method of merging atomic potentials by applying it to germanium, and found it can equally well predict germanium's transformation from a liquid to amorphous state.
Submitted 8 August, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
One for All: Simultaneous Metric and Preference Learning over Multiple Users
Authors:
Gregory Canal,
Blake Mason,
Ramya Korlakai Vinayak,
Robert Nowak
Abstract:
This paper investigates simultaneous preference and metric learning from a crowd of respondents. A set of items represented by $d$-dimensional feature vectors and paired comparisons of the form ``item $i$ is preferable to item $j$'' made by each user is given. Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities along with a latent ideal point for each user reflecting their individual preferences. This model has the flexibility to capture individual preferences, while enjoying a metric learning sample cost that is amortized over the crowd. We first study this problem in a noiseless, continuous response setting (i.e., responses equal to differences of item distances) to understand the fundamental limits of learning. Next, we establish prediction error guarantees for noisy, binary measurements such as may be collected from human respondents, and show how the sample complexity improves when the underlying metric is low-rank. Finally, we establish recovery guarantees under assumptions on the response distribution. We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements across a large number of users.
Submitted 7 July, 2022;
originally announced July 2022.
-
Mo-Si-B alloys for ultra-high temperature space and ground applications: liquid assisted fabrication under various temperature and time conditions
Authors:
G. Bruzda,
W. Polkowski,
R. Nowak,
A. Polkowska,
S. Lech,
K. Karczewski,
M. Książek,
D. Giuranno
Abstract:
Boron-doped molybdenum silicides have already been recognized as attractive candidates for space and ground ultra-high temperature applications far beyond the limits of state-of-the-art nickel-based superalloys. In this work, we explore a new method for fabricating Mo-Si-B alloys (as coatings or small bulk components) by utilizing a pressure-less reactive melt infiltration approach. The basic assumption of this approach is a synthesis of binary and/or ternary and complex intermetallic phases (silicides, borides, borosilicides) through a direct interaction of Si-B melt with molybdenum. The main purpose of this work was to examine the effect of temperature and time of Si-B melt interaction on the structure and morphology of the formed reaction products. For this purpose, sessile drop experiments were carried out on the eutectic Si-3.2B (wt%) alloy/Mo couples at temperatures between 1385 and 1550°C and holding times between 10 and 30 minutes. The solidified sessile drop couples were subjected to microstructural characterization by means of light microscopy and scanning electron microscopy analyses performed both at "top-view" and cross-sectioned interfaces. The phases formed within the interaction zone were identified by using TEM/SAED and XRD techniques. It was documented that the thickness of both the main product layer (MoSi2+Mo5Si3) and the boron-rich interlayer increases with rising temperature and time of the Si-B melt interaction with Mo substrates.
Submitted 19 June, 2022;
originally announced June 2022.
-
Efficient Active Learning with Abstention
Authors:
Yinglun Zhu,
Robert Nowak
Abstract:
The goal of active learning is to achieve the same accuracy achievable by passive learning, while using far fewer labels. Exponential savings in terms of label complexity have been proved in very special cases, but fundamental lower bounds show that such improvements are impossible in general. This suggests a need to explore alternative goals for active learning. Learning with abstention is one such alternative. In this setting, the active learning algorithm may abstain from prediction and incur an error that is marginally smaller than random guessing. We develop the first computationally efficient active learning algorithm with abstention. Our algorithm provably achieves $\mathsf{polylog}(\frac{1}{\varepsilon})$ label complexity, without any low noise conditions. Such a performance guarantee reduces the label complexity by an exponential factor, relative to passive learning and active learning that is not allowed to abstain. Furthermore, our algorithm is guaranteed to only abstain on hard examples (where the true label distribution is close to a fair coin), a novel property we term \emph{proper abstention} that also leads to a host of other desirable characteristics (e.g., recovering minimax guarantees in the standard setting, and avoiding the undesirable ``noise-seeking'' behavior often seen in active learning). We also provide novel extensions of our algorithm that achieve \emph{constant} label complexity and deal with model misspecification.
Submitted 15 October, 2022; v1 submitted 31 March, 2022;
originally announced April 2022.
-
ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling
Authors:
Subhojyoti Mukherjee,
Josiah P. Hanna,
Robert Nowak
Abstract:
This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment formalized as an MDP. We develop theory for optimal data collection within the class of tree-structured MDPs by first deriving an oracle data collection strategy that uses knowledge of the variance of the reward distributions. We then introduce the Reduced Variance Sampling (ReVar) algorithm that approximates the oracle strategy when the reward variances are unknown a priori and bound its sub-optimality compared to the oracle strategy. Finally, we empirically validate that ReVar leads to policy evaluation with mean squared error comparable to the oracle strategy and significantly lower than simply running the target policy.
Submitted 17 June, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
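To make the idea of variance-aware data collection in the ReVar abstract above concrete, the sketch below works through the simplest special case: a single-state problem (a multi-armed bandit) with known reward standard deviations. This is an assumption made for illustration and is not the tree-structured MDP algorithm from the paper. The value estimate is the policy-weighted sum of per-action sample means, so its variance is the sum over actions of pi(a)^2 * sigma(a)^2 / n(a); minimizing this under a fixed budget with a Lagrange multiplier gives counts proportional to pi(a) * sigma(a). The names `estimator_variance` and `oracle_allocation` and the example numbers are hypothetical, and counts are left fractional for simplicity.

```python
import numpy as np


def estimator_variance(pi, sigma, counts):
    """Variance of sum_a pi[a] * (sample mean of action a) with independent samples."""
    return float(np.sum(pi**2 * sigma**2 / counts))


def oracle_allocation(pi, sigma, n):
    """Sample counts proportional to pi[a] * sigma[a], summing to the budget n."""
    weights = pi * sigma
    return n * weights / weights.sum()


pi = np.array([0.7, 0.2, 0.1])      # target policy (hypothetical example)
sigma = np.array([0.1, 1.0, 2.0])   # reward standard deviations, assumed known
n = 1000                            # total sampling budget

on_policy = n * pi                  # expected counts when simply running the target policy
oracle = oracle_allocation(pi, sigma, n)

print(estimator_variance(pi, sigma, on_policy), estimator_variance(pi, sigma, oracle))
```

In this example the oracle allocation shifts samples toward the rarely chosen but noisy actions, which is the intuition behind the claim that such strategies can achieve lower mean squared error than running the target policy.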
-
Training OOD Detectors in their Natural Habitats
Authors:
Julian Katz-Samuels,
Julia Nakhleh,
Robert Nowak,
Yixuan Li
Abstract:
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. However, these approaches make a strong distributional assumption that the auxiliary outlier data is completely separable from the in-distribution (ID) data. In this paper, we propose a novel framework that…
▽ More
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. However, these approaches make a strong distributional assumption that the auxiliary outlier data is completely separable from the in-distribution (ID) data. In this paper, we propose a novel framework that leverages wild mixture data, which naturally consists of both ID and OOD samples. Such wild data is abundant and arises freely upon deploying a machine learning classifier in their natural habitats. Our key idea is to formulate a constrained optimization problem and to show how to tractably solve it. Our learning objective maximizes the OOD detection rate, subject to constraints on the classification error of ID data and on the OOD error rate of ID examples. We extensively evaluate our approach on common OOD detection tasks and demonstrate superior performance.
Submitted 28 June, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
GALAXY: Graph-based Active Learning at the Extreme
Authors:
Jifan Zhang,
Julian Katz-Samuels,
Robert Nowak
Abstract:
Active learning is a label-efficient approach to train highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In "open world" settings, the classes of interest can make up a small fraction of the overall dataset -- most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class-imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY's superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.
Submitted 26 May, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers
Authors:
Julian Katz-Samuels,
Blake Mason,
Kevin Jamieson,
Rob Nowak
Abstract:
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification. We begin our investigation with the observation that agnostic algorithms \emph{cannot} be minimax-optimal in the realizable setting. Hence, we design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors and are general-purpose, accommodating a wide variety of function classes including kernel methods, Hölder smooth functions, and convex functions. The sample complexities of our algorithms can be quantified in terms of well-known quantities like the extended teaching dimension and haystack dimension. However, unlike algorithms based directly on those combinatorial quantities, our algorithms are computationally efficient. To achieve computational efficiency, our algorithms sample from the version space using Monte Carlo "hit-and-run" algorithms instead of maintaining the version space explicitly. Our approach has two key strengths. First, it is simple, consisting of two unifying, greedy algorithms. Second, our algorithms have the capability to seamlessly leverage prior knowledge that is often available and useful in practice. In addition to our new theoretical results, we demonstrate empirically that our algorithms are competitive with Gaussian process UCB methods.
Submitted 8 November, 2021;
originally announced November 2021.
-
Nearly Optimal Algorithms for Level Set Estimation
Authors:
Blake Mason,
Romain Camilleri,
Subhojyoti Mukherjee,
Kevin Jamieson,
Robert Nowak,
Lalit Jain
Abstract:
The level set estimation problem seeks to find all points in a domain ${\cal X}$ where the value of an unknown function $f:{\cal X}\rightarrow \mathbb{R}$ exceeds a threshold $α$. The estimation is based on noisy function evaluations that may be acquired at sequentially and adaptively chosen locations in ${\cal X}$. The threshold value $α$ can either be \emph{explicit} and provided a priori, or \emph{implicit} and defined relative to the optimal function value, i.e. $α= (1-ε)f(x_\ast)$ for a given $ε> 0$ where $f(x_\ast)$ is the maximal function value and is unknown. In this work we provide a new approach to the level set estimation problem by relating it to recent adaptive experimental design methods for linear bandits in the Reproducing Kernel Hilbert Space (RKHS) setting. We assume that $f$ can be approximated by a function in the RKHS up to an unknown misspecification and provide novel algorithms for both the implicit and explicit cases in this setting with strong theoretical guarantees. Moreover, in the linear (kernel) setting, we show that our bounds are nearly optimal, namely, our upper bounds match existing lower bounds for threshold linear bandits. To our knowledge this work provides the first instance-dependent, non-asymptotic upper bounds on sample complexity of level-set estimation that match information theoretic lower bounds.
Submitted 2 November, 2021;
originally announced November 2021.
-
Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
Submitted 12 October, 2022; v1 submitted 18 September, 2021;
originally announced September 2021.
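As a hedged illustration of the weight-decay estimator described in the abstract above, the objective can be written out as follows. The notation ($N$ samples $(x_n, y_n)$, width $K$, outer weights $v_k$, inner weights $w_k$, biases $b_k$, regularization strength $λ$) is assumed here for illustration and is not taken from the listing; in particular, whether the biases are also penalized is left open.

$$\min_{θ}\ \sum_{n=1}^{N}\bigl(y_n - f_θ(x_n)\bigr)^2 \;+\; λ\sum_{k=1}^{K}\bigl(|v_k|^2 + \|w_k\|_2^2\bigr), \qquad f_θ(x) = \sum_{k=1}^{K} v_k\,\max\bigl(0,\ w_k^{\top}x - b_k\bigr).$$

The first term is the sum of squared data-fitting errors and the second is the squared Euclidean norm of the network weights, which is what training with weight decay penalizes.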
-
Near Instance Optimal Model Selection for Pure Exploration Linear Bandits
Authors:
Yinglun Zhu,
Julian Katz-Samuels,
Robert Nowak
Abstract:
We introduce the model selection problem in pure exploration linear bandits, where the learner needs to adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model. We design algorithms in both fixed confidence and fixed budget settings with near instance optimal guarantees. The core of our algorithms is a new optimization problem based on experimental design that leverages the geometry of the action set to identify a near-optimal hypothesis class. Our fixed budget algorithm is developed based on a novel selection-validation procedure, which provides a new way to study the understudied fixed budget setting (even without the added challenge of model selection). We adapt our algorithms, in both fixed confidence and fixed budget settings, to problems with model misspecification.
Submitted 17 March, 2022; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Pure Exploration in Kernel and Neural Bandits
Authors:
Yinglun Zhu,
Dongruo Zhou,
Ruoxi Jiang,
Quanquan Gu,
Rebecca Willett,
Robert Nowak
Abstract:
We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms. To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification. Our approach is conceptually very different from existing works that can either only handle low-dimensional linear bandits or passively deal with model misspecification. We showcase the application of our approach to two pure exploration settings that were previously under-studied: (1) the reward function belongs to a possibly infinite-dimensional Reproducing Kernel Hilbert Space, and (2) the reward function is nonlinear and can be approximated by neural networks. Our main results provide sample complexity guarantees that only depend on the effective dimension of the feature spaces in the kernel or neural representations. Extensive experiments conducted on both synthetic and real-world datasets demonstrate the efficacy of our methods.
Submitted 17 March, 2022; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Quantum Kibble-Zurek mechanism: Kink correlations after a quench in the quantum Ising chain
Authors:
Radosław J. Nowak,
Jacek Dziarmaga
Abstract:
The transverse field in the quantum Ising chain is linearly ramped from the para- to the ferromagnetic phase across the quantum critical point at a rate characterized by a quench time $τ_Q$. We calculate a connected kink-kink correlator in the final state at zero transverse field. The correlator is a sum of two terms: a negative (anti-bunching) Gaussian that depends on the Kibble-Zurek (KZ) correlation length only and a positive term that depends on a second longer scale of length. The second length is made longer by dephasing of the state excited near the critical point during the following ramp across the ferromagnetic phase. This interpretation is corroborated by considering a linear ramp that is halted in the ferromagnetic phase for a finite waiting time and then continued at the same rate as before the halt. The extra time available for dephasing increases the second scale of length that asymptotically grows linearly with the waiting time. The dephasing also suppresses the magnitude of the second term, making it negligible for waiting times much longer than $τ_Q$. The same dephasing can be obtained with a smooth ramp that slows down in the ferromagnetic phase. Assuming sufficient dephasing, we also obtain higher-order kink correlators and the ferromagnetic correlation function.
Submitted 27 August, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory
Authors:
Rahul Parhi,
Robert D. Nowak
Abstract:
We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit activations to data. We propose a new function space, which is reminiscent of classical bounded variation-type spaces, that captures the compositional structure associated with deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems over functions from this space. The function space consists of compositions of functions from the Banach spaces of second-order bounded variation in the Radon domain. These are Banach spaces with sparsity-promoting norms, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.
Submitted 26 September, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Nearest Neighbor Search Under Uncertainty
Authors:
Blake Mason,
Ardhendu Tripathy,
Robert Nowak
Abstract:
Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning. There is vast literature on efficient algorithms for constructing data structures and performing exact and approximate NNS. This paper studies NNS under Uncertainty (NNSU). Specifically, consider the setting in which an NNS algorithm has access only to a stochastic distance oracle that provides a noisy, unbiased estimate of the distance between any pair of points, rather than the exact distance. This models many situations of practical importance, including NNS based on human similarity judgements, physical measurements, or fast, randomized approximations to exact distances. A naive approach to NNSU could employ any standard NNS algorithm and repeatedly query and average results from the stochastic oracle (to reduce noise) whenever it needs a pairwise distance. The problem is that a sufficient number of repeated queries is unknown in advance; e.g., a point may be distant from all but one other point (crude distance estimates suffice) or it may be close to a large number of other points (accurate estimates are necessary). This paper shows how ideas from cover trees and multi-armed bandits can be leveraged to develop an NNSU algorithm that has optimal dependence on the dataset size and the (unknown) geometry of the dataset.
Submitted 8 March, 2021;
originally announced March 2021.
-
Pareto Optimal Model Selection in Linear Bandits
Authors:
Yinglun Zhu,
Robert Nowak
Abstract:
We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations; i.e., the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class or are based on the idea of "corralling" multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: There is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones.
Submitted 16 March, 2022; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Chernoff Sampling for Active Testing and Extension to Active Regression
Authors:
Subhojyoti Mukherjee,
Ardhendu Tripathy,
Robert Nowak
Abstract:
Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff's algorithm, with a non-asymptotic term that characterizes its performance at a fixed confidence level. We also develop an extension of Chernoff sampling that can be used to estimate the parameters of a wide variety of models and we obtain a non-asymptotic bound on the estimation error. We apply our extension of Chernoff sampling to actively learn neural network models and to estimate parameters in real-data linear and non-linear regression problems, where our approach performs favorably to state-of-the-art methods.
Submitted 10 March, 2022; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Robust Outlier Arm Identification
Authors:
Yinglun Zhu,
Sumeet Katariya,
Robert Nowak
Abstract:
We study the problem of Robust Outlier Arm Identification (ROAI), where the goal is to identify arms whose expected rewards deviate substantially from the majority, by adaptively sampling from their reward distributions. We compute the outlier threshold using the median and median absolute deviation of the expected rewards. This is a robust choice for the threshold compared to using the mean and standard deviation, since it can identify outlier arms even in the presence of extreme outlier values. Our setting is different from existing pure exploration problems where the threshold is pre-specified as a given value or rank. This is useful in applications where the goal is to identify the set of promising items but the cardinality of this set is unknown, such as finding promising drugs for a new disease or identifying items favored by a population. We propose two $δ$-PAC algorithms for ROAI, including the first UCB-style algorithm for outlier detection, and derive upper bounds on their sample complexity. We also prove a matching, up to logarithmic factors, worst case lower bound for the problem, indicating that our upper bounds are generally unimprovable. Experimental results show that our algorithms are both robust and about $5$x more sample efficient than the state-of-the-art.
Submitted 21 September, 2020;
originally announced September 2020.
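The median/MAD threshold described in the Robust Outlier Arm Identification abstract above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the expected rewards are already known (no adaptive sampling), and the function names, the multiplier `k`, and the example values are all hypothetical.

```python
import statistics


def mad_outlier_arms(means, k=3.0):
    """Flag arms whose mean exceeds median + k * MAD.

    `k` is an illustrative multiplier chosen for this sketch,
    not a value taken from the paper.
    """
    med = statistics.median(means)
    # Median absolute deviation of the means around their median.
    mad = statistics.median([abs(m - med) for m in means])
    threshold = med + k * mad
    return [i for i, m in enumerate(means) if m > threshold], threshold


def mean_std_outlier_arms(means, k=3.0):
    """Non-robust comparison: threshold at mean + k * standard deviation."""
    threshold = statistics.mean(means) + k * statistics.pstdev(means)
    return [i for i, m in enumerate(means) if m > threshold], threshold


# Hypothetical expected rewards: five similar arms and one extreme arm.
example_means = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90]
robust_arms, robust_threshold = mad_outlier_arms(example_means)
naive_arms, naive_threshold = mean_std_outlier_arms(example_means)
```

On these illustrative values the extreme arm inflates the standard deviation enough that the mean-based threshold misses it, while the median/MAD threshold still flags it. This is the robustness property the abstract refers to.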
-
InCorr: Interactive Data-Driven Correlation Panels for Digital Outcrop Analysis
Authors:
Thomas Ortner,
Andreas Walch,
Rebecca Nowak,
Robert Barnes,
Thomas Höllt,
Eduard Gröller
Abstract:
Geological analysis of 3D Digital Outcrop Models (DOMs) for reconstruction of ancient habitable environments is a key aspect of the upcoming ESA ExoMars 2022 Rosalind Franklin Rover and the NASA 2020 Rover Perseverance missions in seeking signs of past life on Mars. Geologists measure and interpret 3D DOMs, create sedimentary logs and combine them in `correlation panels' to map the extents of key geological horizons, and build a stratigraphic model to understand their position in the ancient landscape. Currently, the creation of correlation panels is completely manual and therefore time-consuming and inflexible. With InCorr we present a visualization solution that encompasses a 3D logging tool and an interactive data-driven correlation panel that evolves with the stratigraphic analysis. For the creation of InCorr we closely cooperated with leading planetary geologists in the form of a design study. We verify our results by recreating an existing correlation analysis with InCorr and validate our correlation panel against a manually created illustration. Further, we conducted a user study with a wider circle of geologists. Our evaluation shows that InCorr efficiently supports the domain experts in tackling their research questions and that it has the potential to significantly impact how geologists work with digital outcrop representations in general.
Submitted 8 November, 2020; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Similarity Search for Efficient Active Learning and Search of Rare Concepts
Authors:
Cody Coleman,
Edward Chou,
Julian Katz-Samuels,
Sean Culatana,
Peter Bailis,
Alexander C. Berg,
Robert Nowak,
Roshan Sumbaly,
Matei Zaharia,
I. Zeki Yalniz
Abstract:
Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set instead of scanning over all of the unlabeled data. We evaluate several selection strategies in this setting on three large-scale computer vision datasets: ImageNet, OpenImages, and a de-identified and aggregated dataset of 10 billion images provided by a large internet company. Our approach achieved similar mean average precision and recall as the traditional global approach while reducing the computational cost of selection by up to three orders of magnitude, thus enabling web-scale active learning.
Submitted 22 July, 2021; v1 submitted 30 June, 2020;
originally announced July 2020.
-
On Regret with Multiple Best Arms
Authors:
Yinglun Zhu,
Robert Nowak
Abstract:
We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable to or much larger than the time horizon, and make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and the information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and theoretically derive its regret bound. We then prove a lower bound for our problem setting, which indicates: (1) no algorithm can be minimax optimal simultaneously over all hardness levels; and (2) our algorithm achieves a rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over all hardness levels. Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art.
Submitted 22 October, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Finding All ε-Good Arms in Stochastic Bandits
Authors:
Blake Mason,
Lalit Jain,
Ardhendu Tripathy,
Robert Nowak
Abstract:
The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an ε-good arm, best-arm identification, top-k arm identification, and finding all arms with means above a specified threshold. However, the problem of finding all ε-good arms has been overlooked in past work, although arguably this may be the most natural objective in many applications. For example, a virologist may conduct preliminary laboratory experiments on a large candidate set of treatments and move all ε-good treatments into more expensive clinical trials. Since the ultimate clinical efficacy is uncertain, it is important to identify all ε-good candidates. Mathematically, the all-ε-good arm identification problem presents significant new challenges and surprises that do not arise in the pure-exploration objectives studied in the past. We introduce two algorithms to overcome these and demonstrate their great empirical performance on a large-scale crowd-sourced dataset of 2.2M ratings collected by the New Yorker Caption Contest as well as a dataset testing hundreds of possible cancer drugs.
Submitted 11 September, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
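To make the objective in the "Finding All ε-Good Arms" abstract above concrete, the target set can be sketched as follows. This assumes the additive definition (an arm is ε-good if its mean is within ε of the largest mean) and assumes the means are known exactly. The paper's algorithms must instead estimate them from noisy samples, which this sketch does not attempt. The function name and example values are hypothetical.

```python
def all_eps_good_arms(means, eps):
    """Return the indices of all arms whose mean is within eps of the best mean.

    Assumes the additive definition mean_i >= max_j mean_j - eps and exact
    knowledge of the means; both are simplifications made for illustration.
    """
    best = max(means)
    return [i for i, m in enumerate(means) if m >= best - eps]


# Hypothetical means for four candidate treatments.
example_means = [0.90, 0.85, 0.50, 0.82]
good_arms = all_eps_good_arms(example_means, eps=0.1)
```

Unlike top-k identification, the size of the returned set is not fixed in advance: it depends on how many means happen to fall within ε of the best one.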