Search | arXiv e-print repository

Improving the Knowledge Gradient Algorithm

Authors: Yang Le, Gao Siyang, Ho Chin Pang

Abstract: The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations, causing the algorithm not asymptotically optimal. We next provide a… ▽ More The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations, causing the algorithm not asymptotically optimal. We next provide a remedy for it, by following the manner of one-step look ahead of KG, but instead choosing the measurement that yields the greatest one-step improvement in the probability of selecting the best arm. The new policy is called improved knowledge gradient (iKG). iKG can be shown to be asymptotically optimal. In addition, we show that compared to KG, it is easier to extend iKG to variant problems of BAI, with the $ε$-good arm identification and feasible arm identification as two examples. The superior performances of iKG on these problems are further demonstrated using numerical examples. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: 32 pages, 42 figures

arXiv:2202.13163 [pdf, other]

Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

Authors: Chengchun Shi, Shikai Luo, Yuan Le, Hongtu Zhu, Rui Song

Abstract: We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most of existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remai… ▽ More We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most of existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remain unknown. The aim of this paper is to develop a novel advantage learning framework in order to efficiently use pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithms as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than the policy derived based on the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL. △ Less

Submitted 26 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

arXiv:1510.02676 [pdf, other]

Some Theory For Practical Classifier Validation

Authors: Eric Bax, Ya Le

Abstract: We compare and contrast two approaches to validating a trained classifier while using all in-sample data for training. One is simultaneous validation over an organized set of hypotheses (SVOOSH), the well-known method that began with VC theory. The other is withhold and gap (WAG). WAG withholds a validation set, trains a holdout classifier on the remaining data, uses the validation data to validat… ▽ More We compare and contrast two approaches to validating a trained classifier while using all in-sample data for training. One is simultaneous validation over an organized set of hypotheses (SVOOSH), the well-known method that began with VC theory. The other is withhold and gap (WAG). WAG withholds a validation set, trains a holdout classifier on the remaining data, uses the validation data to validate that classifier, then adds the rate of disagreement between the holdout classifier and one trained using all in-sample data, which is an upper bound on the difference in error rates. We show that complex hypothesis classes and limited training data can make WAG a favorable alternative. △ Less

Submitted 9 October, 2015; originally announced October 2015.

arXiv:1411.0023 [pdf, other]

Validation of Matching

Authors: Ya Le, Eric Bax, Nicola Barbieri, David Garcia Soriano, Jitesh Mehta, James Li

Abstract: We introduce a technique to compute probably approximately correct (PAC) bounds on precision and recall for matching algorithms. The bounds require some verified matches, but those matches may be used to develop the algorithms. The bounds can be applied to network reconciliation or entity resolution algorithms, which identify nodes in different networks or values in a data set that correspond to t… ▽ More We introduce a technique to compute probably approximately correct (PAC) bounds on precision and recall for matching algorithms. The bounds require some verified matches, but those matches may be used to develop the algorithms. The bounds can be applied to network reconciliation or entity resolution algorithms, which identify nodes in different networks or values in a data set that correspond to the same entity. For network reconciliation, the bounds do not require knowledge of the network generation process. △ Less

Submitted 11 April, 2016; v1 submitted 31 October, 2014; originally announced November 2014.

arXiv:1407.4543 [pdf, other]

Sparse Quadratic Discriminant Analysis and Community Bayes

Authors: Ya Le, Trevor Hastie

Abstract: We develop a class of rules spanning the range between quadratic discriminant analysis and naive Bayes, through a path of sparse graphical models. A group lasso penalty is used to introduce shrinkage and encourage a similar pattern of sparsity across precision matrices. It gives sparse estimates of interactions and produces interpretable models. Inspired by the connected-components structure of th… ▽ More We develop a class of rules spanning the range between quadratic discriminant analysis and naive Bayes, through a path of sparse graphical models. A group lasso penalty is used to introduce shrinkage and encourage a similar pattern of sparsity across precision matrices. It gives sparse estimates of interactions and produces interpretable models. Inspired by the connected-components structure of the estimated precision matrices, we propose the community Bayes model, which partitions features into several conditional independent communities and splits the classification problem into separate smaller ones. The community Bayes idea is quite general and can be applied to non-Gaussian data and likelihood-based classifiers. △ Less

Submitted 19 October, 2016; v1 submitted 16 July, 2014; originally announced July 2014.

Comments: Revised version (adding more experiments)

Showing 1–5 of 5 results for author: Le, Y