Search | arXiv e-print repository

Combining Experimental and Historical Data for Policy Evaluation

Authors: Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

Abstract: This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.

Submitted 1 June, 2024; originally announced June 2024.
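
The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes the experimental estimator is unbiased, the historical estimator carries a known bias, and the two are independent. The function names and all inputs are hypothetical.

```python
def mse_optimal_weight(var_exp, var_hist, bias_hist):
    """Weight w on the experimental estimator that minimizes the MSE of
    w * est_exp + (1 - w) * est_hist.

    Assumptions (illustrative only): est_exp is unbiased with variance
    var_exp; est_hist has bias bias_hist and variance var_hist; the two
    estimators are independent. Under these assumptions
        MSE(w) = w**2 * var_exp + (1 - w)**2 * (var_hist + bias_hist**2).
    """
    hist_mse = var_hist + bias_hist ** 2
    denom = var_exp + hist_mse
    if denom == 0:
        return 0.5  # both estimators are exact; any weight works
    return hist_mse / denom


def combined_mse(w, var_exp, var_hist, bias_hist):
    """MSE of the combined estimator under the same assumptions."""
    return w ** 2 * var_exp + (1 - w) ** 2 * (var_hist + bias_hist ** 2)
```

In this sketch a larger historical bias pushes the weight toward the experimental estimator. In practice the bias is unknown and has to be estimated, which is one reason a more cautious weighting may be preferable.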

arXiv:2404.17489 [pdf, other]

Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

Authors: Wei Cui, Rasa Hosseinzadeh, Junwei Ma, Tongzi Wu, Yi Sui, Keyvan Golestan

Abstract: Contrastive learning is a model pre-training technique by first creating similar views of the original data, and then encouraging the data and its corresponding views to be close in the embedding space. Contrastive learning has witnessed success in image and natural language data, thanks to the domain-specific augmentation techniques that are both intuitive and effective. Nonetheless, in tabular domain, the predominant augmentation technique for creating views is through corrupting tabular entries via swapping values, which is not as sound or effective. We propose a simple yet powerful improvement to this augmentation technique: corrupting tabular data conditioned on class identity. Specifically, when corrupting a specific tabular entry from an anchor row, instead of randomly sampling a value in the same feature column from the entire table uniformly, we only sample from rows that are identified to be within the same class as the anchor row. We assume the semi-supervised learning setting, and adopt the pseudo labeling technique for obtaining class identities over all table rows. We also explore the novel idea of selecting features to be corrupted based on feature correlation structures. Extensive experiments show that the proposed approach consistently outperforms the conventional corruption method for tabular data classification tasks. Our code is available at https://github.com/willtop/Tabular-Class-Conditioned-SSL.

Submitted 30 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 algorithms, 3 figures, 5 tables
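
The class-conditioned corruption described in the abstract above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' released code: the function name, the `corrupt_rate` parameter, and the uniform choice of which features to corrupt are assumptions, and `labels` stands in for true or pseudo labels.

```python
import numpy as np


def class_conditioned_corrupt(X, labels, corrupt_rate=0.3, rng=None):
    """Return a corrupted view of X in which each corrupted entry is
    replaced by the value of the same column taken from a random row
    of the same class as the anchor row (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    labels = np.asarray(labels)
    n_rows, n_cols = X.shape
    X_view = X.copy()
    # Row indices grouped by class label.
    rows_by_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    for i in range(n_rows):
        donors = rows_by_class[labels[i]]
        # Each feature is corrupted independently with probability corrupt_rate.
        mask = rng.random(n_cols) < corrupt_rate
        for j in np.flatnonzero(mask):
            X_view[i, j] = X[rng.choice(donors), j]
    return X_view
```

Conventional corruption would instead draw the donor row from the whole table; restricting donors to the anchor's class is the only change in this sketch.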

arXiv:2401.13744 [pdf, other]

Conformal Prediction Sets Improve Human Decision Making

Authors: Jesse C. Cresswell, Yi Sui, Bhargava Kumar, Noël Vouitsis

Abstract: In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.

Submitted 9 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Published at ICML 2024. Code available at https://github.com/layer6ai-labs/hitl-conformal-prediction
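
For readers unfamiliar with conformal prediction sets, the following is a minimal sketch of standard split conformal prediction for classification. It is not taken from the paper or its repository; the score (one minus the probability of the true class) and the function names are assumptions chosen for illustration.

```python
import math

import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data so that prediction
    sets cover the true label with probability at least 1 - alpha
    (standard split conformal guarantee, assuming exchangeable data)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: include every class
    return float(np.sort(scores)[k - 1])


def prediction_set(probs, qhat):
    """All classes whose score does not exceed the calibrated threshold."""
    probs = np.asarray(probs, dtype=float)
    return [int(c) for c in np.flatnonzero(1.0 - probs <= qhat)]
```

Confident predictions yield small sets and uncertain ones yield larger sets, which is the behaviour the abstract contrasts with fixed-size sets.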

arXiv:2401.02650 [pdf, other]

Improving sample efficiency of high dimensional Bayesian optimization with MCMC

Authors: Zeji Yi, Yunyue Wei, Chu Xin Cheng, Kaibo He, Yanan Sui

Abstract: Sequential optimization methods are often confronted with the curse of dimensionality in high-dimensional spaces. Current approaches under the Gaussian process framework are still burdened by the computational complexity of tracking Gaussian process posteriors and need to partition the optimization problem into small regions to ensure exploration or assume an underlying low-dimensional structure. With the idea of transiting the candidate points towards more promising positions, we propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior. We provide theoretical guarantees of its convergence in the Gaussian process Thompson sampling setting. We also show experimentally that both the Metropolis-Hastings and the Langevin Dynamics version of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2306.04675 [pdf, other]

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

Authors: George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem

Abstract: We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization: none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 17 common metrics for 9 different encoders at https://github.com/layer6ai-labs/dgm-eval.

Submitted 30 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023. 53 pages, 29 figures, 12 tables. Code at https://github.com/layer6ai-labs/dgm-eval, reviews at https://openreview.net/forum?id=08zf7kTOoh

Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

arXiv:2206.12280 [pdf, other]

Bayesian Circular Lattice Filters for Computationally Efficient Estimation of Multivariate Time-Varying Autoregressive Models

Authors: Yuelei Sui, Scott H. Holan, Wen-Hsi Yang

Abstract: Nonstationary time series data exist in various scientific disciplines, including environmental science, biology, signal processing, econometrics, among others. Many Bayesian models have been developed to handle nonstationary time series. The time-varying vector autoregressive (TV-VAR) model is a well-established model for multivariate nonstationary time series. Nevertheless, in most cases, the large number of parameters presented by the model results in a high computational burden, ultimately limiting its usage. This paper proposes a computationally efficient multivariate Bayesian Circular Lattice Filter to extend the usage of the TV-VAR model to a broader class of high-dimensional problems. Our fully Bayesian framework allows both the autoregressive (AR) coefficients and innovation covariance to vary over time. Our estimation method is based on the Bayesian lattice filter (BLF), which is extremely computationally efficient and stable in univariate cases. To illustrate the effectiveness of our approach, we conduct a comprehensive comparison with other competing methods through simulation studies and find that, in most cases, our approach performs superior in terms of average squared error between the estimated and true time-varying spectral density. Finally, we demonstrate our methodology through applications to quarterly Gross Domestic Product (GDP) data and Northern California wind data.

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2102.12769 [pdf, other]

No-Regret Reinforcement Learning with Heavy-Tailed Rewards

Authors: Vincent Zhuang, Yanan Sui

Abstract: Reinforcement learning algorithms typically assume rewards to be sampled from light-tailed distributions, such as Gaussian or bounded. However, a wide variety of real-world systems generate rewards that follow heavy-tailed distributions. We consider such scenarios in the setting of undiscounted reinforcement learning. By constructing a lower bound, we show that the difficulty of learning heavy-tailed rewards asymptotically dominates the difficulty of learning transition probabilities. Leveraging techniques from robust mean estimation, we propose Heavy-UCRL2 and Heavy-Q-Learning, and show that they achieve near-optimal regret bounds in this setting. Our algorithms also naturally generalize to deep reinforcement learning applications; we instantiate Heavy-DQN as an example of this. We demonstrate that all of our algorithms outperform baselines on both synthetic MDPs and standard RL benchmarks.

Submitted 25 February, 2021; originally announced February 2021.

Comments: AISTATS 21
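
The abstract above refers to robust mean estimation without naming a specific estimator. As a hedged illustration only, median-of-means is one standard robust mean estimator for heavy-tailed data; whether the paper uses it is not stated here, and the function name and block count are assumptions.

```python
import numpy as np


def median_of_means(samples, n_blocks):
    """Median-of-means estimate of the mean: split the samples into
    n_blocks contiguous blocks, average each block, and return the
    median of the block averages. Assumes the samples are i.i.d., so
    contiguous blocks are as good as random ones."""
    samples = np.asarray(samples, dtype=float)
    if not 1 <= n_blocks <= samples.size:
        raise ValueError("n_blocks must be between 1 and the number of samples")
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([block.mean() for block in blocks]))
```

A single extreme reward can shift the plain sample mean arbitrarily far, but it can distort at most one block average, which the median then ignores.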

arXiv:2102.06790 [pdf, other]

A Unified Lottery Ticket Hypothesis for Graph Neural Networks

Authors: Tianlong Chen, Yongduo Sui, Xuxi Chen, Aston Zhang, Zhangyang Wang

Abstract: With graphs rapidly growing in size and deeper graph neural networks (GNNs) emerging, the training and inference of GNNs become increasingly expensive. Existing network weight pruning algorithms cannot address the main space and computational bottleneck in GNNs, caused by the size and connectivity of the graph. To this end, this paper first presents a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights, for effectively accelerating GNN inference on large-scale graphs. Leveraging this new tool, we further generalize the recently popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network, which can be jointly identified from the original GNN and the full dense graph by iteratively applying UGS. Like its counterpart in convolutional neural networks, GLT can be trained in isolation to match the performance of training with the full model and graph, and can be drawn from both randomly initialized and self-supervised pre-trained GNNs. Our proposal has been experimentally verified across various GNN architectures and diverse tasks, on both small-scale graph datasets (Cora, Citeseer and PubMed), and large-scale datasets from the challenging Open Graph Benchmark (OGB). Specifically, for node classification, our found GLTs achieve the same accuracies with 20%~98% MACs saving on small graphs and 25%~85% MACs saving on large ones. For link prediction, GLTs lead to 48%~97% and 70% MACs saving on small and large graph datasets, respectively, without compromising predictive performance. Codes available at https://github.com/VITA-Group/Unified-LTH-GNN.

Submitted 7 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2010.09808 [pdf, other]

Imitation with Neural Density Models

Authors: Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon

Abstract: We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2003.13413 [pdf, other]

Secure Metric Learning via Differential Pairwise Privacy

Authors: Jing Li, Yuangang Pan, Yulei Sui, Ivor W. Tsang

Abstract: Distance Metric Learning (DML) has drawn much attention over the last two decades. A number of previous works have shown that it performs well in measuring the similarities of individuals given a set of correctly labeled pairwise data by domain experts. These important and precisely-labeled pairwise data are often highly sensitive in real world (e.g., patients similarity). This paper studies, for the first time, how pairwise information can be leaked to attackers during distance metric learning, and develops differential pairwise privacy (DPP), generalizing the definition of standard differential privacy, for secure metric learning. Unlike traditional differential privacy which only applies to independent samples, thus cannot be used for pairwise data, DPP successfully deals with this problem by reformulating the worst case. Specifically, given the pairwise data, we reveal all the involved correlations among pairs in the constructed undirected graph. DPP is then formalized that defines what kind of DML algorithm is private to preserve pairwise data. After that, a case study employing the contrastive loss is exhibited to clarify the details of implementing a DPP-DML algorithm. Particularly, the sensitivity reduction technique is proposed to enhance the utility of the output distance metric. Experiments both on a toy dataset and benchmarks demonstrate that the proposed scheme achieves pairwise data privacy without compromising the output performance much (Accuracy declines less than 0.01 throughout all benchmark datasets when the privacy budget is set at 4).

Submitted 30 March, 2020; originally announced March 2020.

arXiv:1908.01289 [pdf, other]

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Authors: Ellen R. Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel W. Burdick

Abstract: In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present DUELING POSTERIOR SAMPLING (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the preference feedback. As preference feedback is provided on trajectories rather than individual state-action pairs, we develop a Bayesian approach for the credit assignment problem, translating preferences to a posterior distribution over state-action reward models. We prove an asymptotic Bayesian no-regret rate for DPS with a Bayesian linear regression credit assignment model. This is the first regret guarantee for preference-based RL to our knowledge. We also discuss possible avenues for extending the proof methodology to other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines.

Submitted 29 June, 2020; v1 submitted 4 August, 2019; originally announced August 2019.

Comments: To appear in Conference on Uncertainty in Artificial Intelligence (UAI), 2020. 9 pages before references and appendix; 51 pages total; 7 figures; 4 tables. This replacement incorporates reviewer comments, and in comparison to version 1, extends the theoretical and empirical analyses and adds mathematical detail. Code: https://github.com/ernovoseller/DuelingPosteriorSampling

arXiv:1806.07555 [pdf, other]

Stagewise Safe Bayesian Optimization with Gaussian Processes

Authors: Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue

Abstract: Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function with absolute feedback or preference feedback subject to unknown safety constraints. We develop an efficient safe Bayesian optimization algorithm, StageOpt, that separates safe region expansion and utility function maximization into two distinct stages. Compared to existing approaches which interleave between expansion and optimization, we show that StageOpt is more efficient and naturally applicable to a broader class of problems. We provide theoretical guarantees for both the satisfaction of safety constraints as well as convergence to the optimal utility value. We evaluate StageOpt on both a variety of synthetic experiments, as well as in clinical practice. We demonstrate that StageOpt is more effective than existing safe optimization approaches, and is able to safely and effectively optimize spinal cord stimulation therapy in our clinical experiments.

Submitted 26 January, 2020; v1 submitted 20 June, 2018; originally announced June 2018.

Comments: International Conference on Machine Learning (ICML) 2018

arXiv:1711.07894 [pdf, other]

Quantifying Performance of Bipedal Standing with Multi-channel EMG

Authors: Yanan Sui, Kun ho Kim, Joel W. Burdick

Abstract: Spinal cord stimulation has enabled humans with motor complete spinal cord injury (SCI) to independently stand and recover some lost autonomic function. Quantifying the quality of bipedal standing under spinal stimulation is important for spinal rehabilitation therapies and for new strategies that seek to combine spinal stimulation and rehabilitative robots (such as exoskeletons) in real time feedback. To study the potential for automated electromyography (EMG) analysis in SCI, we evaluated the standing quality of paralyzed patients undergoing electrical spinal cord stimulation using both video and multi-channel surface EMG recordings during spinal stimulation therapy sessions. The quality of standing under different stimulation settings was quantified manually by experienced clinicians. By correlating features of the recorded EMG activity with the expert evaluations, we show that multi-channel EMG recording can provide accurate, fast, and robust estimation for the quality of bipedal standing in spinally stimulated SCI patients. Moreover, our analysis shows that the total number of EMG channels needed to effectively predict standing quality can be reduced while maintaining high estimation accuracy, which provides more flexibility for rehabilitation robotic systems to incorporate EMG recordings.

Submitted 21 November, 2017; originally announced November 2017.

Journal ref: IROS 2017

Showing 1–13 of 13 results for author: Sui, Y