Showing 1–50 of 60 results for author: Lam, H

  1. arXiv:2407.02754  [pdf, other]

    math.ST stat.ME

    Is Cross-Validation the Gold Standard to Evaluate Model Performance?

    Authors: Garud Iyengar, Henry Lam, Tianyu Wang

    Abstract: Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In this paper we fill in this gap and show that in fact, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one r…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.14071  [pdf, other]

    stat.ML cs.LG

    Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Nevertheless, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a general theoretical framework to analyze stochastic linear bandits in the presence of approximate inference and conduct regret…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.07825  [pdf, other]

    math.OC stat.ME

    Shape-Constrained Distributional Optimization via Importance-Weighted Sample Average Approximation

    Authors: Henry Lam, Zhenyuan Liu, Dashi I. Singham

    Abstract: Shape-constrained optimization arises in a wide range of problems including distributionally robust optimization (DRO) that has surging popularity in recent years. In the DRO literature, these problems are usually solved via reduction into moment-constrained problems using the Choquet representation. While powerful, such an approach could face tractability challenges arising from the geometries an…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2405.14953  [pdf, other]

    cs.LG cs.AI stat.ML

    Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

    Authors: Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

    Abstract: Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLM). A weakness of DPO, however, lies in its lack of capability to characterize the diversity of human preferences. Inspired by Mallows' theory of preference ranking, we develop in this paper…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.14741  [pdf, other]

    math.OC cs.LG stat.ML

    Bagging Improves Generalization Exponentially

    Authors: Huajie Qian, Donghao Ying, Henry Lam, Wotao Yin

    Abstract: Bagging is a popular ensemble technique to improve the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on bagging: By suitably aggregating the base learners…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Correct author list typo

  6. arXiv:2403.09877  [pdf, other]

    stat.ME math.ST

    Quantifying Distributional Input Uncertainty via Inflated Kolmogorov-Smirnov Confidence Band

    Authors: Motong Chen, Henry Lam, Zhenyuan Liu

    Abstract: In stochastic simulation, input uncertainty refers to the propagation of the statistical noise in calibrating input models to impact output accuracy, in addition to the Monte Carlo simulation noise. The vast majority of the input uncertainty literature focuses on estimating target output quantities that are real-valued. However, outputs of simulation models are random and real-valued targets essen…

    Submitted 14 March, 2024; originally announced March 2024.

  7. arXiv:2310.11065  [pdf, other]

    stat.ML cs.LG

    Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty Quantification

    Authors: Henry Lam, Zitong Wang

    Abstract: Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD has only been recently studied, yet is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based…

    Submitted 17 October, 2023; originally announced October 2023.

  8. arXiv:2310.09766  [pdf, other]

    stat.ML cs.LG

    Pseudo-Bayesian Optimization

    Authors: Haoxian Chen, Henry Lam

    Abstract: Bayesian Optimization is a popular approach for optimizing expensive black-box functions. Its key idea is to use a surrogate model to approximate the objective and, importantly, quantify the associated uncertainty that allows a sequential search of query points that balance exploitation-exploration. Gaussian process (GP) has been a primary candidate for the surrogate model, thanks to its Bayesian-…

    Submitted 20 June, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

  9. arXiv:2306.14041  [pdf, other]

    math.OC cs.LG stat.ML

    Smoothed $f$-Divergence Distributionally Robust Optimization

    Authors: Zhenyuan Liu, Bart P. G. Van Parys, Henry Lam

    Abstract: In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple "margin" adjustments to SAA a…

    Submitted 12 October, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

    MSC Class: 90C15; 90C17; 90C25

  10. arXiv:2306.05674  [pdf, other]

    stat.ML cs.LG

    Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Uncertainty quantification (UQ) is important for reliability assessment and enhancement of machine learning models. In deep learning, uncertainties arise not only from data, but also from the training procedure that often injects substantial noises and biases. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ due to the need for repeated net…

    Submitted 9 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  11. arXiv:2305.18412  [pdf, other]

    stat.AP cs.LG

    Short-term Temporal Dependency Detection under Heterogeneous Event Dynamic with Hawkes Processes

    Authors: Yu Chen, Fengpei Li, Anderson Schneider, Yuriy Nevmyvaka, Asohan Amarasingham, Henry Lam

    Abstract: Many event sequence data exhibit mutually exciting or inhibiting patterns. Reliable detection of such temporal dependency is crucial for scientific investigation. The de facto model is the Multivariate Hawkes Process (MHP), whose impact function naturally encodes a causal structure in Granger causality. However, the vast majority of existing methods use direct or nonlinear transform of standard MH…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Conference on Uncertainty in Artificial Intelligence 2023

  12. arXiv:2305.02434  [pdf, other]

    stat.ME math.ST

    Uncertainty Quantification and Confidence Intervals for Naive Rare-Event Estimators

    Authors: Yuanlu Bai, Henry Lam

    Abstract: We consider the estimation of rare-event probabilities using sample proportions output by naive Monte Carlo or collected data. Unlike using variance reduction techniques, this naive estimator does not have a priori relative efficiency guarantee. On the other hand, due to the recent surge of sophisticated rare-event problems arising in safety evaluations of intelligent systems, efficiency-guarantee…

    Submitted 26 April, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

  13. arXiv:2304.06833  [pdf, other]

    stat.ML cs.LG stat.ME

    Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective

    Authors: Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, Yunfan Zhao

    Abstract: In data-driven stochastic optimization, model parameters of the underlying distribution need to be estimated from data in addition to the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (…

    Submitted 6 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

  14. arXiv:2301.01360  [pdf, other]

    stat.ME math.OC math.ST

    A Distributionally Robust Optimization Framework for Extreme Event Estimation

    Authors: Yuanlu Bai, Henry Lam, Xinyu Zhang

    Abstract: Conventional methods for extreme event estimation rely on well-chosen parametric models asymptotically justified from extreme value theory (EVT). These methods, while powerful and theoretically grounded, could however encounter a difficult bias-variance tradeoff that exacerbates especially when data size is too small, deteriorating the reliability of the tail estimation. In this paper, we study a…

    Submitted 3 January, 2023; originally announced January 2023.

  15. arXiv:2210.12334  [pdf, other]

    stat.ML cs.LG math.OC

    Adaptive Data Fusion for Multi-task Non-smooth Optimization

    Authors: Henry Lam, Kaizheng Wang, Yuhang Wu, Yichen Zhang

    Abstract: We study the problem of multi-task non-smooth optimization that arises ubiquitously in statistical learning, decision-making and risk management. We develop a data fusion approach that adaptively leverages commonalities among a large number of objectives to improve sample efficiency while tackling their unknown heterogeneities. We provide sharp statistical guarantees for our approach. Numerical ex…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 25 pages

  16. arXiv:2210.10974  [pdf, other]

    stat.ME math.ST stat.CO

    Bootstrap in High Dimension with Low Computation

    Authors: Henry Lam, Zhenyuan Liu

    Abstract: The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We study the use of bootstraps in high-dimensional environments with a small number of resamples. In particular, we show that with a recent "cheap" bootstrap pers…

    Submitted 19 June, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to Proceedings of the 40th International Conference on Machine Learning (ICML)

  17. arXiv:2206.04287  [pdf, other]

    cs.LG stat.ML

    Evaluating Aleatoric Uncertainty via Conditional Generative Models

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Aleatoric uncertainty quantification seeks distributional knowledge of random responses, which is important for reliability analysis and robustness improvement in machine learning applications. Previous research on aleatoric uncertainty estimation mainly targets closed-form conditional densities or variances, which requires strong restrictions on the data distribution or dimensionality. To o…

    Submitted 9 June, 2022; originally announced June 2022.

  18. arXiv:2204.02351  [pdf, other]

    cs.LG cs.RO stat.ME

    Test Against High-Dimensional Uncertainties: Accelerated Evaluation of Autonomous Vehicles with Deep Importance Sampling

    Authors: Mansur Arief, Zhepeng Cen, Zhenyuan Liu, Zhiyuang Huang, Henry Lam, Bo Li, Ding Zhao

    Abstract: Evaluating the performance of autonomous vehicles (AV) and their complex subsystems to high precision under naturalistic circumstances remains a challenge, especially when failure or dangerous cases are rare. Rarity does not only require an enormous sample size for a naive method to achieve high confidence estimation, but it also causes dangerous underestimation of the true failure rate and it is…

    Submitted 5 April, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

  19. arXiv:2204.01904  [pdf, other]

    stat.ME

    Prediction Intervals for Simulation Metamodeling

    Authors: Henry Lam, Haofeng Zhang

    Abstract: Simulation metamodeling refers to the construction of lower-fidelity models to represent input-output relations using few simulation runs. Stochastic kriging, which is based on Gaussian process, is a versatile and common technique for such a task. However, this approach relies on specific model assumptions and could encounter scalability challenges. In this paper, we study an alternative metamodel…

    Submitted 4 April, 2022; originally announced April 2022.

  20. arXiv:2202.00090  [pdf, other]

    stat.ME math.ST stat.CO

    A Cheap Bootstrap Method for Fast Inference

    Authors: Henry Lam

    Abstract: The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model fitting. We present a bootstrap methodology that uses minimal computation, namely with a resample effort as low as one Monte Carlo replication, while maintaining…

    Submitted 31 January, 2022; originally announced February 2022.

  21. arXiv:2201.12955  [pdf, other]

    cs.LG stat.ML

    Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework

    Authors: Ziyi Huang, Henry Lam, Amirhossein Meisami, Haofeng Zhang

    Abstract: Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. However, there is a large discrepancy between the superior practical performance of these approaches and their theoretical justification. Previous research only indicates a negative theoretical result: Thompson sampling could have a worst-case linear regret $Ω(T)$ with a constant thresh…

    Submitted 9 November, 2023; v1 submitted 30 January, 2022; originally announced January 2022.

  22. arXiv:2112.03874  [pdf, other]

    q-fin.ST cs.AI cs.CE cs.LG cs.MA stat.ME

    Efficient Calibration of Multi-Agent Simulation Models from Output Series with Bayesian Optimization

    Authors: Yuanlu Bai, Henry Lam, Svitlana Vyetrenko, Tucker Balch

    Abstract: Multi-agent simulation is commonly used across multiple disciplines, specifically in artificial intelligence in recent years, which creates an environment for downstream machine learning or reinforcement learning tasks. In many practical scenarios, however, only the output series that result from the interactions of simulation agents are observable. Therefore, simulators need to be calibrated so t…

    Submitted 20 September, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: This paper has been accepted and will be published in ICAIF 2022 proceedings

  23. arXiv:2111.07894  [pdf, other]

    math.OC math.PR stat.ME

    Orthounimodal Distributionally Robust Optimization: Representation, Computation and Multivariate Extreme Event Applications

    Authors: Henry Lam, Zhenyuan Liu, Xinyu Zhang

    Abstract: This paper studies a basic notion of distributional shape known as orthounimodality (OU) and its use in shape-constrained distributionally robust optimization (DRO). As a key motivation, we argue how such type of DRO is well-suited to tackle multivariate extreme event estimation by giving statistically valid confidence bounds on target extremal probabilities. In particular, we explain how DRO can…

    Submitted 15 November, 2021; originally announced November 2021.

  24. arXiv:2111.06859  [pdf, other]

    math.ST math.PR stat.ME

    Higher-Order Coverage Errors of Batching Methods via Edgeworth Expansions on $t$-Statistics

    Authors: Shengyi He, Henry Lam

    Abstract: While batching methods have been widely used in simulation and statistics, it is open regarding their higher-order coverage behaviors and whether one variant is better than the others in this regard. We develop techniques to obtain higher-order coverage errors for batching methods by building Edgeworth-type expansions on $t$-statistics. The coefficients in these expansions are intricate analytical…

    Submitted 12 November, 2021; originally announced November 2021.

  25. arXiv:2111.02204  [pdf, other]

    stat.ME stat.ML

    Certifiable Deep Importance Sampling for Rare-Event Simulation of Black-Box Systems

    Authors: Mansur Arief, Yuanlu Bai, Wenhao Ding, Shengyi He, Zhiyuan Huang, Henry Lam, Ding Zhao

    Abstract: Rare-event simulation techniques, such as importance sampling (IS), constitute powerful tools to speed up challenging estimation of rare catastrophic events. These techniques often leverage the knowledge and analysis on underlying system structures to endow desirable efficiency guarantees. However, black-box problems, especially those arising from recent safety-critical applications of AI-driven p…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: The conference version of this paper has appeared in AISTATS 2021 (arXiv:2006.15722)

  26. arXiv:2110.12573  [pdf, other]

    math.ST stat.ME

    Over-Conservativeness of Variance-Based Efficiency Criteria and Probabilistic Efficiency in Rare-Event Simulation

    Authors: Yuanlu Bai, Zhiyuan Huang, Henry Lam, Ding Zhao

    Abstract: In rare-event simulation, an importance sampling (IS) estimator is regarded as efficient if its relative error, namely the ratio between its standard deviation and mean, is sufficiently controlled. It is widely known that when a rare-event set contains multiple "important regions" encoded by the so-called dominating points, IS needs to account for all of them via mixing to achieve efficiency. We a…

    Submitted 28 October, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

  27. arXiv:2110.12131  [pdf, other]

    stat.CO stat.ME

    Doubly Robust Stein-Kernelized Monte Carlo Estimator: Simultaneous Bias-Variance Reduction and Supercanonical Convergence

    Authors: Henry Lam, Haofeng Zhang

    Abstract: Standard Monte Carlo computation is widely known to exhibit a canonical square-root convergence speed in terms of sample size. Two recent techniques, one based on control variate and one on importance sampling, both derived from an integration of reproducing kernels and Stein's identity, have been proposed to reduce the error in Monte Carlo computation to supercanonical convergence. This paper pre…

    Submitted 9 March, 2023; v1 submitted 23 October, 2021; originally announced October 2021.

  28. arXiv:2110.12122  [pdf, other]

    cs.LG stat.ME stat.ML

    Quantifying Epistemic Uncertainty in Deep Learning

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Uncertainty quantification is at the core of the reliability and robustness of machine learning. In this paper, we provide a theoretical framework to dissect the uncertainty, especially the epistemic component, in deep learning into procedural variability (from the training procedure) and data variability (from the training data), which is the first such attempt in the l…

    Submitted 18 June, 2023; v1 submitted 22 October, 2021; originally announced October 2021.

  29. arXiv:2108.05908  [pdf, ps, other]

    math.OC math.PR stat.ME

    Higher-Order Expansion and Bartlett Correctability of Distributionally Robust Optimization

    Authors: Shengyi He, Henry Lam

    Abstract: Distributionally robust optimization (DRO) is a worst-case framework for stochastic optimization under uncertainty that has drawn fast-growing studies in recent years. When the underlying probability distribution is unknown and observed from data, DRO suggests to compute the worst-case distribution within a so-called uncertainty set that captures the involved statistical uncertainty. In particular…

    Submitted 11 August, 2021; originally announced August 2021.

  30. arXiv:2106.11180  [pdf, other]

    math.OC cs.LG stat.ME

    Generalization Bounds with Minimal Dependency on Hypothesis Class via Distributionally Robust Optimization

    Authors: Yibo Zeng, Henry Lam

    Abstract: Established approaches to obtain generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtain these bounds on the solution from distributionally robust optimization (DRO), a recent data-driven…

    Submitted 12 October, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted by NeurIPS 2022

  31. arXiv:2105.13419  [pdf, ps, other]

    math.OC stat.ME

    On the Impossibility of Statistically Improving Empirical Optimization: A Second-Order Stochastic Dominance Perspective

    Authors: Henry Lam

    Abstract: When the underlying probability distribution in a stochastic optimization is observed only through data, various data-driven formulations have been studied to obtain approximate optimal solutions. We show that no such formulations can, in a sense, theoretically improve the statistical quality of the solution obtained from empirical optimization. We argue this by proving that the first-order behavi…

    Submitted 27 May, 2021; originally announced May 2021.

  32. arXiv:2105.12893  [pdf, other]

    stat.ME cs.CE cs.LG

    Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility Set

    Authors: Yuanlu Bai, Tucker Balch, Haoxian Chen, Danial Dervovic, Henry Lam, Svitlana Vyetrenko

    Abstract: Stochastic simulation aims to compute output performance for complex models that lack analytical tractability. To ensure accurate prediction, the model needs to be calibrated and validated against real data. Conventional methods approach these tasks by assessing the model-data match via simple hypothesis tests or distance minimization in an ad hoc fashion, but they can encounter challenges arising…

    Submitted 26 May, 2021; originally announced May 2021.

  33. arXiv:2105.09177  [pdf, other]

    math.OC math.PR stat.ME

    Distributionally Constrained Black-Box Stochastic Gradient Estimation and Optimization

    Authors: Henry Lam, Junhui Zhang

    Abstract: We consider stochastic gradient estimation using only black-box function evaluations, where the function argument lies within a probability simplex. This problem is motivated from gradient-descent optimization procedures in multiple applications in distributionally robust analysis and inverse model calibration involving decision variables that are probability distributions. We are especially inter…

    Submitted 19 May, 2021; originally announced May 2021.

  34. arXiv:2102.13625  [pdf, other]

    stat.ML cs.LG

    Learning Prediction Intervals for Regression: Generalization and Calibration

    Authors: Haoxian Chen, Ziyi Huang, Henry Lam, Huajie Qian, Haofeng Zhang

    Abstract: We study the generation of prediction intervals in regression for uncertainty quantification. This task can be formalized as an empirical constrained optimization problem that minimizes the average interval width while maintaining the coverage accuracy across data. We strengthen the existing literature by studying two aspects of this empirical optimization. First is a general learning theory to ch…

    Submitted 26 February, 2021; originally announced February 2021.

  35. arXiv:2102.10631  [pdf, other]

    stat.ME math.OC math.PR

    Adaptive Importance Sampling for Efficient Stochastic Root Finding and Quantile Estimation

    Authors: Shengyi He, Guangxin Jiang, Henry Lam, Michael C. Fu

    Abstract: In solving simulation-based stochastic root-finding or optimization problems that involve rare events, such as in extreme quantile estimation, running crude Monte Carlo can be prohibitively inefficient. To address this issue, importance sampling can be employed to drive down the sampling error to a desirable level. However, selecting a good importance sampler requires knowledge of the solution to…

    Submitted 21 February, 2021; originally announced February 2021.

  36. Model Calibration via Distributionally Robust Optimization: On the NASA Langley Uncertainty Quantification Challenge

    Authors: Yuanlu Bai, Zhiyuan Huang, Henry Lam

    Abstract: We study a methodology to tackle the NASA Langley Uncertainty Quantification Challenge, a model calibration problem under both aleatory and epistemic uncertainties. Our methodology is based on an integration of robust optimization, more specifically a recent line of research known as distributionally robust optimization, and importance sampling in Monte Carlo simulation. The main computation machi…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2006.15689

  37. arXiv:2012.05591  [pdf, other]

    stat.ME

    Efficient Learning for Clustering and Optimizing Context-Dependent Designs

    Authors: Haidong Li, Henry Lam, Yijie Peng

    Abstract: We consider a simulation optimization problem for context-dependent decision-making. A Gaussian mixture model is proposed to capture the performance clustering phenomena of context-dependent designs. Under a Bayesian framework, we develop a dynamic sampling policy to efficiently learn both the global information of each cluster and local information of each design for selecting the best designs…

    Submitted 13 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

  38. arXiv:2012.05577   

    stat.ME

    Context-dependent Ranking and Selection under a Bayesian Framework

    Authors: Haidong Li, Henry Lam, Zhe Liang, Yijie Peng

    Abstract: We consider a context-dependent ranking and selection problem. The best design is not universal but depends on the contexts. Under a Bayesian framework, we develop a dynamic sampling scheme for context-dependent optimization (DSCO) to efficiently learn and select the best designs in all contexts. The proposed sampling scheme is proved to be consistent. Numerical experiments show that the proposed…

    Submitted 18 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: The article was published without the co-Author's notice, and it is withdrawn due to his objection

  39. arXiv:2010.04890  [pdf, other]

    cs.LG math.ST stat.ML

    Rare-Event Simulation for Neural Network and Random Forest Predictors

    Authors: Yuanlu Bai, Zhiyuan Huang, Henry Lam, Ding Zhao

    Abstract: We study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. This problem is motivated from fast emerging studies on the safety evaluation of intelligent systems, robustness quantification of learning models, and other potential applications to large-scale simulation in whi…

    Submitted 9 October, 2020; originally announced October 2020.

  40. arXiv:2007.04443  [pdf, other]

    math.ST stat.ME

    Minimax Efficient Finite-Difference Stochastic Gradient Estimators Using Black-Box Function Evaluations

    Authors: Henry Lam, Haidong Li, Xuhui Zhang

    Abstract: Standard approaches to stochastic gradient estimation, with only noisy black-box function evaluations, use the finite-difference method or its variants. While natural, it is open to our knowledge whether their statistical accuracy is the best possible. This paper argues so by showing that central finite-difference is a nearly minimax optimal zeroth-order gradient estimator for a suitable class of…

    Submitted 12 November, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

  41. arXiv:2006.15722  [pdf, other]

    cs.LG stat.ML

    Deep Probabilistic Accelerated Evaluation: A Robust Certifiable Rare-Event Simulation Methodology for Black-Box Safety-Critical Systems

    Authors: Mansur Arief, Zhiyuan Huang, Guru Koushik Senthil Kumar, Yuanlu Bai, Shengyi He, Wenhao Ding, Henry Lam, Ding Zhao

    Abstract: Evaluating the reliability of intelligent physical systems against rare safety-critical events poses a huge testing burden for real-world applications. Simulation provides a useful platform to evaluate the extremal risks of these systems before their deployments. Importance Sampling (IS), while proven to be powerful for rare-event simulation, faces challenges in handling these learning-based syste…

    Submitted 8 March, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

  42. arXiv:2006.15689  [pdf, other]

    stat.ME

    A Distributionally Robust Optimization Approach to the NASA Langley Uncertainty Quantification Challenge

    Authors: Yuanlu Bai, Zhiyuan Huang, Henry Lam

    Abstract: We study a methodology to tackle the NASA Langley Uncertainty Quantification Challenge problem, based on an integration of robust optimization, more specifically a recent line of research known as distributionally robust optimization, and importance sampling in Monte Carlo simulation. The main computation machinery in this integrated methodology boils down to solving sampled linear programs. We wi…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: Published in the Proceedings of the 30th European Safety and Reliability Conference and the 15th Probabilistic Safety Assessment and Management Conference

  43. arXiv:2001.03952  [pdf, other]

    eess.SP cs.LG stat.ML

    Channel Assignment in Uplink Wireless Communication using Machine Learning Approach

    Authors: Guangyu Jia, Zhaohui Yang, Hak-Keung Lam, Jianfeng Shi, Mohammad Shikh-Bahaei

    Abstract: This letter investigates a channel assignment problem in uplink wireless communication systems. Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints. A convex optimization based algorithm is provided to obtain the optimal channel assignment, where the closed-form solution is obtained in each step. Due to high computational complexity in the convex opt…

    Submitted 12 January, 2020; originally announced January 2020.

  44. arXiv:1910.06324  [pdf, other]

    cs.LG math.ST stat.ML

    Robust Importance Weighting for Covariate Shift

    Authors: Henry Lam, Fengpei Li, Siddharth Prusty

    Abstract: In many learning problems, the training and testing data follow different distributions and a particularly common situation is the covariate shift. To correct for sampling biases, most approaches, including the popular kernel mean matching (KMM), focus on estimating the importance weights between the two distributions. Reweighting-based methods, however, are exposed to high variance when…

    Submitted 11 March, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

  45. arXiv:1910.05471  [pdf, other]

    cs.LG stat.ML

    Uncertainty Quantification and Exploration for Reinforcement Learning

    Authors: Yi Zhu, Jing Dong, Henry Lam

    Abstract: We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications in exploration policy. Despite ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behaviors, appear to remain quite open. In this paper, we fill in the literature gap by studying the central limit theorem behaviors…

    Submitted 4 December, 2022; v1 submitted 11 October, 2019; originally announced October 2019.

  46. arXiv:1905.04079  [pdf, other]

    cs.LG cs.MM stat.ML

    Compressing Weight-updates for Image Artifacts Removal Neural Networks

    Authors: Yat Hong Lam, Alireza Zare, Caglar Aytekin, Francesco Cricri, Jani Lainema, Emre Aksu, Miska Hannuksela

    Abstract: In this paper, we present a novel approach for fine-tuning a decoder-side neural network in the context of image compression, such that the weight-updates are better compressible. At encoder side, we fine-tune a pre-trained artifact removal network on target data by using a compression objective applied on the weight-update. In particular, the compression objective encourages weight-updates which…

    Submitted 14 June, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: Submission for CHALLENGE ON LEARNED IMAGE COMPRESSION (CLIC) 2019 (updated on 14 June 2019)

  47. arXiv:1904.09306  [pdf, other]

    stat.ME cs.RO stat.ML

    Evaluation Uncertainty in Data-Driven Self-Driving Testing

    Authors: Zhiyuan Huang, Mansur Arief, Henry Lam, Ding Zhao

    Abstract: Safety evaluation of self-driving technologies has been extensively studied. One recent approach uses Monte Carlo based evaluation to estimate the occurrence probabilities of safety-critical events as safety measures. These Monte Carlo samples are generated from stochastic input models constructed based on real-world data. In this paper, we propose an approach to assess the impact on the probabili…

    Submitted 17 July, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

  48. arXiv:1902.04673  [pdf, other]

    stat.ME

    Enhanced Balancing of Bias-Variance Tradeoff in Stochastic Estimation: A Minimax Perspective

    Authors: Henry Lam, Xinyu Zhang, Xuhui Zhang

    Abstract: Biased stochastic estimators, such as finite-differences for noisy gradient estimation, often contain parameters that need to be properly chosen to balance impacts from the bias and the variance. While the optimal order of these parameters in terms of the simulation budget can be readily established, the precise best values depend on model characteristics that are typically unknown in advance. We…

    Submitted 12 February, 2019; originally announced February 2019.

  49. arXiv:1811.04500  [pdf, other]

    stat.ME

    Subsampling to Enhance Efficiency in Input Uncertainty Quantification

    Authors: Henry Lam, Huajie Qian

    Abstract: In stochastic simulation, input uncertainty refers to the output variability arising from the statistical noise in specifying the input models. This uncertainty can be measured by a variance contribution in the output, which, in the nonparametric setting, is commonly estimated via the bootstrap. However, due to the convolution of the simulation noise and the input noise, the bootstrap consists of…

    Submitted 19 May, 2021; v1 submitted 11 November, 2018; originally announced November 2018.

  50. arXiv:1809.02911  [pdf, other]

    stat.AP

    Synthesis of Different Autonomous Vehicles Test Approaches

    Authors: Zhiyuan Huang, Mansur Arief, Henry Lam, Ding Zhao

    Abstract: Currently, the most prevalent way to evaluate an autonomous vehicle is to directly test it on the public road. However, because of recent accidents caused by autonomous vehicles, it becomes controversial about whether on-road tests should be the best approach. Alternatively, people use test tracks or simulation to assess the safety of autonomous vehicles. These approaches are time-efficient and le…

    Submitted 8 September, 2018; originally announced September 2018.