Showing 1–50 of 96 results for author: Zhao, Q

  1. arXiv:2406.19531  [pdf, other]

    stat.ML cs.LG

    Forward and Backward State Abstractions for Off-policy Evaluation

    Authors: Meiling Hao, **fan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi

    Abstract: Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstracti…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 42 pages, 5 figures

    ACM Class: G.3; I.2.6; G.1.2

  2. arXiv:2405.16219  [pdf, other]

    cs.LG stat.ML

    Deep Causal Generative Models with Property Control

    Authors: Qilong Zhao, Shiyu Wang, Guangji Bai, Bo Pan, Zhaohui Qin, Liang Zhao

    Abstract: Generating data with properties of interest by external users while following the right causation among its intrinsic factors is important yet has not been well addressed jointly. This is due to the long-lasting challenge of jointly identifying key latent variables, their causal relations, and their correlation with properties of interest, as well as how to leverage their discoveries toward causal…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  3. arXiv:2405.07026  [pdf, other]

    stat.ME

    Selective Randomization Inference for Adaptive Experiments

    Authors: Tobias Freidling, Qingyuan Zhao, Zijun Gao

    Abstract: Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are not pre-specified, it has long been recognized that statistical inference for adaptive experiments is not straightforward. Most existing methods only apply to specific ad…

    Submitted 11 May, 2024; originally announced May 2024.

  4. arXiv:2405.02373  [pdf, other]

    math.OC cs.LG stat.ML

    Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia

    Abstract: This paper studies an online optimal resource reservation problem in communication networks with job transfers where the goal is to minimize the reservation cost while maintaining the blocking cost under a certain budget limit. To tackle this problem, we propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints. We then analyze the perform…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.15558

  5. arXiv:2403.06942  [pdf, other]

    eess.SY cs.LG stat.ML

    Grid Monitoring and Protection with Continuous Point-on-Wave Measurements and Generative AI

    Authors: Lang Tong, Xinyi Wang, Qing Zhao

    Abstract: Purpose: This article presents a case for a next-generation grid monitoring and control system, leveraging recent advances in generative artificial intelligence (AI), machine learning, and statistical inference. Advancing beyond earlier generations of wide-area monitoring systems built upon supervisory control and data acquisition (SCADA) and synchrophasor technologies, we argue for a monitoring an…

    Submitted 11 March, 2024; originally announced March 2024.

  6. arXiv:2402.13870  [pdf, ps, other]

    cs.LG eess.SP stat.AP

    Generative Probabilistic Time Series Forecasting and Applications in Grid Operations

    Authors: Xinyi Wang, Lang Tong, Qing Zhao

    Abstract: Generative probabilistic forecasting produces future time series samples according to the conditional probability distribution given past time series observations. Such techniques are essential in risk-based decision-making and planning under uncertainty with broad applications in grid operations, including electricity price forecasting, risk-based economic dispatch, and stochastic optimizations.…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted at CISS 2024. arXiv admin note: text overlap with arXiv:2306.03782

  7. arXiv:2402.13182  [pdf, other]

    cs.LG cs.DC stat.ML

    Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

    Authors: Nikola Pavlovic, Sudeep Salgia, Qing Zhao

    Abstract: We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing regret that is accumulating over time $T$ and aggrega…

    Submitted 20 February, 2024; originally announced February 2024.

  8. arXiv:2401.17518  [pdf, other]

    stat.ME math.ST

    Model Uncertainty and Selection of Risk Models for Left-Truncated and Right-Censored Loss Data

    Authors: Qian Zhao, Sahadeb Upretee, Dao** Yu

    Abstract: Insurance loss data are usually in the form of left-truncation and right-censoring due to deductibles and policy limits respectively. This paper investigates the model uncertainty and selection procedure when various parametric models are constructed to accommodate such left-truncated and right-censored data. The joint asymptotic properties of the estimators have been established using the Delta m…

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: Risks, 2023, 11(11),188

  9. arXiv:2401.16651  [pdf, other]

    stat.ME math.ST stat.AP stat.CO

    A constructive approach to selective risk control

    Authors: Zijun Gao, Wenjie Hu, Qingyuan Zhao

    Abstract: Many modern applications require the use of data to both select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini-Hochberg (BH) procedure for multiple hypothesis testing as the iterative limit of the Benjamini-Yekutieli (BY)…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 8 figures, 2 tables

  10. arXiv:2401.07711  [pdf, other]

    cs.LG stat.ML

    Efficient Nonparametric Tensor Decomposition for Binary and Count Data

    Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao

    Abstract: In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-l…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: AAAI-24

  11. arXiv:2311.10023  [pdf, other]

    stat.ML cs.LG

    Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia

    Abstract: We tackle in this paper an online network resource allocation problem with job transfers. The network is composed of many servers connected by communication links. The system operates in discrete time; at each time slot, the administrator reserves resources at servers for future job requests, and a cost is incurred for the reservations made. Then, after receptions, the jobs may be transferred betw…

    Submitted 16 November, 2023; originally announced November 2023.

  12. arXiv:2310.15351  [pdf, other]

    cs.LG stat.ML

    Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established…

    Submitted 2 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  13. arXiv:2310.07838  [pdf, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

    Authors: Qingyue Zhao, Banghua Zhu

    Abstract: We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the…

    Submitted 14 November, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 41 pages, 2 figures; Appendix polished

  14. arXiv:2309.06053  [pdf, ps, other]

    stat.ME math.ST

    Confounder selection via iterative graph expansion

    Authors: F. Richard Guo, Qingyuan Zhao

    Abstract: Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of observational studies. Previous methods, such as Pearl's celebrated back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confou…

    Submitted 24 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 29 pages; added link to Shiny web app

  15. arXiv:2308.00950  [pdf, other]

    stat.ME

    Beta-trees: Multivariate histograms with confidence statements

    Authors: Guenther Walther, Qian Zhao

    Abstract: Multivariate histograms are difficult to construct due to the curse of dimensionality. Motivated by $k$-d trees in computer science, we show how to construct an efficient data-adaptive partition of Euclidean space that possesses the following two properties: With high confidence the distribution from which the data are generated is close to uniform on each rectangle of the partition; and despite t…

    Submitted 2 August, 2023; originally announced August 2023.

    MSC Class: 62G15

  16. arXiv:2306.09507  [pdf, ps, other]

    stat.AP math.ST stat.CO stat.ME

    Winsorized Robust Credibility Models

    Authors: Qian Zhao, Chudamani Poudyal

    Abstract: The Bühlmann model, a branch of classical credibility theory, has been successively applied to the premium estimation for group insurance contracts and other insurance specifications. In this paper, we develop a robust Bühlmann credibility via the censored version of loss data, or the censored mean (a robust alternative to traditional individual mean). This framework yields explicit formulas of st…

    Submitted 20 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Winsorized Robust Credibility Models

  17. arXiv:2305.15558  [pdf, other]

    math.OC cs.LG stat.ML

    Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints

    Authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Shima Kheradmand

    Abstract: In this paper, we study an optimal online resource reservation problem in a simple communication network. The network is composed of two compute nodes linked by a local communication link. The system operates in discrete time; at each time slot, the administrator reserves resources for servers before the actual job requests are known. A cost is incurred for the reservations made. Then, after the c…

    Submitted 3 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  18. arXiv:2305.08359  [pdf, other]

    cs.LG math.OC stat.ML

    Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

    Authors: Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

    Abstract: Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$. However, it remains an open question whether such results can be carried over to adversarial RL, where the reward is adversarially chosen at each episode. In this paper, we…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 34 pages

  19. arXiv:2303.01552  [pdf, other]

    stat.ME math.ST stat.AP

    Simultaneous Hypothesis Testing Using Internal Negative Controls with An Application to Proteomics

    Authors: Zijun Gao, Qingyuan Zhao

    Abstract: Negative control is a common technique in scientific investigations and broadly refers to the situation where a null effect ("negative result") is expected. Motivated by a real proteomic dataset, we will present three promising and closely connected methods of using negative controls to assist simultaneous hypothesis testing. The first method uses negative controls to construct a permutation p-v…

    Submitted 19 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 41 pages, 10 figures, 3 tables

  20. arXiv:2301.00040  [pdf, other]

    stat.ME

    Optimization-based Sensitivity Analysis for Unmeasured Confounding using Partial Correlations

    Authors: Tobias Freidling, Qingyuan Zhao

    Abstract: Causal inference necessarily relies upon untestable assumptions; hence, it is crucial to assess the robustness of obtained results to violations of identification assumptions. However, such sensitivity analysis is only occasionally undertaken in practice, as many existing methods only apply to relatively simple models and their results are often difficult to interpret. We take a more flexible appr…

    Submitted 19 January, 2024; v1 submitted 30 December, 2022; originally announced January 2023.

  21. arXiv:2211.08637  [pdf, other]

    stat.OT

    Near-peer mentoring in data science: Two experiences at Stanford University

    Authors: Chiara Sabatti, Qian Zhao

    Abstract: Universities have been expanding the data science programs for undergraduate students, with the simultaneous goal of reaching and retaining students from underrepresented groups in the data science workforce. The set of new programs also offer opportunities to involve graduate students, fostering their growth as future leaders in data science education. We describe two programs that use the near p…

    Submitted 8 June, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

  22. arXiv:2211.04697  [pdf, other]

    stat.ME math.ST

    $L^{\infty}$- and $L^2$-sensitivity analysis for causal inference with unmeasured confounding

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: Sensitivity analysis for the unconfoundedness assumption is crucial in observational studies. For this purpose, the marginal sensitivity model (MSM) gained popularity recently due to its good interpretability and mathematical properties. However, as a quantification of confounding strength, the $L^{\infty}$-bound it puts on the logit difference between the observed and full data propensity scores…

    Submitted 24 February, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 57 pages, 3 figures, 2 tables

  23. arXiv:2210.13358  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Novelty Detection in Time Series via Weak Innovations Representation: A Deep Learning Approach

    Authors: Xinyi Wang, Mei-jen Lee, Qing Zhao, Lang Tong

    Abstract: We consider novelty detection in time series with unknown and nonparametric probability structures. A deep learning approach is proposed to causally extract an innovations sequence consisting of novelty samples statistically independent of all past samples of the time series. A novelty detection algorithm is developed for the online detection of novel changes in the probability structure in the in…

    Submitted 24 October, 2022; originally announced October 2022.

  24. arXiv:2210.09026  [pdf, other]

    cs.LG stat.ML

    WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

    Authors: Xi Chen, Tianyu Shi, Qingpeng Zhao, Yuchen Sun, Yunfei Gao, Xiangjun Wang

    Abstract: Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments such as Arcade Learning Environment, MuJoCo, and ViZDoom. However, they are hardly extensible to more complicated problems, mainly due to the lack of complexity and variations in the environments they are trained and tested on. Furthermore, they are not extensible t…

    Submitted 14 October, 2022; originally announced October 2022.

  25. arXiv:2210.04360  [pdf, other]

    stat.ME

    A unified analysis of regression adjustment in randomized experiments

    Authors: Katarzyna Reluga, Ting Ye, Qingyuan Zhao

    Abstract: Regression adjustment is broadly applied in randomized trials under the premise that it usually improves the precision of a treatment effect estimator. However, previous work has shown that this is not always true. To further understand this phenomenon, we develop a unified comparison of the asymptotic variance of a class of linear regression-adjusted estimators. Our analysis is based on the class…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: 17 pages, 1 figure, 2 tables

    MSC Class: 62F10; 62J99 ACM Class: G.3

  26. arXiv:2209.06620  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

    Authors: Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

    Abstract: Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a d…

    Submitted 27 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: First two authors contribute equally

  27. arXiv:2208.14035  [pdf, other]

    stat.ME

    Almost exact Mendelian randomization

    Authors: Matthew J Tudball, George Davey Smith, Qingyuan Zhao

    Abstract: Mendelian randomization (MR) is a natural experimental design based on the random transmission of genes from parents to offspring. However, this inferential basis is typically only implicit or used as an informal justification. As parent-offspring data becomes more widely available, we advocate a different approach to MR that is exactly based on this natural randomization, thereby formalizing the…

    Submitted 18 April, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 41 pages, 10 figures

    MSC Class: 62D20; 62G10; 62P10

  28. arXiv:2208.13871  [pdf, ps, other]

    stat.ME math.ST

    Confounder Selection: Objectives and Approaches

    Authors: F. Richard Guo, Anton Rask Lundborg, Qingyuan Zhao

    Abstract: Confounder selection is perhaps the most important step in the design of observational studies. A number of criteria, often with different objectives and approaches, have been proposed, and their validity and practical value have been debated in the literature. Here, we provide a unified review of these criteria and the assumptions behind them. We list several objectives that confounder selection…

    Submitted 24 September, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 15 pages

  29. arXiv:2208.08944  [pdf, other]

    stat.ME

    An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

    Authors: Qian Zhao, Emmanuel J. Candes

    Abstract: Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions.…

    Submitted 18 August, 2022; originally announced August 2022.

  30. arXiv:2207.07948  [pdf, other]

    stat.ML cs.LG

    Collaborative Learning in Kernel-based Bandits for Distributed Users

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We study collaborative learning among distributed clients facilitated by a central server. Each client is interested in maximizing a personalized objective function that is a weighted sum of its local objective and a global objective. Each client has direct access to random bandit feedback on its local objective, but only has a partial view of the global objective and relies on information exchang…

    Submitted 17 April, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

  31. arXiv:2206.00099  [pdf, other]

    stat.ML cs.LG

    Provably and Practically Efficient Neural Contextual Bandits

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider the neural contextual bandit problem. In contrast to the existing work which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, (ii) we propose an algorithm with a…

    Submitted 31 May, 2022; originally announced June 2022.

  32. arXiv:2204.02477  [pdf, ps, other]

    stat.ME

    Method of Winsorized Moments for Robust Fitting of Truncated and Censored Lognormal Distributions

    Authors: Chudamani Poudyal, Qian Zhao, Vytaras Brazauskas

    Abstract: When constructing parametric models to predict the cost of future claims, several important details have to be taken into account: (i) models should be designed to accommodate deductibles, policy limits, and coinsurance factors, (ii) parameters should be estimated robustly to control the influence of outliers on model predictions, and (iii) all point predictions should be augmented with estimates…

    Submitted 20 February, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: 35 pages, 4 figures, etc

  33. What is a randomization test?

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: The meaning of randomization tests has become obscure in statistics education and practice over the last century. This article makes a fresh attempt at rectifying this core concept of statistics. A new term -- "quasi-randomization test" -- is introduced to define significance tests based on theoretical models and distinguish these tests from the "randomization tests" based on the physical act of r…

    Submitted 4 April, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 45 pages, 2 figures. Accepted for publication in the Journal of American Statistical Association on 26th March, 2023. arXiv admin note: substantial text overlap with arXiv:2104.10618

    MSC Class: 62G10; 62B15

  34. arXiv:2203.08857  [pdf, other]

    stat.ML cs.AI cs.LG

    Noisy Tensor Completion via Low-rank Tensor Ring

    Authors: Yuning Qiu, Guoxu Zhou, Qibin Zhao, Shengli Xie

    Abstract: Tensor completion is a fundamental tool for incomplete data analysis, where the goal is to predict missing entries from partial observations. However, existing methods often make the explicit or implicit assumption that the observed entries are noise-free to provide a theoretical guarantee of exact recovery of missing entries, which is quite restrictive in practice. To remedy such drawbacks, this…

    Submitted 14 March, 2022; originally announced March 2022.

  35. arXiv:2110.13391  [pdf]

    stat.AP

    Analyzing the Data of COVID-19 with Quasi-Distribution Fitting Based on Piecewise B-spline Curves

    Authors: Qingliang Zhao, Zhenhuan Lu, Yiduo Wang

    Abstract: Facing the worldwide coronavirus disease 2019 (COVID-19) pandemic, a new fitting method (QDF, quasi-distribution fitting) which could be used to analyze the data of COVID-19 is developed based on piecewise quasi-uniform B-spline curves. For any given country or district, it simulates the distribution histogram data which is made from the daily confirmed cases (or the other data including daily re…

    Submitted 25 October, 2021; originally announced October 2021.

  36. arXiv:2104.10618  [pdf, other]

    math.ST stat.ME

    Multiple conditional randomization tests for lagged and spillover treatment effects

    Authors: Yao Zhang, Qingyuan Zhao

    Abstract: We consider the problem of constructing multiple conditional randomization tests. They may test different causal hypotheses but always aim to be nearly independent, allowing the randomization p-values to be interpreted individually and combined using standard methods. We start with a simple, sequential construction of such tests, and then discuss its application to three problems: evidence factors…

    Submitted 23 November, 2023; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: 40 pages; Part of the original version of this paper can be found at arXiv:2203.10980

    MSC Class: 62G10; 62B15

  37. arXiv:2101.11552  [pdf, other]

    cs.LG stat.ML

    Efficient Graph Deep Learning in TensorFlow with tf_geometric

    Authors: Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, Changsheng Xu

    Abstract: We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x. tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs. The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framewor…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: 7 pages, 5 figures

  38. arXiv:2011.14047  [pdf, other]

    cs.LG stat.ML

    Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding

    Authors: Cesar F. Caiafa, Ziyao Wang, Jordi Solé-Casals, Qibin Zhao

    Abstract: In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs in the applications when not all the features are collected for every data sample. A new supervised learning method is developed to train a general classifier,…

    Submitted 17 April, 2021; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: 11 pages, 7 figures, paper accepted for presentation at L2ID Workshop at CVPR 2021 (19-25 June, 2021)

  39. arXiv:2010.13997  [pdf, other]

    stat.ML cs.LG

    A Domain-Shrinking based Bayesian Optimization Algorithm with Order-Optimal Regret Performance

    Authors: Sudeep Salgia, Sattar Vakili, Qing Zhao

    Abstract: We consider sequential optimization of an unknown function in a reproducing kernel Hilbert space. We propose a Gaussian process-based algorithm and establish its order-optimal regret performance (up to a poly-logarithmic factor). This is the first GP-based algorithm with an order-optimal regret guarantee. The proposed algorithm is rooted in the methodology of domain shrinking realized through a se…

    Submitted 29 October, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted to NeurIPS 2021

  40. arXiv:2009.11828  [pdf, other]

    stat.ME

    Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials

    Authors: Ting Ye, Jun Shao, Yanyao Yi, Qingyuan Zhao

    Abstract: In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three considerations for better p…

    Submitted 13 July, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

  41. arXiv:2009.08868  [pdf]

    q-bio.BM cs.LG stat.ML

    Review of Machine-Learning Methods for RNA Secondary Structure Prediction

    Authors: Qi Zhao, Zheng Zhao, Xiaoya Fan, Zhengwei Yuan, Qian Mao, Yudong Yao

    Abstract: Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagn…

    Submitted 31 August, 2020; originally announced September 2020.

    Comments: 25 pages, 5 figures, 1 table

    MSC Class: I.2.0 General

  42. arXiv:2009.04832  [pdf, other]

    stat.AP stat.ME

    A note on post-treatment selection in studying racial discrimination in policing

    Authors: Qingyuan Zhao, Luke J Keele, Dylan S Small, Marshall M Joffe

    Abstract: We discuss some causal estimands used to study racial discrimination in policing. A central challenge is that not all police-civilian encounters are recorded in administrative datasets and available to researchers. One possible solution is to consider the average causal effect of race conditional on the civilian already being detained by the police. We find that such an estimand can be quite diffe…

    Submitted 14 June, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

    Comments: Accepted for publication in the American Political Science Review on 14th June, 2021

    MSC Class: 62D20 (Primary) 62P25 (Secondary)

  43. arXiv:2007.14546  [pdf, other]

    cs.LG stat.ML

    MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks

    Authors: Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng

    Abstract: The learning rate (LR) is one of the most important hyper-parameters in the stochastic gradient descent (SGD) algorithm for training deep neural networks (DNN). However, current hand-designed LR schedules need to manually pre-specify a fixed form, which limits their ability to adapt to practical non-convex optimization problems due to the significant diversification of training dynamics. Meanwhile, it al…

    Submitted 13 May, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: 19 pages

  44. arXiv:2007.06476  [pdf, other]

    stat.AP stat.ME

    A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization

    Authors: Daniel Iong, Qingyuan Zhao, Yang Chen

    Abstract: Mendelian Randomization (MR) is a popular method in epidemiology and genetics that uses genetic variation as instrumental variables for causal inference. Existing MR methods usually assume most genetic variants are valid instrumental variables that identify a common causal effect. There is a general lack of awareness that this effect homogeneity assumption can be violated when there are multiple c…

    Submitted 13 June, 2022; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 30 pages, 17 figures, 3 tables

  45. arXiv:2006.06930  [pdf, other]

    cs.LG cs.CV stat.ML

    Longitudinal Self-Supervised Learning

    Authors: Qingyu Zhao, Zixuan Liu, Ehsan Adeli, Kilian M. Pohl

    Abstract: Machine learning analysis of longitudinal neuroimaging data is typically based on supervised learning, which requires a large number of ground-truth labels to be informative. As ground-truth labels are often missing or expensive to obtain in neuroscience, we avoid them in our analysis by combining factor disentanglement with self-supervised learning to identify changes and consistencies across the m…

    Submitted 26 June, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  46. arXiv:2006.05697  [pdf, other]

    cs.LG stat.ML

    Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

    Authors: Jun Shu, Qian Zhao, Zongben Xu, Deyu Meng

    Abstract: To discover intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods attempt to achieve such transition knowledge by pre-assuming strongly confident anchor points with 1-probability belonging to a specific class, generally infeasible in practice, or directly jointly es…

    Submitted 11 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 14 pages

  47. arXiv:2005.11442  [pdf, other]

    cs.LG stat.ML

    Active Learning for Skewed Data Sets

    Authors: Abbas Kazerouni, Qi Zhao, Jing Xie, Sandeep Tata, Marc Najork

    Abstract: Consider a sequential active learning problem where, at each round, an agent selects a batch of unlabeled data points, queries their labels and updates a binary classifier. While there exists a rich body of work on active learning in this general form, in this paper, we focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training da…

    Submitted 22 May, 2020; originally announced May 2020.

  48. arXiv:2004.07743  [pdf, other]

    stat.AP stat.ME

    BETS: The dangers of selection bias in early analyses of the coronavirus disease (COVID-19) pandemic

    Authors: Qingyuan Zhao, Nianqiao Ju, Sergio Bacallado, Rajen D. Shah

    Abstract: The coronavirus disease 2019 (COVID-19) has quickly grown from a regional outbreak in Wuhan, China to a global pandemic. Early estimates of the epidemic growth and incubation period of COVID-19 may have been biased due to sample selection. Using detailed case reports from 14 locations in and outside mainland China, we obtained 378 Wuhan-exported cases who left Wuhan before an abrupt travel quarant…

    Submitted 24 September, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: 33 pages, 8 figures, 5 tables; Accepted for publication in The Annals of Applied Statistics on 24th September, 2020

    MSC Class: 62P10; 62F15

  49. arXiv:2003.10613  [pdf, other]

    cs.LG stat.ML

    Spatio-Temporal Graph Convolution for Resting-State fMRI Analysis

    Authors: Soham Gadgil, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Ehsan Adeli, Kilian M. Pohl

    Abstract: The Blood-Oxygen-Level-Dependent (BOLD) signal of resting-state fMRI (rs-fMRI) records the temporal dynamics of intrinsic functional networks in the brain. However, existing deep learning methods applied to rs-fMRI either neglect the functional dependency between different brain regions in a network or discard the information in the temporal dynamics of brain activity. To overcome those shortcomin…

    Submitted 28 June, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

  50. arXiv:2003.05482  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization

    Authors: Sudeep Salgia, Qing Zhao, Sattar Vakili

    Abstract: A framework based on iterative coordinate minimization (CM) is developed for stochastic convex optimization. Given that exact coordinate minimization is impossible due to the unknown stochastic nature of the objective function, the crux of the proposed optimization algorithm is an optimal control of the minimization precision in each iteration. We establish the optimal precision control and the re…

    Submitted 11 March, 2020; originally announced March 2020.