Showing 1–50 of 103 results for author: Zhou, W

Searching in archive stat.
  1. arXiv:2407.00882  [pdf, other]

    stat.ME

    Subgroup Identification with Latent Factor Structure

    Authors: Yong He, Dong Liu, Fuxin Wang, Mingjuan Zhang, Wen-Xin Zhou

    Abstract: Subgroup analysis has attracted growing attention due to its ability to identify meaningful subgroups from a heterogeneous population and thereby improving predictive power. However, in many scenarios such as social science and biology, the covariates are possibly highly correlated due to the existence of common factors, which brings great challenges for group identification and is neglected in th…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2404.15466  [pdf, other]

    stat.ML cs.LG

    Private Optimal Inventory Policy Learning for Feature-based Newsvendor with Unknown Demand

    Authors: Tuoyi Zhao, Wen-xin Zhou, Lan Wang

    Abstract: The data-driven newsvendor problem with features has recently emerged as a significant area of research, driven by the proliferation of data across various sectors such as retail, supply chains, e-commerce, and healthcare. Given the sensitive nature of customer or organizational data often used in feature-based analysis, it is crucial to ensure individual privacy to uphold trust and confidence. De…

    Submitted 23 April, 2024; originally announced April 2024.

  3. arXiv:2403.07431  [pdf, other]

    stat.ML cs.LG

    Knowledge Transfer across Multiple Principal Component Analysis Studies

    Authors: Zeyu Li, Kangxiang Qin, Yong He, Wang Zhou, Xinsheng Zhang

    Abstract: Transfer learning has aroused great interest in the statistical community. In this article, we focus on knowledge transfer for unsupervised learning tasks in contrast to the supervised learning tasks in the literature. Given the transferable source populations, we propose a two-step transfer learning algorithm to extract useful information from multiple source principal component analysis (PCA) st…

    Submitted 12 March, 2024; originally announced March 2024.

  4. arXiv:2402.04602  [pdf, other]

    math.ST cs.IT stat.ME

    Online Quantile Regression

    Authors: Yinan Shen, Dong Xia, Wen-Xin Zhou

    Abstract: This paper addresses the challenge of integrating sequentially arriving data within the quantile regression framework, where the number of features is allowed to grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. In our analys…

    Submitted 18 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  5. arXiv:2401.03072  [pdf, other]

    stat.ME math.ST

    Optimal Nonparametric Inference on Network Effects with Dependent Edges

    Authors: Wenqin Du, Yuan Zhang, Wen Zhou

    Abstract: Testing network effects in weighted directed networks is a foundational problem in econometrics, sociology, and psychology. Yet, the prevalent edge dependency poses a significant methodological challenge. Most existing methods are model-based and come with stringent assumptions, limiting their applicability. In response, we introduce a novel, fully nonparametric framework that requires only minima…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 29 pages, 3 figures

    MSC Class: 62E17; 62G10; 91D30

  6. arXiv:2310.19300  [pdf, other]

    stat.ML cs.LG

    Stage-Aware Learning for Dynamic Treatments

    Authors: Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu

    Abstract: Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms could suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving long stages of decision-making. To address the…

    Submitted 30 October, 2023; originally announced October 2023.

  7. arXiv:2310.07990  [pdf]

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  8. arXiv:2309.16188  [pdf, other]

    stat.ML cs.LG

    Stackelberg Batch Policy Learning

    Authors: Wenzhuo Zhou, Annie Qu

    Abstract: Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have comm…

    Submitted 1 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

  9. arXiv:2309.13459  [pdf, other]

    stat.ML cs.AI cs.LG

    A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

    Authors: Wenzhuo Zhou, Annie Qu, Keiland W. Cooper, Norbert Fortin, Babak Shahbaba

    Abstract: Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, however, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic …

    Submitted 18 May, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

  10. arXiv:2309.13458  [pdf, other]

    stat.ME

    Policy Learning for Individualized Treatment Regimes on Infinite Time Horizon

    Authors: Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu

    Abstract: With the recent advancements of technology in facilitating real-time monitoring and data collection, "just-in-time" interventions can be delivered via mobile devices to achieve both real-time and long-term management and control. Reinforcement learning formalizes such mobile interventions as a sequence of decision rules and assigns treatment arms based on the user's status at each decision point.…

    Submitted 23 September, 2023; originally announced September 2023.

  11. arXiv:2309.13278  [pdf, other]

    stat.ML cs.LG

    Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework

    Authors: Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu

    Abstract: We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes, where the objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. This task faces two primary challenges: providing a comprehensive and rigorous error quantification in CI estimation, and addressi…

    Submitted 1 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

  12. arXiv:2307.09725  [pdf]

    stat.AP

    Global Inequality in Cooling from Urban Green Spaces and its Climate Change Adaptation Potential

    Authors: Yuxiang Li, Jens-Christian Svenning, Weiqi Zhou, Kai Zhu, Jesse F. Abrams, Timothy M. Lenton, Shuqing N. Teng, Robert R. Dunn, Chi Xu

    Abstract: Heat extremes are projected to severely impact humanity and with increasing geographic disparities. Global South countries are more exposed to heat extremes and have reduced adaptation capacity. One documented source of such adaptation inequality is a lack of resources to cool down indoor temperatures. Less is known about the capacity to ameliorate outdoor heat stress. Here, we assess global inequ…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 56 pages, 28 figures

  13. arXiv:2307.02695  [pdf, other]

    stat.ME

    High-Dimensional Expected Shortfall Regression

    Authors: Shushu Zhang, Xuming He, Kean Ming Tan, Wen-Xin Zhou

    Abstract: The expected shortfall is defined as the average over the tail below (or above) a certain quantile of a probability distribution. The expected shortfall regression provides powerful tools for learning the relationship between a response variable and a set of covariates while exploring the heterogeneous effects of the covariates. In the health disparity research, for example, the lower/upper tail o…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: R code for fitting the proposed method can be found at https://github.com/shushuzh/ES_highD.git

  14. arXiv:2305.15742  [pdf, other]

    stat.ML cs.LG stat.ME

    Counterfactual Generative Models for Time-Varying Treatments

    Authors: Shenghao Wu, Wenbin Zhou, Minshuo Chen, Shixiang Zhu

    Abstract: Estimating the counterfactual outcome of treatment is essential for decision-making in public health and clinical science, among others. Often, treatments are administered in a sequential, time-varying manner, leading to an exponentially increased number of possible counterfactual outcomes. Furthermore, in modern applications, the outcomes are high-dimensional and conventional average treatment ef…

    Submitted 15 June, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Published at KDD'24

  15. arXiv:2304.00433  [pdf, other]

    eess.SP cs.CV cs.LG stat.CO

    Ideal Observer Computation by Use of Markov-Chain Monte Carlo with Generative Adversarial Networks

    Authors: Weimin Zhou, Umberto Villa, Mark A. Anastasio

    Abstract: Medical imaging systems are often evaluated and optimized via objective, or task-specific, measures of image quality (IQ) that quantify the performance of an observer on a specific clinically-relevant task. The performance of the Bayesian Ideal Observer (IO) sets an upper limit among all observers, numerical or human, and has been advocated for use as a figure-of-merit (FOM) for evaluating and opt…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: Submitted to IEEE Transactions on Medical Imaging

  16. arXiv:2303.03532  [pdf, ps, other]

    stat.ME

    Extreme eigenvalues of sample covariance matrices under generalized elliptical models with applications

    Authors: Xiucai Ding, Jiahui Xie, Long Yu, Wang Zhou

    Abstract: We consider the extreme eigenvalues of the sample covariance matrix $Q=YY^*$ under the generalized elliptical model that $Y=\Sigma^{1/2}XD.$ Here $\Sigma$ is a bounded $p \times p$ positive definite deterministic matrix representing the population covariance structure, $X$ is a $p \times n$ random matrix containing either independent columns sampled from the unit sphere in $\mathbb{R}^p$ or i.i.d. centered…

    Submitted 19 April, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 90 pages, 6 figures, some typos are corrected

  17. arXiv:2303.02817  [pdf, other]

    stat.ME

    Huber Principal Component Analysis for Large-dimensional Factor Models

    Authors: Yong He, Lingxiao Li, Dong Liu, Wen-Xin Zhou

    Abstract: Factor models have been widely used in economics and finance. However, the heavy-tailed nature of macroeconomic and financial data is often neglected in the existing literature. To address this issue and achieve robustness, we propose an approach to estimate factor loadings and scores by minimizing the Huber loss function, which is motivated by the equivalence of conventional Principal Component A…

    Submitted 29 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

  18. arXiv:2301.08940  [pdf, other]

    stat.ML cs.LG stat.ME

    Quasi-optimal Reinforcement Learning with Continuous Actions

    Authors: Yuhan Li, Wenzhuo Zhou, Ruoqing Zhu

    Abstract: Many real-world applications of reinforcement learning (RL) require making decisions in continuous action environments. In particular, determining the optimal dose level plays a vital role in developing medical treatment regimes. One challenge in adapting existing RL algorithms to medical applications, however, is that the popular infinite support stochastic policies, e.g., Gaussian policy, may as…

    Submitted 1 October, 2023; v1 submitted 21 January, 2023; originally announced January 2023.

    Comments: The first two authors contributed equally to this work

  19. arXiv:2301.00360  [pdf, ps, other]

    stat.ME

    An Efficient Iterative Least Squares Algorithm for Large-dimensional Matrix Factor Model via Random Projection

    Authors: Yong He, Ran Zhao, Wen-Xin Zhou

    Abstract: The matrix factor model has drawn growing attention for its advantage in achieving two-directional dimension reduction simultaneously for matrix-structured observations. In this paper, we propose a simple iterative least squares algorithm for matrix factor models, in contrast to the Principal Component Analysis (PCA)-based methods in the literature. In detail, we first propose to estimate the late…

    Submitted 1 August, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  20. arXiv:2212.05565  [pdf, other]

    stat.ME math.ST

    Robust Estimation and Inference for Expected Shortfall Regression with Many Regressors

    Authors: Xuming He, Kean Ming Tan, Wen-Xin Zhou

    Abstract: Expected Shortfall (ES), also known as superquantile or Conditional Value-at-Risk, has been recognized as an important measure in risk analysis and stochastic optimization, and is also finding applications beyond these areas. In finance, it refers to the conditional expected return of an asset given that the return is below some quantile of its distribution. In this paper, we consider a recently p…

    Submitted 11 December, 2022; originally announced December 2022.

  21. arXiv:2212.05562  [pdf, ps, other]

    stat.ME stat.ML

    Retire: Robust Expectile Regression in High Dimensions

    Authors: Rebeka Man, Kean Ming Tan, Zian Wang, Wen-Xin Zhou

    Abstract: High-dimensional data can often display heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data. The former is computationally challenging due to the non-smooth nature of the check loss, and the latter is sensitive to heavy-tailed error distribution…

    Submitted 22 March, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

  22. arXiv:2210.11049  [pdf, other]

    cs.CR cs.AI cs.LG stat.ML

    How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers

    Authors: Guangsheng Zhang, Bo Liu, Huan Tian, Tianqing Zhu, Ming Ding, Wanlei Zhou

    Abstract: As a booming research area in the past decade, deep learning technologies have been driven by big data collected and processed on an unprecedented scale. However, privacy concerns arise due to the potential leakage of sensitive information from the training data. Recent research has revealed that deep learning models are vulnerable to various privacy attacks, including membership inference attacks…

    Submitted 2 February, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: To appear in USENIX Security 2024

  23. arXiv:2209.00804  [pdf, ps, other]

    stat.ME

    Marginal Regression on Transient State Occupation Probabilities with Clustered Multistate Process Data

    Authors: Wenxian Zhou, Giorgos Bakoyannis, Ying Zhang, Constantin T Yiannoutsos

    Abstract: Clustered multistate process data are commonly encountered in multicenter observational studies and clinical trials. A clinically important estimand with such data is the marginal probability of being in a particular transient state as a function of time. However, there is currently no method for nonparametric marginal regression analysis of these probabilities with clustered multistate process da…

    Submitted 1 September, 2022; originally announced September 2022.

  24. High-Dimensional Composite Quantile Regression: Optimal Statistical Guarantees and Fast Algorithms

    Authors: Haeseong Moon, Wen-Xin Zhou

    Abstract: The composite quantile regression (CQR) was introduced by Zou and Yuan [Ann. Statist. 36 (2008) 1108--1126] as a robust regression method for linear models with heavy-tailed errors while achieving high efficiency. Its penalized counterpart for high-dimensional sparse models was recently studied in Gu and Zou [IEEE Trans. Inf. Theory 66 (2020) 7132--7154], along with a specialized optimization algo…

    Submitted 12 October, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: 42 pages, 7 figures

    MSC Class: 62J07 (Primary) 62A01 (Secondary)

    Journal ref: Electron. J. Statist. 17(2): 2067-2119 (2023)

  25. arXiv:2207.09633  [pdf, other]

    stat.ME

    Matrix Kendall's tau in High-dimensions: A Robust Statistic for Matrix Factor Model

    Authors: Yong He, Yalin Wang, Long Yu, Wang Zhou, Wen-Xin Zhou

    Abstract: In this article, we first propose generalized row/column matrix Kendall's tau for matrix-variate observations that are ubiquitous in areas such as finance and medical imaging. For a random matrix following a matrix-variate elliptically contoured distribution, we show that the eigenspaces of the proposed row/column matrix Kendall's tau coincide with those of the row/column scatter matrix respective…

    Submitted 19 July, 2022; originally announced July 2022.

  26. arXiv:2205.02432  [pdf, other]

    stat.ME stat.CO

    A Unified Algorithm for Penalized Convolution Smoothed Quantile Regression

    Authors: Rebeka Man, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou

    Abstract: Penalized quantile regression (QR) is widely used for studying the relationship between a response variable and a set of predictors under data heterogeneity in high-dimensional settings. Compared to penalized least squares, scalable algorithms for fitting penalized QR are lacking due to the non-differentiable piecewise linear loss function. To overcome the lack of smoothness, a recently proposed c…

    Submitted 5 May, 2022; originally announced May 2022.

  27. arXiv:2204.09116  [pdf, other]

    math.OC cs.LG stat.ML

    A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization

    Authors: Jarad Forristal, Joshua Griffin, Wenwen Zhou, Seyedalireza Yektamaram

    Abstract: In this work we describe an Adaptive Regularization using Cubics (ARC) method for large-scale nonconvex unconstrained optimization using Limited-memory Quasi-Newton (LQN) matrices. ARC methods are a relatively new family of optimization strategies that utilize a cubic-regularization (CR) term in place of trust-regions and line-searches. LQN methods offer a large-scale alternative to using explicit…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 14 pages, 1 figure, 3 tables

    MSC Class: 90C53; 15A06; 90C06; 65K05; 65K10; 49M15

  28. arXiv:2203.10418  [pdf, other]

    math.ST stat.ML

    How do noise tails impact on deep ReLU networks?

    Authors: Jianqing Fan, Yihong Gu, Wen-Xin Zhou

    Abstract: This paper investigates the stability of deep ReLU neural networks for nonparametric regression under the assumption that the noise has only a finite p-th moment. We unveil how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss a…

    Submitted 30 December, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: 79 pages, 5 figures

    MSC Class: 62G08; 62G35

  29. arXiv:2202.08441  [pdf, other]

    stat.ME stat.ML

    Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression

    Authors: Yinan Lin, Wen Zhou, Zhi Geng, Gexin Xiao, Jianxin Yin

    Abstract: In traditional logistic regression models, the link function is often assumed to be linear and continuous in predictors. Here, we consider a threshold model that all continuous features are discretized into ordinal levels, which further determine the binary responses. Both the threshold points and regression coefficients are unknown and to be estimated. For high dimensional data, we propose a fusi…

    Submitted 16 February, 2022; originally announced February 2022.

  30. arXiv:2202.06188  [pdf, other]

    math.ST math.PR stat.ME

    Testing the number of common factors by bootstrapped sample covariance matrix in high-dimensional factor models

    Authors: Long Yu, Peng Zhao, Wang Zhou

    Abstract: This paper studies the impact of bootstrap procedure on the eigenvalue distributions of the sample covariance matrix under a high-dimensional factor structure. We provide asymptotic distributions for the top eigenvalues of bootstrapped sample covariance matrix under mild conditions. After bootstrap, the spiked eigenvalues which are driven by common factors will converge weakly to Gaussian limits a…

    Submitted 20 November, 2023; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: 102 pages, 9 figures, 6 tables

    MSC Class: 62H25; 60B20

  31. arXiv:2111.01560  [pdf, other]

    stat.ML cs.LG

    Efficient Learning of Quadratic Variance Function Directed Acyclic Graphs via Topological Layers

    Authors: Wei Zhou, Xin He, Wei Zhong, Junhui Wang

    Abstract: Directed acyclic graph (DAG) models are widely used to represent causal relationships among random variables in many application domains. This paper studies a special class of non-Gaussian DAG models, where the conditional variance of each node given its parents is a quadratic function of its conditional mean. Such a class of non-Gaussian DAG models are fairly flexible and admit many popular distr…

    Submitted 1 November, 2021; originally announced November 2021.

  32. arXiv:2110.13113  [pdf, other]

    stat.ME stat.ML

    Communication-Constrained Distributed Quantile Regression with Optimal Statistical Guarantees

    Authors: Kean Ming Tan, Heather Battey, Wen-Xin Zhou

    Abstract: We address the problem of how to achieve optimal inference in distributed quantile regression without stringent scaling conditions. This is challenging due to the non-smooth nature of the quantile regression (QR) loss function, which invalidates the use of existing methodology. The difficulties are resolved through a double-smoothing approach that is applied to the local (at each data source) and…

    Submitted 22 August, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

  33. arXiv:2110.10719  [pdf, other]

    stat.ME stat.ML

    Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning

    Authors: Wenzhuo Zhou, Ruoqing Zhu, Annie Qu

    Abstract: Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions. However, the practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime. Many mHealth applications involve decision-making with large numbers of interve…

    Submitted 18 October, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

  34. arXiv:2109.05640  [pdf, other]

    stat.ME math.ST

    High-Dimensional Quantile Regression: Convolution Smoothing and Concave Regularization

    Authors: Kean Ming Tan, Lan Wang, Wen-Xin Zhou

    Abstract: $\ell_1$-penalized quantile regression is widely used for analyzing high-dimensional data with heterogeneity. It is now recognized that the $\ell_1$-penalty introduces non-negligible estimation bias, while a proper use of concave regularization may lead to estimators with refined convergence rates and oracle properties as the signal strengthens. Although folded concave penalized $M…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: Main text is 27 pages, online supplementary materials are attached after the main text

  35. arXiv:2107.10118  [pdf, other]

    stat.AP q-bio.PE

    Tracking the Transmission Dynamics of COVID-19 with a Time-Varying Coefficient State-Space Model

    Authors: Joshua P. Keller, Tianjian Zhou, Andee Kaplan, G. Brooke Anderson, Wen Zhou

    Abstract: The spread of COVID-19 has been greatly impacted by regulatory policies and behavior patterns that vary across counties, states, and countries. Population-level dynamics of COVID-19 can generally be described using a set of ordinary differential equations, but these deterministic equations are insufficient for modeling the observed case rates, which can vary due to local testing and case reporting…

    Submitted 21 July, 2021; originally announced July 2021.

  36. arXiv:2107.02726  [pdf, other]

    stat.ME

    Distributed Adaptive Huber Regression

    Authors: Jiyu Luo, Qiang Sun, Wenxin Zhou

    Abstract: Distributed data naturally arise in scenarios involving multiple sources of observations, each stored at a different location. Directly pooling all the data together is often prohibited due to limited bandwidth and storage, or due to privacy protocols. This paper introduces a new robust distributed algorithm for fitting linear regressions when data are subject to heavy-tailed and/or asymmetric err…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: 29 pages

  37. arXiv:2107.00109  [pdf, other]

    stat.ME math.ST

    Adaptive Capped Least Squares

    Authors: Qiang Sun, Rui Mao, Wen-Xin Zhou

    Abstract: This paper proposes the capped least squares regression with an adaptive resistance parameter, hence the name, adaptive capped least squares regression. The key observation is, by taking the resistant parameter to be data dependent, the proposed estimator achieves full asymptotic efficiency without losing the resistance property: it achieves the maximum breakdown point asymptotically. Computationa…

    Submitted 30 June, 2021; originally announced July 2021.

  38. arXiv:2106.14324  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Learning stochastic object models from medical imaging measurements by use of advanced ambient generative adversarial networks

    Authors: Weimin Zhou, Sayantan Bhadra, Frank J. Brooks, Hua Li, Mark A. Anastasio

    Abstract: Purpose: To objectively assess new medical imaging technologies via computer-simulations, it is important to account for the variability in the ensemble of objects to be imaged. This source of variability can be described by stochastic object models (SOMs). It is generally desirable to establish SOMs from experimental imaging measurements acquired by use of a well-characterized imaging system, but…

    Submitted 27 February, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

    Comments: Journal of Medical Imaging

    Journal ref: J. Med. Imag. 9(1), 015503 (2022)

  39. arXiv:2104.09090  [pdf, ps, other]

    stat.ME

    Semiparametric Marginal Regression for Clustered Competing Risks Data with Missing Cause of Failure

    Authors: Wenxian Zhou, Giorgos Bakoyannis, Ying Zhang, Constantin T. Yiannoutsos

    Abstract: Clustered competing risks data are commonly encountered in multicenter studies. The analysis of such data is often complicated due to informative cluster size, a situation where the outcomes under study are associated with the size of the cluster. In addition, cause of failure is frequently incompletely observed in real-world settings. To the best of our knowledge, there is no methodology for popu…

    Submitted 22 April, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

  40. arXiv:2103.00674  [pdf, other]

    stat.ME cs.LG math.ST stat.AP stat.ML

    BEAUTY Powered BEAST

    Authors: Kai Zhang, Zhigen Zhao, Wen Zhou

    Abstract: We study distribution-free goodness-of-fit tests with the proposed Binary Expansion Approximation of UniformiTY (BEAUTY) approach. This method generalizes the renowned Euler's formula, and approximates the characteristic function of any copula through a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many importa…

    Submitted 16 October, 2023; v1 submitted 28 February, 2021; originally announced March 2021.

  41. arXiv:2101.01908  [pdf, ps, other]

    math.ST stat.ME

    Factor Modelling for Clustering High-dimensional Time Series

    Authors: Bo Zhang, Guangming Pan, Qiwei Yao, Wang Zhou

    Abstract: We propose a new unsupervised learning method for clustering a large number of time series based on a latent factor structure. Each cluster is characterized by its own cluster-specific factors in addition to some common factors which impact on all the time series concerned. Our setting also offers the flexibility that some time series may not belong to any clusters. The consistency with explicit c…

    Submitted 8 September, 2022; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: 13 figures, 12 Tables

  42. arXiv:2012.05187  [pdf, other]

    math.ST stat.ME

    Smoothed Quantile Regression with Large-Scale Inference

    Authors: Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou

    Abstract: Quantile regression is a powerful tool for learning the relationship between a response variable and a multivariate predictor while exploring heterogeneous effects. In this paper, we consider statistical inference for quantile regression with large-scale data in the "increasing dimension" regime. We provide a comprehensive and in-depth analysis of a convolution-type smoothing approach that achieve…

    Submitted 17 May, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: An R package conquer for fitting smoothed quantile regression is available in CRAN, https://cran.r-project.org/web/packages/conquer/index.html

  43. Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination

    Authors: Tao Zhang, Tianqing Zhu, Jing Li, Mengde Han, Wanlei Zhou, Philip S. Yu

    Abstract: A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair. While research is already underway to formalize a machine-learning concept of fairness and to design frameworks for building fair models with sacrifice in accuracy, most are geared toward either supervised or unsupervised learning. Yet two observations inspired us to wonder whether…

    Submitted 25 September, 2020; originally announced September 2020.

    Comments: This paper has been published in IEEE Transactions on Knowledge and Data Engineering

  44. arXiv:2009.06190  [pdf, other]

    cs.LG stat.ML

    Fairness Constraints in Semi-supervised Learning

    Authors: Tao Zhang, Tianqing Zhu, Mengde Han, Jing Li, Wanlei Zhou, Philip S. Yu

    Abstract: Fairness in machine learning has received considerable attention. However, most studies on fair learning focus on either supervised learning or unsupervised learning. Very few consider semi-supervised settings. Yet, in reality, most machine learning tasks rely on large datasets that contain both labeled and unlabeled data. One of key issues with fair learning is the balance between fairness and ac…

    Submitted 14 September, 2020; originally announced September 2020.

  45. arXiv:2008.01916  [pdf, other]

    cs.CR cs.LG stat.ML

    More Than Privacy: Applying Differential Privacy in Key Areas of Artificial Intelligence

    Authors: Tianqing Zhu, Dayong Ye, Wei Wang, Wanlei Zhou, Philip S. Yu

    Abstract: Artificial Intelligence (AI) has attracted a great deal of attention in recent years. However, alongside all its advancements, problems have also emerged, such as privacy violations, security issues and model fairness. Differential privacy, as a promising mathematical model, has several attractive properties that can help solve these problems, making it quite a valuable tool. For this reason, diff…

    Submitted 4 August, 2020; originally announced August 2020.

    Journal ref: IEEE Transactions on Knowledge and Data Engineering 2020

  46. arXiv:2007.08506  [pdf, other]

    cs.LG stat.ML

    SketchGraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design

    Authors: Ari Seff, Yaniv Ovadia, Wenda Zhou, Ryan P. Adams

    Abstract: Parametric computer-aided design (CAD) is the dominant paradigm in mechanical engineering for physical design. Distinguished by relational geometry, parametric CAD models begin as two-dimensional sketches consisting of geometric primitives (e.g., line segments, arcs) and explicit constraints between them (e.g., coincidence, perpendicularity) that form the basis for three-dimensional construction o…

    Submitted 16 July, 2020; originally announced July 2020.

  47. arXiv:2007.00169  [pdf, other]

    cs.LG stat.ML

    Regularly Updated Deterministic Policy Gradient Algorithm

    Authors: Shuai Han, Wenbo Zhou, Shuai Lü, Jiayu Yu

    Abstract: Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications. On the other hand, the bias and variance of the Q estimation in the target function are sometimes difficult to control. This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm for…

    Submitted 30 June, 2020; originally announced July 2020.

  48. arXiv:2006.10980  [pdf, other]

    cs.LG cs.AI stat.ML

    NROWAN-DQN: A Stable Noisy Network with Noise Reduction and Online Weight Adjustment for Exploration

    Authors: Shuai Han, Wenbo Zhou, Jing Liu, Shuai Lü

    Abstract: Deep reinforcement learning has been applied more and more widely nowadays, especially in various complex control tasks. Effective exploration for noisy networks is one of the most important issues in deep reinforcement learning. Noisy networks tend to produce stable outputs for agents. However, this tendency is not always enough to find a stable policy for an agent, which decreases efficiency and…

    Submitted 19 June, 2020; originally announced June 2020.

  49. arXiv:2006.00112  [pdf, other]

    eess.SP cs.CV cs.LG stat.ML

    Approximating the Ideal Observer for joint signal detection and localization tasks by use of supervised learning methods

    Authors: Weimin Zhou, Hua Li, Mark A. Anastasio

    Abstract: Medical imaging systems are commonly assessed and optimized by use of objective measures of image quality (IQ). The Ideal Observer (IO) performance has been advocated to provide a figure-of-merit for use in assessing and optimizing imaging systems because the IO sets an upper performance limit among all observers. When joint signal detection and localization tasks are considered, the IO that emplo…

    Submitted 14 July, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

    Comments: IEEE Transactions on Medical Imaging (Early Access), 2020

  50. arXiv:2006.00033  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Learning stochastic object models from medical imaging measurements using Progressively-Growing AmbientGANs

    Authors: Weimin Zhou, Sayantan Bhadra, Frank J. Brooks, Hua Li, Mark A. Anastasio

    Abstract: It has been advocated that medical imaging systems and reconstruction algorithms should be assessed and optimized by use of objective measures of image quality that quantify the performance of an observer at specific diagnostic tasks. One important source of variability that can significantly limit observer performance is variation in the objects to-be-imaged. This source of variability can be des…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Submitted to IEEE Transactions on Medical Imaging