
Showing 1–16 of 16 results for author: Bu, Z

Searching in archive stat.
  1. arXiv:2310.14661

    cs.LG stat.ML

    Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

    Authors: Yingyu Lin, Yi-An Ma, Yu-Xiang Wang, Rachel Redberg, Zhiqi Bu

    Abstract: Posterior sampling, i.e., exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the…

    Submitted 1 May, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  2. arXiv:2305.01794

    stat.ME cs.LG

    MISNN: Multiple Imputation via Semi-parametric Neural Networks

    Authors: Zhiqi Bu, Zongyu Dai, Yiliang Zhang, Qi Long

    Abstract: Multiple imputation (MI) has been widely applied to missing value problems in biomedical, social and econometric research, in order to avoid improper inference in the downstream data analysis. In the presence of high-dimensional data, imputation models that include feature selection, especially $\ell_1$ regularized regression (such as Lasso, adaptive Lasso, and Elastic Net), are common choices to…

    Submitted 2 May, 2023; originally announced May 2023.

  3. arXiv:2211.13297

    cs.LG stat.ME

    Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data

    Authors: Zongyu Dai, Zhiqi Bu, Qi Long

    Abstract: Missing data are ubiquitous in real world applications and, if not adequately handled, may lead to the loss of information and biased findings in downstream analysis. Particularly, high-dimensional incomplete data with a moderate sample size, such as analysis of multi-omics data, present daunting challenges. Imputation is arguably the most popular method for handling missing data, though existing…

    Submitted 21 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

  4. arXiv:2207.00306

    stat.ME stat.ML

    CEDAR: Communication Efficient Distributed Analysis for Regressions

    Authors: Changgee Chang, Zhiqi Bu, Qi Long

    Abstract: Electronic health records (EHRs) offer great promises for advancing precision medicine and, at the same time, present significant analytical challenges. Particularly, it is often the case that patient-level data in EHRs cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there are growing interests about distributed learning…

    Submitted 1 July, 2022; originally announced July 2022.

  5. arXiv:2202.12482

    stat.ML cs.LG math.ST

    Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity

    Authors: Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett

    Abstract: Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art accuracy among the large family of generalized additive models. In order to empower NAM with feature selection and improve the generalization, we propose the sparse…

    Submitted 24 February, 2022; originally announced February 2022.

  6. arXiv:2112.11507

    cs.LG stat.AP

    Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems

    Authors: Zongyu Dai, Zhiqi Bu, Qi Long

    Abstract: Missing data are present in most real world problems and need careful handling to preserve the prediction accuracy and statistical consistency in the downstream analysis. As the gold standard of handling missing data, multiple imputation (MI) methods are proposed to account for the imputation uncertainty and provide proper statistical inference. In this work, we propose Multiple Imputation via G…

    Submitted 21 December, 2021; originally announced December 2021.

  7. arXiv:2107.08461

    cs.LG cs.CR stat.ML

    Differentially Private Bayesian Neural Networks on Accuracy, Privacy and Reliability

    Authors: Qiyiwen Zhang, Zhiqi Bu, Kan Chen, Qi Long

    Abstract: Bayesian neural network (BNN) allows for uncertainty quantification in prediction, offering an advantage over regular neural networks that has not been explored in the differential privacy (DP) framework. We fill this important gap by leveraging recent development in Bayesian deep learning and privacy accounting to offer a more precise analysis of the trade-off between privacy and accuracy in BNN.…

    Submitted 18 February, 2023; v1 submitted 18 July, 2021; originally announced July 2021.

  8. arXiv:2107.01266

    math.ST stat.ME

    Asymptotic Statistical Analysis of Sparse Group LASSO via Approximate Message Passing Algorithm

    Authors: Kan Chen, Zhiqi Bu, Shiyun Xu

    Abstract: Sparse Group LASSO (SGL) is a regularized model for high-dimensional linear regression problems with grouped covariates. SGL applies $l_1$ and $l_2$ penalties on the individual predictors and group predictors, respectively, to guarantee sparse effects both on the inter-group and within-group levels. In this paper, we apply the approximate message passing (AMP) algorithm to efficiently solve the SG…

    Submitted 21 February, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 2021

  9. arXiv:2106.11767

    cs.CR cs.LG math.ST stat.ML

    Privacy Amplification via Iteration for Shuffled and Online PNSGD

    Authors: Matteo Sordello, Zhiqi Bu, Jinshuo Dong

    Abstract: In this paper, we consider the framework of privacy amplification via iteration, which is originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work focuses on the study of the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates…

    Submitted 20 June, 2021; originally announced June 2021.

  10. arXiv:2106.07830

    cs.LG stat.ML

    On the Convergence and Calibration of Deep Learning with Differential Privacy

    Authors: Zhiqi Bu, Hua Wang, Zongyu Dai, Qi Long

    Abstract: Differentially private (DP) training preserves the data privacy usually at the cost of slower convergence (and thus lower accuracy), as well as more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous time analysis through the lens of neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the n…

    Submitted 19 June, 2023; v1 submitted 14 June, 2021; originally announced June 2021.

  11. arXiv:2105.13302

    math.ST cs.IT cs.LG eess.SP stat.ML

    Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

    Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie J. Su

    Abstract: Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion…

    Submitted 5 June, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

    Journal ref: Annals of Statistics 2022

  12. arXiv:2102.07211

    stat.ML cs.LG stat.ME

    Efficient Designs of SLOPE Penalty Sequences in Finite Dimension

    Authors: Yiliang Zhang, Zhiqi Bu

    Abstract: In linear regression, SLOPE is a new convex analysis method that generalizes the Lasso via the sorted L1 penalty: larger fitted coefficients are penalized more heavily. This magnitude-dependent regularization requires an input of penalty sequence $\lambda$, instead of a scalar penalty as in the Lasso case, thus making the design extremely expensive in computation. In this paper, we propose two efficient…

    Submitted 12 December, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

    Comments: Accepted to AISTATS 2021

  13. arXiv:2011.00417

    stat.ML cs.LG

    DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural Networks

    Authors: Shiyun Xu, Zhiqi Bu

    Abstract: Recent years have witnessed strong empirical performance of over-parameterized neural networks on various tasks and many advances in the theory, e.g. the universal approximation and provable convergence to global minimum. In this paper, we incorporate over-parameterized neural networks into semi-parametric models to bridge the gap between inference and prediction, especially in the high dimensiona…

    Submitted 24 January, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

  14. arXiv:2010.13165

    cs.LG math.DS math.OC stat.ML

    A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

    Authors: Zhiqi Bu, Shiyun Xu, Kan Chen

    Abstract: When equipped with efficient optimization algorithms, the over-parameterized neural networks have demonstrated high level of performance even though the loss function is non-convex and non-smooth. While many works have been focusing on understanding the loss dynamics by training neural networks with the gradient descent (GD), in this work, we consider a broad class of optimization algorithms that…

    Submitted 10 March, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: Accepted to AISTATS 2021

  15. arXiv:1911.11607

    cs.LG cs.CR stat.ML

    Deep Learning with Gaussian Differential Privacy

    Authors: Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie J. Su

    Abstract: Deep learning models are often trained on datasets that contain sensitive information such as individuals' shopping transactions, personal contacts, and medical records. An increasingly important line of work therefore has sought to train neural networks subject to privacy constraints that are specified by differential privacy or its divergence-based relaxations. These privacy definitions, however…

    Submitted 22 July, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in Harvard Data Science Review

  16. arXiv:1907.07502

    stat.ML cs.LG eess.SP math.ST

    Algorithmic Analysis and Statistical Estimation of SLOPE via Approximate Message Passing

    Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie Su

    Abstract: SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted l1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this paper, we develop an asymptotically exact characterization of the SLOPE solution u…

    Submitted 17 July, 2019; originally announced July 2019.