Showing 1–50 of 108 results for author: Shen, Y

Searching in archive stat.

  1. arXiv:2407.01111  [pdf, other]

    cs.LG cs.AI stat.ML

    Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

    Authors: Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

    Abstract: Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Code is available at https://anonymous.4open.science/status/ncr-B697

  2. arXiv:2406.10499  [pdf, other]

    stat.ME stat.AP

    Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US

    Authors: Fangzhi Luo, Jianbin Tan, Donglan Zhang, Hui Huang, Ye Shen

    Abstract: Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. T… ▽ More

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  3. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa… ▽ More

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2404.05118  [pdf, ps, other]

    stat.ME stat.AP

    BayesPPDSurv: An R Package for Bayesian Sample Size Determination Using the Power and Normalized Power Prior for Time-To-Event Data

    Authors: Yueqi Shen, Matthew A. Psioda, Joseph G. Ibrahim

    Abstract: The BayesPPDSurv (Bayesian Power Prior Design for Survival Data) R package supports Bayesian power and type I error calculations and model fitting using the power and normalized power priors incorporating historical data for the analysis of time-to-event outcomes. The package implements the stratified proportional hazards regression model with piecewise constant hazard within each stratum. Th… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  5. arXiv:2404.02453  [pdf, other]

    stat.ME math.ST

    Exploring the Connection Between the Normalized Power Prior and Bayesian Hierarchical Models

    Authors: Yueqi Shen, Matthew A. Psioda, Luiz M. Carvalho, Joseph G. Ibrahim

    Abstract: The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as a discounting parameter. When the discounting parameter is modeled as random, the normalized power prior is recommended. Bayesian hierarchical modeling is a widely used method for synthesizing information f… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  6. arXiv:2403.02154  [pdf, other]

    stat.ME q-bio.GN q-bio.QM

    Double trouble: Predicting new variant counts across two heterogeneous populations

    Authors: Yunyi Shen, Lorenzo Masoero, Joshua G. Schraiber, Tamara Broderick

    Abstract: Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they migh… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  7. arXiv:2402.17641  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Variational Learning is Effective for Large Deep Networks

    Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint… ▽ More

    Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

  8. arXiv:2402.10227  [pdf, other]

    cs.LG stat.ML

    Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization

    Authors: Yuning You, Ruida Zhou, Yang Shen

    Abstract: Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics. This task often presents significant challenges when (i) observations are limited to cross-sectional samples (where individual trajectories are inaccessible for learning), and moreover, (ii) the behaviors of individual particles are heterogeneous (especially in bio… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

  9. arXiv:2402.04602  [pdf, other]

    math.ST cs.IT stat.ME

    Online Quantile Regression

    Authors: Yinan Shen, Dong Xia, Wen-Xin Zhou

    Abstract: This paper addresses the challenge of integrating sequentially arriving data within the quantile regression framework, where the number of features is allowed to grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. In our analys… ▽ More

    Submitted 18 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  10. arXiv:2402.03527  [pdf, other]

    stat.ML cs.LG stat.ME

    Consistent Validation for Predictive Methods in Spatial Settings

    Authors: David R. Burt, Yunyi Shen, Tamara Broderick

    Abstract: Spatial prediction tasks are key to weather forecasting, studying air pollution, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where… ▽ More

    Submitted 23 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 52 pages, 14 figures

  11. arXiv:2401.14535  [pdf, other]

    cs.LG cs.CV stat.ME

    CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

    Authors: Guangyi Chen, Yifan Shen, Zhenhao Chen, Xiangchen Song, Yuewen Sun, Weiran Yao, Xiao Liu, Kun Zhang

    Abstract: Identifying the underlying time-delayed latent causal processes in sequential data is vital for grasping temporal dynamics and making downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy… ▽ More

    Submitted 30 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: To appear at ICML 2024, 24 pages

  12. arXiv:2312.12708  [pdf, other]

    math.ST stat.ME

    Gradient flows for empirical Bayes in high-dimensional linear models

    Authors: Zhou Fan, Leying Guan, Yandi Shen, Yihong Wu

    Abstract: Empirical Bayes provides a powerful approach to learning and adapting to latent structure in data. Theory and algorithms for empirical Bayes have a rich literature for sequence models, but are less understood in settings where latent variables and data interact through more complex designs. In this work, we study empirical Bayes estimation of an i.i.d. prior in Bayesian linear models, via the nonp… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

  13. arXiv:2312.08200  [pdf, other]

    cs.LG stat.ML

    SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space

    Authors: Yunchen Li, Zhou Yu, Gaoqi He, Yunhang Shen, Ke Li, Xing Sun, Shaohui Lin

    Abstract: Symmetric positive definite (SPD) matrices have shown important value and applications in statistics and machine learning, such as fMRI analysis and traffic prediction. Previous works on SPD matrices mostly focus on discriminative models, where predictions are made directly on $E(X|y)$, where $y$ is a vector and $X$ is an SPD matrix. However, these methods are challenging to handle for large-scale… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: AAAI2024

  14. arXiv:2311.11216  [pdf, ps, other]

    stat.ME

    Valid Randomization Tests in Inexactly Matched Observational Studies via Iterative Convex Programming

    Authors: Siyu Heng, Yanxin Shen, Pengyun Wang

    Abstract: In causal inference, matching is one of the most widely used methods to mimic a randomized experiment using observational (non-experimental) data. Ideally, treated units are exactly matched with control units for the covariates so that the treatments are as-if randomly assigned within each matched set, and valid randomization tests for treatment effects can then be conducted as in a randomized exp… ▽ More

    Submitted 28 November, 2023; v1 submitted 18 November, 2023; originally announced November 2023.

  15. arXiv:2309.02698  [pdf, ps, other]

    math.ST cs.IT stat.ME

    Quantile and pseudo-Huber Tensor Decomposition

    Authors: Yinan Shen, Dong Xia

    Abstract: This paper studies the computational and statistical aspects of quantile and pseudo-Huber tensor decomposition. The integrated investigation of computational and statistical issues of robust tensor decomposition poses challenges due to the non-smooth loss functions. We propose a projected sub-gradient descent algorithm for tensor decomposition, equipped with either the pseudo-Huber loss or the qua… ▽ More

    Submitted 6 September, 2023; originally announced September 2023.

  16. arXiv:2305.18208  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP

    A Semi-Supervised Learning Approach for Ranging Error Mitigation Based on UWB Waveform

    Authors: Yuxiao Li, Santiago Mazuelas, Yuan Shen

    Abstract: Localization systems based on ultra-wide band (UWB) measurements can have unsatisfactory performance in harsh environments due to the presence of non-line-of-sight (NLOS) errors. Learning-based methods for error mitigation have shown great performance improvement via directly exploiting the wideband waveform instead of handcrafted features. However, these methods require data samples fully labeled… ▽ More

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, Published in: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM)

    Journal ref: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), San Diego, CA, USA, 2021, pp. 533-537

  17. arXiv:2305.18206  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP

    Deep Generative Model for Simultaneous Range Error Mitigation and Environment Identification

    Authors: Yuxiao Li, Santiago Mazuelas, Yuan Shen

    Abstract: Received waveforms contain rich information for both range information and environment semantics. However, its full potential is hard to exploit under multipath and non-line-of-sight conditions. This paper proposes a deep generative model (DGM) for simultaneous range error mitigation and environment identification. In particular, we present a Bayesian model for the generative process of the receiv… ▽ More

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 6 pages, 5 figures, Published in: 2021 IEEE Global Communications Conference (GLOBECOM)

    Journal ref: 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 2021, pp. 1-6

  18. Deep GEM-Based Network for Weakly Supervised UWB Ranging Error Mitigation

    Authors: Yuxiao Li, Santiago Mazuelas, Yuan Shen

    Abstract: Ultra-wideband (UWB)-based techniques, while becoming mainstream approaches for high-accurate positioning, tend to be challenged by ranging bias in harsh environments. The emerging learning-based methods for error mitigation have shown great performance improvement via exploiting high semantic features from raw data. However, these methods rely heavily on fully labeled data, leading to a high cost… ▽ More

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 6 pages, 4 figures, Published in: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM)

    Journal ref: MILCOM 2021 - 2021 IEEE Military Communications Conference (MILCOM), San Diego, CA, USA, 2021, pp. 528-532

  19. arXiv:2305.06199  [pdf, ps, other]

    math.ST cs.IT stat.ME stat.ML

    Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

    Authors: Yinan Shen, Jingyang Li, Jian-Feng Cai, Dong Xia

    Abstract: High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since the robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed,… ▽ More

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: This manuscript supersedes an earlier one (arXiv:2203.00953). Two manuscripts share around 60% contents. There will be no further update for the earlier manuscript

  20. arXiv:2302.14230  [pdf, other]

    stat.ME stat.AP

    Optimal Priors for the Discounting Parameter of the Normalized Power Prior

    Authors: Yueqi Shen, Luiz M. Carvalho, Matthew A. Psioda, Joseph G. Ibrahim

    Abstract: The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as a discounting parameter. When the discounting parameter is modelled as random, the normalized power prior is recommended. In this work, we prove that the marginal posterior for the discounting parameter for g… ▽ More

    Submitted 8 April, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

  21. arXiv:2302.09217  [pdf, other]

    q-bio.QM stat.AP

    Identify local limiting factors of species distribution using min-linear logistic regression

    Authors: Hongliang Bu, Yunyi Shen

    Abstract: Logistic regression is a commonly used building block in ecological modeling, but its additive structure among environmental predictors often assumes compensatory relationships between predictors, which can lead to problematic results. In reality, the distribution of species is often determined by the least-favored factor, according to von Liebig's Law of the Minimum, which is not addressed in mod… ▽ More

    Submitted 17 February, 2023; originally announced February 2023.

  22. arXiv:2302.02488  [pdf, other]

    stat.AP q-bio.PE

    A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions

    Authors: Dirk Douwes-Schultz, Alexandra M. Schmidt, Yannan Shen, David Buckeridge

    Abstract: Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area, we assume the existence of three states for the disease: absence, a new state meant to account for many zeroes in some of th… ▽ More

    Submitted 8 December, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: First revision

  23. arXiv:2302.01186  [pdf, other]

    cs.LG eess.SP math.OC stat.ML

    The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

    Authors: Xingyu Xu, Yandi Shen, Yuejie Chi, Cong Ma

    Abstract: We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning t… ▽ More

    Submitted 6 November, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: New analysis in the noisy and the approximately low-rank settings

  24. arXiv:2212.14580  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Heterogeneous Synthetic Learner for Panel Data

    Authors: Ye Shen, Runzhe Wan, Hengrui Cai, Rui Song

    Abstract: In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on th… ▽ More

    Submitted 29 January, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

  25. arXiv:2211.12692  [pdf, other]

    math.ST stat.ME

    Empirical Bayes estimation: When does $g$-modeling beat $f$-modeling in theory (and in practice)?

    Authors: Yandi Shen, Yihong Wu

    Abstract: Empirical Bayes (EB) is a popular framework for large-scale inference that aims to find data-driven estimators to compete with the Bayesian oracle that knows the true prior. Two principled approaches to EB estimation have emerged over the years: $f$-modeling, which constructs an approximate Bayes rule by estimating the marginal distribution of the data, and $g$-modeling, which estimates the prior… ▽ More

    Submitted 22 November, 2022; originally announced November 2022.

  26. arXiv:2211.03054  [pdf, other]

    stat.ML cs.LG

    The Importance of Suppressing Complete Reconstruction in Autoencoders for Unsupervised Outlier Detection

    Authors: Yafei Shen, Ling Yang

    Abstract: Autoencoders are widely used in outlier detection due to their superiority in handling high-dimensional and nonlinear datasets. The reconstruction of any dataset by the autoencoder can be considered as a complex regression process. In regression analysis, outliers can usually be divided into high leverage points and influential points. Although the autoencoder has shown good results for the identi… ▽ More

    Submitted 6 November, 2022; originally announced November 2022.

  27. arXiv:2211.00463  [pdf, other]

    cs.CR stat.ML

    Amplifying Membership Exposure via Data Poisoning

    Authors: Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang

    Abstract: As in-the-wild data are increasingly involved in the training stage, machine learning applications become more susceptible to data poisoning attacks. Such attacks typically lead to test-time accuracy degradation or controlled misprediction. In this paper, we investigate the third type of exploitation of data poisoning - increasing the risks of privacy leakage of benign training samples. To this en… ▽ More

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: To Appear in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  28. arXiv:2210.15575  [pdf, other]

    cs.LG cs.AI stat.ML

    A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

    Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers

    Abstract: Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration e… ▽ More

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Presented at NeurIPS 2022 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2022)

  29. Use of Non-concurrent Common Control in Master Protocols in Oncology Trials: Report of an American Statistical Association Biopharmaceutical Section Open Forum Discussion

    Authors: Rajeshwari Sridhara, Olga Marchenko, Qi Jiang, Richard Pazdur, Martin Posch, Scott Berry, Marc Theoret, Yuan Li Shen, Thomas Gwise, Lorenzo Hess, Andrew Raven, Khadija Rantell, Kit Roes, Richard Simon, Mary Redman, Yuan Ji, Cindy Lu

    Abstract: This article summarizes the discussions from the American Statistical Association (ASA) Biopharmaceutical (BIOP) Section Open Forum that took place on December 10, 2020 and was organized by the ASA BIOP Statistical Methods in Oncology Scientific Working Group, in coordination with the US FDA Oncology Center of Excellence. Diverse stakeholders including experts from international regulatory agencie… ▽ More

    Submitted 19 September, 2022; originally announced September 2022.

    MSC Class: 62P10

    Journal ref: Statistics in Biopharmaceutical Research 14.3 (2022): 353-357

  30. arXiv:2209.04356  [pdf, other]

    cs.LG stat.ME

    Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health

    Authors: Yi Shen, Jessilyn Dunn, Michael M. Zavlanos

    Abstract: In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presenc… ▽ More

    Submitted 9 September, 2022; originally announced September 2022.

  31. arXiv:2209.02838  [pdf, other]

    cs.LG cs.GT stat.ML

    A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

    Authors: Zifan Wang, Yi Shen, Zachary I. Bell, Scott Nivison, Michael M. Zavlanos, Karl H. Johansson

    Abstract: We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their action… ▽ More

    Submitted 6 September, 2022; originally announced September 2022.

  32. arXiv:2207.07020  [pdf, other]

    stat.ME

    Estimating sparse direct effects in multivariate regression with the spike-and-slab LASSO

    Authors: Yunyi Shen, Claudia Solís-Lemus, Sameer K. Deshpande

    Abstract: The multivariate regression interpretation of the Gaussian chain graph model simultaneously parametrizes (i) the direct effects of $p$ predictors on $q$ outcomes and (ii) the residual partial covariances between pairs of outcomes. We introduce a new method for fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL) priors. We develop an Expectation Conditional Maximization algor… ▽ More

    Submitted 26 March, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

  33. Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League

    Authors: Yuesen Li, Shouxin Zong, Yanfei Shen, Zhiqiang Pu, Miguel-Ángel Gómez, Yixiong Cui

    Abstract: Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a rece… ▽ More

    Submitted 7 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: 40 pages, 5 figures, already published on Journal of Sports Sciences

    ACM Class: I.2.1

  34. arXiv:2204.06963  [pdf, other]

    cs.LG cs.CR stat.ML

    Finding MNEMON: Reviving Memories of Node Embeddings

    Authors: Yun Shen, Yufei Han, Zhikun Zhang, Min Chen, Ting Yu, Michael Backes, Yang Zhang, Gianluca Stringhini

    Abstract: Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In th… ▽ More

    Submitted 29 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: To Appear in the 29th ACM Conference on Computer and Communications Security (CCS), November 7-11, 2022

  35. arXiv:2203.00953  [pdf, ps, other]

    math.ST cs.IT stat.ME stat.ML

    Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation

    Authors: Yinan Shen, Jingyang Li, Jian-Feng Cai, Dong Xia

    Abstract: Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to del… ▽ More

    Submitted 10 May, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: This manuscript is superseded by the new one (arXiv:2305.06199). There will be no further update of this manuscript and it will not be submitted for publications

  36. arXiv:2202.10589  [pdf, other]

    stat.ML cs.LG

    Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

    Authors: Chengchun Shi, Jin Zhu, Ye Shen, Shikai Luo, Hongtu Zhu, Rui Song

    Abstract: This paper is concerned with constructing a confidence interval for a target policy's value offline based on a pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In th… ▽ More

    Submitted 3 November, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

  37. arXiv:2112.14616  [pdf, other]

    stat.AP

    BayesPPD: An R Package for Bayesian Sample Size Determination Using the Power and Normalized Power Prior for Generalized Linear Models

    Authors: Yueqi Shen, Matthew A. Psioda, Joseph G. Ibrahim

    Abstract: The R package BayesPPD (Bayesian Power Prior Design) supports Bayesian power and type I error calculation and model fitting after incorporating historical data with the power prior and the normalized power prior for generalized linear models (GLM). The package accommodates summary level data or subject level data with covariate information. It supports use of multiple historical datasets as well a… ▽ More

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 28 pages, 1 figure

  38. arXiv:2111.07041  [pdf, other]

    math.ST stat.ML

    Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation

    Authors: Stanislav Minsker, Mohamed Ndaoud, Yiqiu Shen

    Abstract: We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic setting. We first derive a lower and a matching upper bound for the minimax risk of clustering in this framework. We also show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the mini… ▽ More

    Submitted 13 November, 2021; originally announced November 2021.

  39. arXiv:2110.15501  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning

    Authors: Ye Shen, Hengrui Cai, Rui Song

    Abstract: Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instruction on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a pro… ▽ More

    Submitted 28 January, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

  40. arXiv:2110.03874  [pdf, other]

    math.ST stat.ML

    Uncertainty quantification in the Bradley-Terry-Luce model

    Authors: Chao Gao, Yandi Shen, Anderson Y. Zhang

    Abstract: The Bradley-Terry-Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimator… ▽ More

    Submitted 9 August, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  41. arXiv:2110.02631  [pdf, other]

    cs.CR cs.LG stat.ML

    Inference Attacks Against Graph Neural Networks

    Authors: Zhikun Zhang, Min Chen, Michael Backes, Yun Shen, Yang Zhang

    Abstract: Graph is an important data representation ubiquitously existing in the real world. However, analyzing the graph data is computationally difficult due to its non-Euclidean nature. Graph embedding is a powerful tool to solve the graph analytics problem by transforming the graph data into low-dimensional vectors. These vectors could also be shared with third parties to gain additional insights of wha… ▽ More

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 19 pages, 18 figures. To Appear in the 31st USENIX Security Symposium

  42. arXiv:2109.10237  [pdf, other]

    stat.ME

    An Empirical Bayes Robust Meta-Analytical-Predictive Prior to Adaptively Leverage External Data

    Authors: Hongtao Zhang, Yueqi Shen, Alan Y Chiang, Judy Li

    Abstract: We propose a novel empirical Bayes robust MAP (EB-rMAP) prior to adaptively leverage external/historical data. Built on Box's prior predictive p-value, the EB-rMAP prior framework balances between model parsimony and flexibility through a tuning parameter. The proposed framework can be applied to binary, normal, and time-to-event endpoints. Computational aspects of the framework are efficient. Sim…

    Submitted 7 December, 2021; v1 submitted 21 September, 2021; originally announced September 2021.

  43. arXiv:2109.09264  [pdf, other]

    cs.LG stat.ML

    Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection

    Authors: Yihang Shen, Carl Kingsford

    Abstract: Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, developing effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions by vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that are based on the idea of e…

    Submitted 12 February, 2024; v1 submitted 19 September, 2021; originally announced September 2021.

    Comments: This work has already been accepted in AutoML 2023

  44. arXiv:2107.13763  [pdf, other]

    stat.AP

    CARlasso: An R package for the estimation of sparse microbial networks with predictors

    Authors: Yunyi Shen, Claudia Solis-Lemus

    Abstract: Microbiome data analyses require statistical tools that can simultaneously decode microbes' reactions to the environment and interactions among microbes. We introduce CARlasso, the first user-friendly open-source and publicly available R package to fit a chain graph model for the inference of sparse microbial networks that represent both interactions among nodes and effects of a set of predictors.…

    Submitted 29 July, 2021; originally announced July 2021.

  45. arXiv:2107.13059  [pdf, other]

    cs.LG stat.ML

    Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised Node Classification

    Authors: Yu Wang, Yuesong Shen, Daniel Cremers

    Abstract: Node features and structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, which typically determine output labels through feature aggregation. This can be problematic, as it implies conditional independence of output nodes given hidden representations,…

    Submitted 27 July, 2021; originally announced July 2021.

  46. arXiv:2107.05143  [pdf, other]

    math.ST stat.ML

    Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning

    Authors: Pierre C Bellec, Yiwei Shen

    Abstract: This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty and the noise distribution has heavy-tails. Our main contributions are three-fold. (i) We provide general formula…

    Submitted 11 July, 2021; originally announced July 2021.

  47. arXiv:2107.03826  [pdf, other]

    math.ST stat.ML

    Asymptotic normality of robust $M$-estimators with convex penalty

    Authors: Pierre C Bellec, Yiwei Shen, Cun-Hui Zhang

    Abstract: This paper develops asymptotic normality results for individual coordinates of robust M-estimators with convex penalty in high-dimensions, where the dimension $p$ is at most of the same order as the sample size $n$, i.e., $p/n \le \gamma$ for some fixed constant $\gamma > 0$. The asymptotic normality requires a bias correction and holds for most coordinates of the M-estimator for a large class of loss functions…

    Submitted 8 July, 2021; originally announced July 2021.

  48. arXiv:2107.01306  [pdf, other]

    stat.ME math.ST

    The Effect of the Prior and the Experimental Design on the Inference of the Precision Matrix in Gaussian Chain Graph Models

    Authors: Yunyi Shen, Claudia Solis-Lemus

    Abstract: Here, we investigate whether (and how) experimental design could aid in the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment and prior knowledge about the effect. Estimation of the precision matrix is a fundamental task to infer biological graphical structures like microbial networks. We compare the margin…

    Submitted 29 November, 2023; v1 submitted 2 July, 2021; originally announced July 2021.

  49. arXiv:2106.08441  [pdf, other]

    cs.LG stat.ML

    Online Learning with Uncertain Feedback Graphs

    Authors: Pouya M Ghari, Yanning Shen

    Abstract: Online learning with expert advice is widely used in various machine learning tasks. It considers the problem where a learner chooses one from a set of experts to take advice and make a decision. In many learning problems, experts may be related, henceforth the learner can observe the losses associated with a subset of experts that are related to the chosen one. In this context, the relationship a…

    Submitted 15 June, 2021; originally announced June 2021.

  50. arXiv:2106.06333  [pdf, other]

    cs.LG stat.ML

    Invariant Information Bottleneck for Domain Generalization

    Authors: Bo Li, Yifei Shen, Yezhen Wang, Wenzhen Zhu, Colorado J. Reed, Jun Zhang, Dongsheng Li, Kurt Keutzer, Han Zhao

    Abstract: Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, the loss function is difficult to optimize for nonlinear classifiers and the original optimization objective could fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed inv…

    Submitted 21 March, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: AAAI 2022