-
Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation
Authors:
Hao Wang,
Zhichao Chen,
Yuan Shen,
Jiajun Fan,
Zhaoran Liu,
Degui Yang,
Xinggao Liu,
Haoxuan Li
Abstract:
Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In…
▽ More
Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-aware Counterfactual Regression (PCR) to exploit proximity for representation balancing within the HTE estimation context. Specifically, we introduce a local proximity preservation regularizer based on optimal transport to depict the local proximity in discrepancy calculation. Furthermore, to overcome the curse of dimensionality that renders the estimation of discrepancy ineffective, exacerbated by limited data availability for HTE estimation, we develop an informative subspace projector, which trades off minimal distance precision for improved sample complexity. Extensive experiments demonstrate that PCR accurately matches units across different treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competitors. Code is available at https://anonymous.4open.science/status/ncr-B697.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US
Authors:
Fangzhi Luo,
Jianbin Tan,
Donglan Zhang,
Hui Huang,
Ye Shen
Abstract:
Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. T…
▽ More
Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. To fill this gap, we propose a novel clustering method for SDOH -- stroke mortality associations in the US counties. To enhance interpretability and statistical efficiency of the clustering outcomes, we introduce a new class of smoothness-sparsity pursued penalties for simultaneous clustering and variable selection in the longitudinal associations. As a result, we can identify important SDOH that contribute to longitudinal changes in the stroke mortality, facilitating clustering of US counties into several regions based on how these SDOH relate to stroke mortality. The effectiveness of our proposed method is demonstrated through extensive numerical studies. By applying our method to a county-level SDOH and stroke mortality longitudinal data, we identify 18 important SDOH for stroke mortality and divide the US counties into two clusters based on these selected SDOH. Our findings unveil complex regional heterogeneity in the longitudinal associations between SDOH and stroke mortality, providing valuable insights in region-specific SDOH adjustments for mitigating stroke mortality.
△ Less
Submitted 20 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
On the Identification of Temporally Causal Representation with Instantaneous Dependence
Authors:
Zijian Li,
Yifan Shen,
Kaitao Zheng,
Ruichu Cai,
Xiangchen Song,
Mingming Gong,
Zhengmao Zhu,
Guangyi Chen,
Kun Zhang
Abstract:
Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grou** of the observa…
▽ More
Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grou** of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an \textbf{ID}entification framework for instantane\textbf{O}us \textbf{L}atent dynamics (\textbf{IDOL}) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.
△ Less
Submitted 7 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
BayesPPDSurv: An R Package for Bayesian Sample Size Determination Using the Power and Normalized Power Prior for Time-To-Event Data
Authors:
Yueqi Shen,
Matthew A. Psioda,
Joseph G. Ibrahim
Abstract:
The BayesPPDSurv (Bayesian Power Prior Design for Survival Data) R package supports Bayesian power and type I error calculations and model fitting using the power and normalized power priors incorporating historical data with for the analysis of time-to-event outcomes. The package implements the stratified proportional hazards regression model with piecewise constant hazard within each stratum. Th…
▽ More
The BayesPPDSurv (Bayesian Power Prior Design for Survival Data) R package supports Bayesian power and type I error calculations and model fitting using the power and normalized power priors incorporating historical data with for the analysis of time-to-event outcomes. The package implements the stratified proportional hazards regression model with piecewise constant hazard within each stratum. The package allows the historical data to inform the treatment effect parameter, parameter effects for other covariates in the regression model, as well as the baseline hazard parameters. The use of multiple historical datasets is supported. A novel algorithm is developed for computationally efficient use of the normalized power prior. In addition, the package supports the use of arbitrary sampling priors for computing Bayesian power and type I error rates, and has built-in features that semi-automatically generate sampling priors from the historical data. We demonstrate the use of BayesPPDSurv in a comprehensive case study for a melanoma clinical trial design.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Exploring the Connection Between the Normalized Power Prior and Bayesian Hierarchical Models
Authors:
Yueqi Shen,
Matthew A. Psioda,
Luiz M. Carvalho,
Joseph G. Ibrahim
Abstract:
The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as a discounting parameter. When the discounting parameter is modeled as random, the normalized power prior is recommended. Bayesian hierarchical modeling is a widely used method for synthesizing information f…
▽ More
The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as a discounting parameter. When the discounting parameter is modeled as random, the normalized power prior is recommended. Bayesian hierarchical modeling is a widely used method for synthesizing information from different sources, including historical data. In this work, we examine the analytical relationship between the normalized power prior (NPP) and Bayesian hierarchical models (BHM) for \emph{i.i.d.} normal data. We establish a direct relationship between the prior for the discounting parameter of the NPP and the prior for the variance parameter of the BHM. Such a relationship is first established for the case of a single historical dataset, and then extended to the case with multiple historical datasets with dataset-specific discounting parameters. For multiple historical datasets, we develop and establish theory for the BHM-matching NPP (BNPP) which establishes dependence between the dataset-specific discounting parameters leading to inferences that are identical to the BHM. Establishing this relationship not only justifies the NPP from the perspective of hierarchical modeling, but also provides insight on prior elicitation for the NPP. We present strategies on inducing priors on the discounting parameter based on hierarchical models, and investigate the borrowing properties of the BNPP.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Double trouble: Predicting new variant counts across two heterogeneous populations
Authors:
Yunyi Shen,
Lorenzo Masoero,
Joshua G. Schraiber,
Tamara Broderick
Abstract:
Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they migh…
▽ More
Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Variational Learning is Effective for Large Deep Networks
Authors:
Yuesong Shen,
Nico Daheim,
Bai Cong,
Peter Nickl,
Gian Maria Marconi,
Clement Bazan,
Rio Yokota,
Iryna Gurevych,
Daniel Cremers,
Mohammad Emtiyaz Khan,
Thomas Möllenhoff
Abstract:
We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint…
▽ More
We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective.
△ Less
Submitted 6 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization
Authors:
Yuning You,
Ruida Zhou,
Yang Shen
Abstract:
Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics. This task often presents significant challenges when (i) observations are limited to cross-sectional samples (where individual trajectories are inaccessible for learning), and moreover, (ii) the behaviors of individual particles are heterogeneous (especially in bio…
▽ More
Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics. This task often presents significant challenges when (i) observations are limited to cross-sectional samples (where individual trajectories are inaccessible for learning), and moreover, (ii) the behaviors of individual particles are heterogeneous (especially in biological systems due to biodiversity). To address them, we introduce a novel framework dubbed correlational Lagrangian Schrödinger bridge (CLSB), aiming to seek for the evolution "bridging" among cross-sectional observations, while regularized for the minimal population "cost". In contrast to prior methods relying on \textit{individual}-level regularizers for all particles \textit{homogeneously} (e.g. restraining individual motions), CLSB operates at the population level admitting the heterogeneity nature, resulting in a more generalizable modeling in practice. To this end, our contributions include (1) a new class of population regularizers capturing the temporal variations in multivariate relations, with the tractable formulation derived, (2) three domain-informed instantiations based on genetic co-expression stability, and (3) an integration of population regularizers into data-driven generative models as constrained optimization, and a numerical solution, with further extension to conditional generative models. Empirically, we demonstrate the superiority of CLSB in single-cell sequencing data analyses such as simulating cell development over time and predicting cellular responses to drugs of varied doses.
△ Less
Submitted 4 February, 2024;
originally announced February 2024.
-
Online Quantile Regression
Authors:
Yinan Shen,
Dong Xia,
Wen-Xin Zhou
Abstract:
This paper addresses the challenge of integrating sequentially arriving data within the quantile regression framework, where the number of features is allowed to grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. In our analys…
▽ More
This paper addresses the challenge of integrating sequentially arriving data within the quantile regression framework, where the number of features is allowed to grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. In our analysis, we unveil the delicate interplay between updating iterates based on individual observations versus batches of observations, revealing distinct regularity properties in each scenario. Our method ensures long-term optimal estimation irrespective of the chosen update strategy. Importantly, our contributions go beyond prior works by achieving exponential-type concentration inequalities and attaining optimal regret and error rates that exhibit only \textsf{ short-term} sensitivity to initial errors. A key insight from our study is the delicate statistical analyses and the revelation that appropriate stepsize schemes significantly mitigate the impact of initial errors on subsequent errors and regrets. This underscores the robustness of stochastic sub-gradient descent in handling initial uncertainties, emphasizing its efficacy in scenarios where the sequential arrival of data introduces uncertainties regarding both the horizon and the total number of observations. Additionally, when the initial error rate is well-controlled, there is a trade-off between short-term error rate and long-term optimality. Due to the lack of delicate statistical analysis for squared loss, we also briefly discuss its properties and proper schemes. Extensive simulations support our theoretical findings.
△ Less
Submitted 18 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Consistent Validation for Predictive Methods in Spatial Settings
Authors:
David R. Burt,
Yunyi Shen,
Tamara Broderick
Abstract:
Spatial prediction tasks are key to weather forecasting, studying air pollution, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where…
▽ More
Spatial prediction tasks are key to weather forecasting, studying air pollution, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where we want to make predictions. This mismatch is often not an instance of covariate shift (as commonly formalized) because the validation and test locations are fixed (e.g., on a grid or at select points) rather than i.i.d. from two distributions. In the present work, we formalize a check on validation methods: that they become arbitrarily accurate as validation data becomes arbitrarily dense. We show that classical and covariate-shift methods can fail this check. We instead propose a method that builds from existing ideas in the covariate-shift literature, but adapts them to the validation data at hand. We prove that our proposal passes our check. And we demonstrate its advantages empirically on simulated and real data.
△ Less
Submitted 23 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process
Authors:
Guangyi Chen,
Yifan Shen,
Zhenhao Chen,
Xiangchen Song,
Yuewen Sun,
Weiran Yao,
Xiao Liu,
Kun Zhang
Abstract:
Identifying the underlying time-delayed latent causal processes in sequential data is vital for gras** temporal dynamics and making downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy…
▽ More
Identifying the underlying time-delayed latent causal processes in sequential data is vital for gras** temporal dynamics and making downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy in real-world applications containing information loss. For instance, the visual perception process translates a 3D space into 2D images, or the phenomenon of persistence of vision incorporates historical data into current perceptions. To address this challenge, we establish an identifiability theory that allows for the recovery of independent latent components even when they come from a nonlinear and non-invertible mix. Using this theory as a foundation, we propose a principled approach, CaRiNG, to learn the CAusal RepresentatIon of Non-invertible Generative temporal data with identifiability guarantees. Specifically, we utilize temporal context to recover lost latent information and apply the conditions in our theory to guide the training process. Through experiments conducted on synthetic datasets, we validate that our CaRiNG method reliably identifies the causal process, even when the generation process is non-invertible. Moreover, we demonstrate that our approach considerably improves temporal understanding and reasoning in practical applications.
△ Less
Submitted 30 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Gradient flows for empirical Bayes in high-dimensional linear models
Authors:
Zhou Fan,
Leying Guan,
Yandi Shen,
Yihong Wu
Abstract:
Empirical Bayes provides a powerful approach to learning and adapting to latent structure in data. Theory and algorithms for empirical Bayes have a rich literature for sequence models, but are less understood in settings where latent variables and data interact through more complex designs. In this work, we study empirical Bayes estimation of an i.i.d. prior in Bayesian linear models, via the nonp…
▽ More
Empirical Bayes provides a powerful approach to learning and adapting to latent structure in data. Theory and algorithms for empirical Bayes have a rich literature for sequence models, but are less understood in settings where latent variables and data interact through more complex designs. In this work, we study empirical Bayes estimation of an i.i.d. prior in Bayesian linear models, via the nonparametric maximum likelihood estimator (NPMLE). We introduce and study a system of gradient flow equations for optimizing the marginal log-likelihood, jointly over the prior and posterior measures in its Gibbs variational representation using a smoothed reparametrization of the regression coefficients. A diffusion-based implementation yields a Langevin dynamics MCEM algorithm, where the prior law evolves continuously over time to optimize a sequence-model log-likelihood defined by the coordinates of the current Langevin iterate. We show consistency of the NPMLE as $n, p \rightarrow \infty$ under mild conditions, including settings of random sub-Gaussian designs when $n \asymp p$. In high noise, we prove a uniform log-Sobolev inequality for the mixing of Langevin dynamics, for possibly misspecified priors and non-log-concave posteriors. We then establish polynomial-time convergence of the joint gradient flow to a near-NPMLE if the marginal negative log-likelihood is convex in a sub-level set of the initialization.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space
Authors:
Yunchen Li,
Zhou Yu,
Gaoqi He,
Yunhang Shen,
Ke Li,
Xing Sun,
Shaohui Lin
Abstract:
Symmetric positive definite~(SPD) matrices have shown important value and applications in statistics and machine learning, such as FMRI analysis and traffic prediction. Previous works on SPD matrices mostly focus on discriminative models, where predictions are made directly on $E(X|y)$, where $y$ is a vector and $X$ is an SPD matrix. However, these methods are challenging to handle for large-scale…
▽ More
Symmetric positive definite~(SPD) matrices have shown important value and applications in statistics and machine learning, such as FMRI analysis and traffic prediction. Previous works on SPD matrices mostly focus on discriminative models, where predictions are made directly on $E(X|y)$, where $y$ is a vector and $X$ is an SPD matrix. However, these methods are challenging to handle for large-scale data, as they need to access and process the whole data. In this paper, inspired by denoising diffusion probabilistic model~(DDPM), we propose a novel generative model, termed SPD-DDPM, by introducing Gaussian distribution in the SPD space to estimate $E(X|y)$. Moreover, our model is able to estimate $p(X)$ unconditionally and flexibly without giving $y$. On the one hand, the model conditionally learns $p(X|y)$ and utilizes the mean of samples to obtain $E(X|y)$ as a prediction. On the other hand, the model unconditionally learns the probability distribution of the data $p(X)$ and generates samples that conform to this distribution. Furthermore, we propose a new SPD net which is much deeper than the previous networks and allows for the inclusion of conditional factors. Experiment results on toy data and real taxi data demonstrate that our models effectively fit the data distribution both unconditionally and unconditionally and provide accurate predictions.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
Valid Randomization Tests in Inexactly Matched Observational Studies via Iterative Convex Programming
Authors:
Siyu Heng,
Yanxin Shen,
Pengyun Wang
Abstract:
In causal inference, matching is one of the most widely used methods to mimic a randomized experiment using observational (non-experimental) data. Ideally, treated units are exactly matched with control units for the covariates so that the treatments are as-if randomly assigned within each matched set, and valid randomization tests for treatment effects can then be conducted as in a randomized exp…
▽ More
In causal inference, matching is one of the most widely used methods to mimic a randomized experiment using observational (non-experimental) data. Ideally, treated units are exactly matched with control units for the covariates so that the treatments are as-if randomly assigned within each matched set, and valid randomization tests for treatment effects can then be conducted as in a randomized experiment. However, inexact matching typically exists, especially when there are continuous or many observed covariates or when unobserved covariates exist. Previous matched observational studies routinely conducted downstream randomization tests as if matching was exact, as long as the matched datasets satisfied some prespecified balance criteria or passed some balance tests. Some recent studies showed that this routine practice could render a highly inflated type-I error rate of randomization tests, especially when the sample size is large. To handle this problem, we propose an iterative convex programming framework for randomization tests with inexactly matched datasets. Under some commonly used regularity conditions, we show that our approach can produce valid randomization tests (i.e., robustly controlling the type-I error rate) for any inexactly matched datasets, even when unobserved covariates exist. Our framework allows the incorporation of flexible machine learning models to better extract information from covariate imbalance while robustly controlling the type-I error rate.
△ Less
Submitted 28 November, 2023; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Quantile and pseudo-Huber Tensor Decomposition
Authors:
Yinan Shen,
Dong Xia
Abstract:
This paper studies the computational and statistical aspects of quantile and pseudo-Huber tensor decomposition. The integrated investigation of computational and statistical issues of robust tensor decomposition poses challenges due to the non-smooth loss functions. We propose a projected sub-gradient descent algorithm for tensor decomposition, equipped with either the pseudo-Huber loss or the qua…
▽ More
This paper studies the computational and statistical aspects of quantile and pseudo-Huber tensor decomposition. The integrated investigation of computational and statistical issues of robust tensor decomposition poses challenges due to the non-smooth loss functions. We propose a projected sub-gradient descent algorithm for tensor decomposition, equipped with either the pseudo-Huber loss or the quantile loss. In the presence of both heavy-tailed noise and Huber's contamination error, we demonstrate that our algorithm exhibits a so-called phenomenon of two-phase convergence with a carefully chosen step size schedule. The algorithm converges linearly and delivers an estimator that is statistically optimal with respect to both the heavy-tailed noise and arbitrary corruptions. Interestingly, our results achieve the first minimax optimal rates under Huber's contamination model for noisy tensor decomposition. Compared with existing literature, quantile tensor decomposition removes the requirement of specifying a sparsity level in advance, making it more flexible for practical use. We also demonstrate the effectiveness of our algorithms in the presence of missing values. Our methods are subsequently applied to the food balance dataset and the international trade flow dataset, both of which yield intriguing findings.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
A Semi-Supervised Learning Approach for Ranging Error Mitigation Based on UWB Waveform
Authors:
Yuxiao Li,
Santiago Mazuelas,
Yuan Shen
Abstract:
Localization systems based on ultra-wide band (UWB) measurements can have unsatisfactory performance in harsh environments due to the presence of non-line-of-sight (NLOS) errors. Learning-based methods for error mitigation have shown great performance improvement via directly exploiting the wideband waveform instead of handcrafted features. However, these methods require data samples fully labeled…
▽ More
Localization systems based on ultra-wide band (UWB) measurements can have unsatisfactory performance in harsh environments due to the presence of non-line-of-sight (NLOS) errors. Learning-based methods for error mitigation have shown great performance improvement via directly exploiting the wideband waveform instead of handcrafted features. However, these methods require data samples fully labeled with actual measurement errors for training, which leads to time-consuming data collection. In this paper, we propose a semi-supervised learning method based on variational Bayes for UWB ranging error mitigation. Combining deep learning techniques and statistic tools, our method can efficiently accumulate knowledge from both labeled and unlabeled data samples. Extensive experiments illustrate the effectiveness of the proposed method under different supervision rates, and the superiority compared to other fully supervised methods even at a low supervision rate.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Deep Generative Model for Simultaneous Range Error Mitigation and Environment Identification
Authors:
Yuxiao Li,
Santiago Mazuelas,
Yuan Shen
Abstract:
Received waveforms contain rich information for both range information and environment semantics. However, its full potential is hard to exploit under multipath and non-line-of-sight conditions. This paper proposes a deep generative model (DGM) for simultaneous range error mitigation and environment identification. In particular, we present a Bayesian model for the generative process of the receiv…
▽ More
Received waveforms contain rich information for both range information and environment semantics. However, its full potential is hard to exploit under multipath and non-line-of-sight conditions. This paper proposes a deep generative model (DGM) for simultaneous range error mitigation and environment identification. In particular, we present a Bayesian model for the generative process of the received waveform composed by latent variables for both range-related features and environment semantics. The simultaneous range error mitigation and environment identification is interpreted as an inference problem based on the DGM, and implemented in a unique end-to-end learning scheme. Comprehensive experiments on a general Ultra-wideband dataset demonstrate the superior performance on range error mitigation, scalability to different environments, and novel capability on simultaneous environment identification.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Deep GEM-Based Network for Weakly Supervised UWB Ranging Error Mitigation
Authors:
Yuxiao Li,
Santiago Mazuelas,
Yuan Shen
Abstract:
Ultra-wideband (UWB)-based techniques, while becoming mainstream approaches for high-accurate positioning, tend to be challenged by ranging bias in harsh environments. The emerging learning-based methods for error mitigation have shown great performance improvement via exploiting high semantic features from raw data. However, these methods rely heavily on fully labeled data, leading to a high cost…
▽ More
Ultra-wideband (UWB)-based techniques, while becoming mainstream approaches for high-accurate positioning, tend to be challenged by ranging bias in harsh environments. The emerging learning-based methods for error mitigation have shown great performance improvement via exploiting high semantic features from raw data. However, these methods rely heavily on fully labeled data, leading to a high cost for data acquisition. We present a learning framework based on weak supervision for UWB ranging error mitigation. Specifically, we propose a deep learning method based on the generalized expectation-maximization (GEM) algorithm for robust UWB ranging error mitigation under weak supervision. Such method integrate probabilistic modeling into the deep learning scheme, and adopt weakly supervised labels as prior information. Extensive experiments in various supervision scenarios illustrate the superiority of the proposed method.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression
Authors:
Yinan Shen,
**gyang Li,
Jian-Feng Cai,
Dong Xia
Abstract:
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since the robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed,…
▽ More
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since the robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a projected sub-gradient descent algorithm for both the sparse linear regression and low-rank linear regression problems. The algorithm is not only computationally efficient with linear convergence but also statistically optimal, be the noise Gaussian or heavy-tailed with a finite 1 + epsilon moment. The convergence theory is established for a general framework and its specific applications to absolute loss, Huber loss and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of two-phase convergence. In phase one, the algorithm behaves as in typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator, which is already observed in the existing literature. Interestingly, during phase two, the algorithm converges linearly as if minimizing a smooth and strongly convex objective function, and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Numerical simulations confirm our theoretical discovery and showcase the superiority of our algorithm over prior methods.
△ Less
Submitted 10 May, 2023;
originally announced May 2023.
-
Optimal Priors for the Discounting Parameter of the Normalized Power Prior
Authors:
Yueqi Shen,
Luiz M. Carvalho,
Matthew A. Psioda,
Joseph G. Ibrahim
Abstract:
The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as discounting parameter. When the discounting parameter is modelled as random, the normalized power prior is recommended. In this work, we prove that the marginal posterior for the discounting parameter for g…
▽ More
The power prior is a popular class of informative priors for incorporating information from historical data. It involves raising the likelihood for the historical data to a power, which acts as discounting parameter. When the discounting parameter is modelled as random, the normalized power prior is recommended. In this work, we prove that the marginal posterior for the discounting parameter for generalized linear models converges to a point mass at zero if there is any discrepancy between the historical and current data, and that it does not converge to a point mass at one when they are fully compatible. In addition, we explore the construction of optimal priors for the discounting parameter in a normalized power prior. In particular, we are interested in achieving the dual objectives of encouraging borrowing when the historical and current data are compatible and limiting borrowing when they are in conflict. We propose intuitive procedures for eliciting the shape parameters of a beta prior for the discounting parameter based on two minimization criteria, the Kullback-Leibler divergence and the mean squared error. Based on the proposed criteria, the optimal priors derived are often quite different from commonly used priors such as the uniform prior.
△ Less
Submitted 8 April, 2024; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Identify local limiting factors of species distribution using min-linear logistic regression
Authors:
Hongliang Bu,
Yunyi Shen
Abstract:
Logistic regression is a commonly used building block in ecological modeling, but its additive structure among environmental predictors often assumes compensatory relationships between predictors, which can lead to problematic results. In reality, the distribution of species is often determined by the least-favored factor, according to von Liebig's Law of the Minimum, which is not addressed in mod…
▽ More
Logistic regression is a commonly used building block in ecological modeling, but its additive structure among environmental predictors often assumes compensatory relationships between predictors, which can lead to problematic results. In reality, the distribution of species is often determined by the least-favored factor, according to von Liebig's Law of the Minimum, which is not addressed in modeling. To address this issue, we introduced the min-linear logistic regression model, which has a built-in minimum structure of competing factors. In our empirical analysis of the distribution of Asiatic black bears ($\textit{Ursus thibetanus}$), we found that the min-linear model performs well compared to other methods and has several advantages. By using the model, we were able to identify ecologically meaningful limiting factors on bear distribution across the survey area. The model's inherent simplicity and interpretability make it a promising tool for extending into other widely used ecological models.
△ Less
Submitted 17 February, 2023;
originally announced February 2023.
-
A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions
Authors:
Dirk Douwes-Schultz,
Alexandra M. Schmidt,
Yannan Shen,
David Buckeridge
Abstract:
Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area, we assume the existence of three states for the disease: absence, a new state meant to account for many zeroes in some of th…
▽ More
Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area, we assume the existence of three states for the disease: absence, a new state meant to account for many zeroes in some of the smaller areas, endemic and outbreak. Then we assume the disease switches between the three states in each area through a series of coupled nonhomogeneous hidden Markov chains. Unlike previous approaches, the transition probabilities may depend on covariates and the occurrence of outbreaks in neighboring areas, to account for geographical outbreak spread. Additionally, to prevent rapid switching between endemic and outbreak periods we introduce clone states into the model which enforce minimum endemic and outbreak durations. We make some interesting findings, such as that mobility in retail and recreation venues had a positive association with the development and persistence of new COVID-19 outbreaks in Quebec. Based on model comparison our contributions show promise in improving state estimation retrospectively and in real-time, especially when there are smaller areas and highly spatially synchronized outbreaks. Furthermore, our approach offers new and interesting epidemiological interpretations, such as being able to estimate the effect of covariates on disease extinction.
△ Less
Submitted 8 December, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
Authors:
Xingyu Xu,
Yandi Shen,
Yuejie Chi,
Cong Ma
Abstract:
We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning t…
▽ More
We propose $\textsf{ScaledGD($λ$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparametrized factor representations, $\textsf{ScaledGD($λ$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($λ$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overprameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($λ$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$ which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
△ Less
Submitted 6 November, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Heterogeneous Synthetic Learner for Panel Data
Authors:
Ye Shen,
Runzhe Wan,
Hengrui Cai,
Rui Song
Abstract:
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on th…
▽ More
In the new era of personalization, learning the heterogeneous treatment effect (HTE) becomes an inevitable trend with numerous applications. Yet, most existing HTE estimation methods focus on independently and identically distributed observations and cannot handle the non-stationarity and temporal dependency in the common panel data setting. The treatment evaluators developed for panel data, on the other hand, typically ignore the individualized information. To fill the gap, in this paper, we initialize the study of HTE estimation in panel data. Under different assumptions for HTE identifiability, we propose the corresponding heterogeneous one-side and two-side synthetic learner, namely H1SL and H2SL, by leveraging the state-of-the-art HTE estimator for non-panel data and generalizing the synthetic control method that allows flexible data generating process. We establish the convergence rates of the proposed estimators. The superior performance of the proposed methods over existing ones is demonstrated by extensive numerical studies.
△ Less
Submitted 29 January, 2023; v1 submitted 30 December, 2022;
originally announced December 2022.
-
Empirical Bayes estimation: When does $g$-modeling beat $f$-modeling in theory (and in practice)?
Authors:
Yandi Shen,
Yihong Wu
Abstract:
Empirical Bayes (EB) is a popular framework for large-scale inference that aims to find data-driven estimators to compete with the Bayesian oracle that knows the true prior. Two principled approaches to EB estimation have emerged over the years: $f$-modeling, which constructs an approximate Bayes rule by estimating the marginal distribution of the data, and $g$-modeling, which estimates the prior…
▽ More
Empirical Bayes (EB) is a popular framework for large-scale inference that aims to find data-driven estimators to compete with the Bayesian oracle that knows the true prior. Two principled approaches to EB estimation have emerged over the years: $f$-modeling, which constructs an approximate Bayes rule by estimating the marginal distribution of the data, and $g$-modeling, which estimates the prior from data and then applies the learned Bayes rule. For the Poisson model, the prototypical examples are the celebrated Robbins estimator and the nonparametric MLE (NPMLE), respectively. It has long been recognized in practice that the Robbins estimator, while being conceptually appealing and computationally simple, lacks robustness and can be easily derailed by "outliers" (data points that were rarely observed before), unlike the NPMLE which provides more stable and interpretable fit thanks to its Bayes form. On the other hand, not only do the existing theories shed little light on this phenomenon, but they all point to the opposite, as both methods have recently been shown optimal in terms of the \emph{regret} (excess over the Bayes risk) for compactly supported and subexponential priors with exact logarithmic factors.
In this paper we provide a theoretical justification for the superiority of NPMLE over Robbins for heavy-tailed data by considering priors with bounded $p$th moment previously studied for the Gaussian model. For the Poisson model with sample size $n$, assuming $p>1$ (for otherwise triviality arises), we show that the NPMLE with appropriate regularization and truncation achieves a total regret $\tilde Θ(n^{\frac{3}{2p+1}})$, which is minimax optimal within logarithmic factors. In contrast, the total regret of Robbins estimator (with similar truncation) is $\tildeΘ(n^{\frac{3}{p+2}})$ and hence suboptimal by a polynomial factor.
△ Less
Submitted 22 November, 2022;
originally announced November 2022.
-
The Importance of Suppressing Complete Reconstruction in Autoencoders for Unsupervised Outlier Detection
Authors:
Yafei Shen,
Ling Yang
Abstract:
Autoencoders are widely used in outlier detection due to their superiority in handling high-dimensional and nonlinear datasets. The reconstruction of any dataset by the autoencoder can be considered as a complex regression process. In regression analysis, outliers can usually be divided into high leverage points and influential points. Although the autoencoder has shown good results for the identi…
▽ More
Autoencoders are widely used in outlier detection due to their superiority in handling high-dimensional and nonlinear datasets. The reconstruction of any dataset by the autoencoder can be considered as a complex regression process. In regression analysis, outliers can usually be divided into high leverage points and influential points. Although the autoencoder has shown good results for the identification of influential points, there are still some problems when detect high leverage points. Through theoretical derivation, we found that most outliers are detected in the direction corresponding to the worst-recovered principal component, but in the direction of the well-recovered principal components, the anomalies are often ignored. We propose a new loss function which solve the above deficiencies in outlier detection. The core idea of our scheme is that in order to better detect high leverage points, we should suppress the complete reconstruction of the dataset to convert high leverage points into influential points, and it is also necessary to ensure that the differences between the eigenvalues of the covariance matrix of the original dataset and their corresponding reconstructed results in the direction of each principal component are equal. Besides, we explain the rationality of our scheme through rigorous theoretical derivation. Finally, our experiments on multiple datasets confirm that our scheme significantly improves the accuracy of outlier detection.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
Amplifying Membership Exposure via Data Poisoning
Authors:
Yufei Chen,
Chao Shen,
Yun Shen,
Cong Wang,
Yang Zhang
Abstract:
As in-the-wild data are increasingly involved in the training stage, machine learning applications become more susceptible to data poisoning attacks. Such attacks typically lead to test-time accuracy degradation or controlled misprediction. In this paper, we investigate the third type of exploitation of data poisoning - increasing the risks of privacy leakage of benign training samples. To this en…
▽ More
As in-the-wild data are increasingly involved in the training stage, machine learning applications become more susceptible to data poisoning attacks. Such attacks typically lead to test-time accuracy degradation or controlled misprediction. In this paper, we investigate the third type of exploitation of data poisoning - increasing the risks of privacy leakage of benign training samples. To this end, we demonstrate a set of data poisoning attacks to amplify the membership exposure of the targeted class. We first propose a generic dirty-label attack for supervised classification algorithms. We then propose an optimization-based clean-label attack in the transfer learning scenario, whereby the poisoning samples are correctly labeled and look "natural" to evade human moderation. We extensively evaluate our attacks on computer vision benchmarks. Our results show that the proposed attacks can substantially increase the membership inference precision with minimum overall test-time model performance degradation. To mitigate the potential negative impacts of our attacks, we also investigate feasible countermeasures.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs
Authors:
Hans Hao-Hsun Hsu,
Yuesong Shen,
Daniel Cremers
Abstract:
Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration e…
▽ More
Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Use of Non-concurrent Common Control in Master Protocols in Oncology Trials: Report of an American Statistical Association Biopharmaceutical Section Open Forum Discussion
Authors:
Rajeshwari Sridhara,
Olga Marchenko,
Qi Jiang,
Richard Pazdur,
Martin Posch,
Scott Berry,
Marc Theoret,
Yuan Li Shen,
Thomas Gwise,
Lorenzo Hess,
Andrew Raven,
Khadija Rantell,
Kit Roes,
Richard Simon,
Mary Redman,
Yuan Ji,
Cindy Lu
Abstract:
This article summarizes the discussions from the American Statistical Association (ASA) Biopharmaceutical (BIOP) Section Open Forum that took place on December 10, 2020 and was organized by the ASA BIOP Statistical Methods in Oncology Scientific Working Group, in coordination with the US FDA Oncology Center of Excellence. Diverse stakeholders including experts from international regulatory agencie…
▽ More
This article summarizes the discussions from the American Statistical Association (ASA) Biopharmaceutical (BIOP) Section Open Forum that took place on December 10, 2020 and was organized by the ASA BIOP Statistical Methods in Oncology Scientific Working Group, in coordination with the US FDA Oncology Center of Excellence. Diverse stakeholders including experts from international regulatory agencies, academicians, and representatives of the pharmaceutical industry engaged in a discussion on the use of non-concurrent control in Master Protocols for oncology trials. While the use of non-concurrent control with the concurrent control may increase the power of detecting the therapeutic difference between a treatment and the control, the panelists had diverse opinion on the statistical approaches for modeling non-concurrent and concurrent controls. Some were more concerned about the temporality of the non-concurrent control and bias introduced by different confounders related to time, e.g., changes in standard of care, changes in patient population, changes in recruiting strategies, changes in assessment of endpoints. Nevertheless, in some situations such as when the recruitment is extremely challenging for a rare disease, the panelists concluded that the use of a non-concurrent control can be justified.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health
Authors:
Yi Shen,
Jessilyn Dunn,
Michael M. Zavlanos
Abstract:
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presenc…
▽ More
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.
△ Less
Submitted 9 September, 2022;
originally announced September 2022.
-
A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
Authors:
Zifan Wang,
Yi Shen,
Zachary I. Bell,
Scott Nivison,
Michael M. Zavlanos,
Karl H. Johansson
Abstract:
We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their action…
▽ More
We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Estimating sparse direct effects in multivariate regression with the spike-and-slab LASSO
Authors:
Yunyi Shen,
Claudia Solís-Lemus,
Sameer K. Deshpande
Abstract:
The multivariate regression interpretation of the Gaussian chain graph model simultaneously parametrizes (i) the direct effects of $p$ predictors on $q$ outcomes and (ii) the residual partial covariances between pairs of outcomes. We introduce a new method for fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL) priors. We develop an Expectation Conditional Maximization algor…
▽ More
The multivariate regression interpretation of the Gaussian chain graph model simultaneously parametrizes (i) the direct effects of $p$ predictors on $q$ outcomes and (ii) the residual partial covariances between pairs of outcomes. We introduce a new method for fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL) priors. We develop an Expectation Conditional Maximization algorithm to obtain sparse estimates of the $p \times q$ matrix of direct effects and the $q \times q$ residual precision matrix. Our algorithm iteratively solves a sequence of penalized maximum likelihood problems with self-adaptive penalties that gradually filter out negligible regression coefficients and partial covariances. Because it adaptively penalizes individual model parameters, our method is seen to outperform fixed-penalty competitors on simulated data. We establish the posterior contraction rate for our model, buttressing our method's excellent empirical performance with strong theoretical guarantees. Using our method, we estimated the direct effects of diet and residence type on the composition of the gut microbiome of elderly adults.
△ Less
Submitted 26 March, 2024; v1 submitted 14 July, 2022;
originally announced July 2022.
-
Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League
Authors:
Yuesen Li,
Shouxin Zong,
Yanfei Shen,
Zhiqiang Pu,
Miguel-Ángel Gómez,
Yixiong Cui
Abstract:
Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a rece…
▽ More
Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data of 960 matches from 2016-2019 CSL were used. Match ratings, and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in high rated CSL players.
△ Less
Submitted 7 July, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
Finding MNEMON: Reviving Memories of Node Embeddings
Authors:
Yun Shen,
Yufei Han,
Zhikun Zhang,
Min Chen,
Ting Yu,
Michael Backes,
Yang Zhang,
Gianluca Stringhini
Abstract:
Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In th…
▽ More
Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understand the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.
△ Less
Submitted 29 April, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation
Authors:
Yinan Shen,
**gyang Li,
Jian-Feng Cai,
Dong Xia
Abstract:
Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to del…
▽ More
Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
△ Less
Submitted 10 May, 2023; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process
Authors:
Chengchun Shi,
** Zhu,
Ye Shen,
Shikai Luo,
Hongtu Zhu,
Rui Song
Abstract:
This paper is concerned with constructing a confidence interval for a target policy's value offline based on a pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In th…
▽ More
This paper is concerned with constructing a confidence interval for a target policy's value offline based on a pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results, simulated and real datasets obtained from ridesharing companies. A Python implementation of the proposed procedure is available at https://github.com/Mamba413/cope.
△ Less
Submitted 3 November, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
BayesPPD: An R Package for Bayesian Sample Size Determination Using the Power and Normalized Power Prior for Generalized Linear Models
Authors:
Yueqi Shen,
Matthew A. Psioda,
Joseph G. Ibrahim
Abstract:
The R package BayesPPD (Bayesian Power Prior Design) supports Bayesian power and type I error calculation and model fitting after incorporating historical data with the power prior and the normalized power prior for generalized linear models (GLM). The package accommodates summary level data or subject level data with covariate information. It supports use of multiple historical datasets as well a…
▽ More
The R package BayesPPD (Bayesian Power Prior Design) supports Bayesian power and type I error calculation and model fitting after incorporating historical data with the power prior and the normalized power prior for generalized linear models (GLM). The package accommodates summary level data or subject level data with covariate information. It supports use of multiple historical datasets as well as design without historical data. Supported distributions for responses include normal, binary (Bernoulli/binomial), Poisson and exponential. The power parameter $a_0$ can be fixed or modeled as random using a normalized power prior for each of these distributions. In addition, the package supports the use of arbitrary sampling priors for computing Bayesian power and type I error rates, and has specific features for GLMs that semi-automatically generate sampling priors from historical data. Since sample size determination (SSD) for GLMs is computationally intensive, an approximation method based on asymptotic theory has been implemented to support applications using the power prior. In addition to describing the statistical methodology and functions implemented in the package to enable SSD, we also demonstrate the use of BayesPPD in two comprehensive case studies.
△ Less
Submitted 29 December, 2021;
originally announced December 2021.
-
Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation
Authors:
Stanislav Minsker,
Mohamed Ndaoud,
Yiqiu Shen
Abstract:
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic setting. We first derive a lower and a matching upper bound for the minimax risk of clustering in this framework. We also show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the mini…
▽ More
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic setting. We first derive a lower and a matching upper bound for the minimax risk of clustering in this framework. We also show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense. Next, we characterize precisely the risk of $\ell_2$-regularized supervised least squares classifiers. We deduce the fact that the interpolating solution may outperform the regularized classifier, under mild assumptions on the covariance structure of the noise. Our analysis also shows that interpolation can be robust to corruption in the covariance of the noise when the signal is aligned with the "clean" part of the covariance, for the properly defined notion of alignment. To the best of our knowledge, this peculiar phenomenon has not yet been investigated in the rapidly growing literature related to interpolation. We conclude that interpolation is not only benign but can also be optimal, and in some cases robust.
△ Less
Submitted 13 November, 2021;
originally announced November 2021.
-
Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning
Authors:
Ye Shen,
Hengrui Cai,
Rui Song
Abstract:
Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instruction on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a pro…
▽ More
Evaluating the performance of an ongoing policy plays a vital role in many areas such as medicine and economics, to provide crucial instruction on the early-stop of the online experiment and timely feedback from the environment. Policy evaluation in online learning thus attracts increasing attention by inferring the mean outcome of the optimal policy (i.e., the value) in real-time. Yet, such a problem is particularly challenging due to the dependent data generated in the online environment, the unknown optimal policy, and the complex exploration and exploitation trade-off in the adaptive experiment. In this paper, we aim to overcome these difficulties in policy evaluation for online learning. We explicitly derive the probability of exploration that quantifies the probability of exploring the non-optimal actions under commonly used bandit algorithms. We use this probability to conduct valid inference on the online conditional mean estimator under each action and develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning. The proposed value estimator provides double protection on the consistency and is asymptotically normal with a Wald-type confidence interval provided. Extensive simulations and real data applications are conducted to demonstrate the empirical validity of the proposed DREAM method.
△ Less
Submitted 28 January, 2023; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Uncertainty quantification in the Bradley-Terry-Luce model
Authors:
Chao Gao,
Yandi Shen,
Anderson Y. Zhang
Abstract:
The Bradley-Terry-Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimator…
▽ More
The Bradley-Terry-Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) construction of confidence intervals for individual ranks; (iii) optimal constant of $\ell_2$ estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.
△ Less
Submitted 9 August, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Inference Attacks Against Graph Neural Networks
Authors:
Zhikun Zhang,
Min Chen,
Michael Backes,
Yun Shen,
Yang Zhang
Abstract:
Graph is an important data representation ubiquitously existing in the real world. However, analyzing the graph data is computationally difficult due to its non-Euclidean nature. Graph embedding is a powerful tool to solve the graph analytics problem by transforming the graph data into low-dimensional vectors. These vectors could also be shared with third parties to gain additional insights of wha…
▽ More
Graph is an important data representation ubiquitously existing in the real world. However, analyzing the graph data is computationally difficult due to its non-Euclidean nature. Graph embedding is a powerful tool to solve the graph analytics problem by transforming the graph data into low-dimensional vectors. These vectors could also be shared with third parties to gain additional insights of what is behind the data. While sharing graph embedding is intriguing, the associated privacy risks are unexplored. In this paper, we systematically investigate the information leakage of the graph embedding by mounting three inference attacks. First, we can successfully infer basic graph properties, such as the number of nodes, the number of edges, and graph density, of the target graph with up to 0.89 accuracy. Second, given a subgraph of interest and the graph embedding, we can determine with high confidence that whether the subgraph is contained in the target graph. For instance, we achieve 0.98 attack AUC on the DD dataset. Third, we propose a novel graph reconstruction attack that can reconstruct a graph that has similar graph structural statistics to the target graph. We further propose an effective defense mechanism based on graph embedding perturbation to mitigate the inference attacks without noticeable performance degradation for graph classification tasks. Our code is available at https://github.com/Zhangzhk0819/GNN-Embedding-Leaks.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
An Empirical Bayes Robust Meta-Analytical-Predictive Prior to Adaptively Leverage External Data
Authors:
Hongtao Zhang,
Yueqi Shen,
Alan Y Chiang,
Judy Li
Abstract:
We propose a novel empirical Bayes robust MAP (EB-rMAP) prior to adaptively leverage external/historical data. Built on Box's prior predictive p-value, the EB-rMAP prior framework balances between model parsimony and flexibility through a tuning parameter. The proposed framework can be applied to binary, normal, and time-to-event endpoints. Computational aspects of the framework are efficient. Sim…
▽ More
We propose a novel empirical Bayes robust MAP (EB-rMAP) prior to adaptively leverage external/historical data. Built on Box's prior predictive p-value, the EB-rMAP prior framework balances between model parsimony and flexibility through a tuning parameter. The proposed framework can be applied to binary, normal, and time-to-event endpoints. Computational aspects of the framework are efficient. Simulations results with different endpoints demonstrate that the EB-rMAP prior is robust in the presence of prior-data conflict while preserving statistical power. The proposed EB-rMAP prior is then applied to a clinical dataset that comprises of ten oncology clinical trials, including the perspective study.
△ Less
Submitted 7 December, 2021; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection
Authors:
Yihang Shen,
Carl Kingsford
Abstract:
Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, develo** effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions by vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that are based on the idea of e…
▽ More
Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, develo** effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions by vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that are based on the idea of embedding the high-dimensional space to the one with low dimension are sensitive to the choice of the embedding dimension, which needs to be pre-specified. We develop a new computationally efficient high-dimensional BO method that exploits variable selection. Our method is able to automatically learn axis-aligned sub-spaces, i.e. spaces containing selected variables, without the demand of any pre-specified hyperparameters. We theoretically analyze the computational complexity of our algorithm and derive the regret bound. We empirically show the efficacy of our method on several synthetic and real problems.
△ Less
Submitted 12 February, 2024; v1 submitted 19 September, 2021;
originally announced September 2021.
-
CARlasso: An R package for the estimation of sparse microbial networks with predictors
Authors:
Yunyi Shen,
Claudia Solis-Lemus
Abstract:
Microbiome data analyses require statistical tools that can simultaneously decode microbes' reactions to the environment and interactions among microbes. We introduce CARlasso, the first user-friendly open-source and publicly available R package to fit a chain graph model for the inference of sparse microbial networks that represent both interactions among nodes and effects of a set of predictors.…
▽ More
Microbiome data analyses require statistical tools that can simultaneously decode microbes' reactions to the environment and interactions among microbes. We introduce CARlasso, the first user-friendly open-source and publicly available R package to fit a chain graph model for the inference of sparse microbial networks that represent both interactions among nodes and effects of a set of predictors. Unlike in standard regression approaches, the edges represent the correct conditional structure among responses and predictors that allows the incorporation of prior knowledge from controlled experiments. In addition, CARlasso 1) enforces sparsity in the network via LASSO; 2) allows for an adaptive extension to include different shrinkage to different edges; 3) is computationally inexpensive through an efficient Gibbs sampling algorithm so it can equally handle small and big data; 4) allows for continuous, binary, counting and compositional responses via proper hierarchical structure, and 5) has a similar syntax to lm for ease of use. The package also supports Bayesian graphical LASSO and several of its hierarchical models as well as lower level one-step sampling functions of the CAR-LASSO model for users to extend.
△ Less
Submitted 29 July, 2021;
originally announced July 2021.
-
Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised Node Classification
Authors:
Yu Wang,
Yuesong Shen,
Daniel Cremers
Abstract:
Node features and structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, which typically determine output labels through feature aggregation. This can be problematic, as it implies conditional independence of output nodes given hidden representations,…
▽ More
Node features and structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, which typically determine output labels through feature aggregation. This can be problematic, as it implies conditional independence of output nodes given hidden representations, despite their direct connections in the graph. To learn the direct influence among output nodes in a graph, we propose the Explicit Pairwise Factorized Graph Neural Network (EPFGNN), which models the whole graph as a partially observed Markov Random Field. It contains explicit pairwise factors to model output-output relations and uses a GNN backbone to model input-output relations. To balance model complexity and expressivity, the pairwise factors have a shared component and a separate scaling coefficient for each edge. We apply the EM algorithm to train our model, and utilize a star-shaped piecewise likelihood for the tractable surrogate objective. We conduct experiments on various datasets, which shows that our model can effectively improve the performance for semi-supervised node classification on graphs.
△ Less
Submitted 27 July, 2021;
originally announced July 2021.
-
Derivatives and residual distribution of regularized M-estimators with application to adaptive tuning
Authors:
Pierre C Bellec,
Yiwei Shen
Abstract:
This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty and the noise distribution has heavy-tails. Our main contributions are three-fold. (i) We provide general formula…
▽ More
This paper studies M-estimators with gradient-Lipschitz loss function regularized with convex penalty in linear models with Gaussian design matrix and arbitrary noise distribution. A practical example is the robust M-estimator constructed with the Huber loss and the Elastic-Net penalty and the noise distribution has heavy-tails. Our main contributions are three-fold. (i) We provide general formulae for the derivatives of regularized M-estimators $\hatβ(y,X)$ where differentiation is taken with respect to both $y$ and $X$; this reveals a simple differentiability structure shared by all convex regularized M-estimators. (ii) Using these derivatives, we characterize the distribution of the residual $r_i = y_i-x_i^\top\hatβ$ in the intermediate high-dimensional regime where dimension and sample size are of the same order. (iii) Motivated by the distribution of the residuals, we propose a novel adaptive criterion to select tuning parameters of regularized M-estimators. The criterion approximates the out-of-sample error up to an additive constant independent of the estimator, so that minimizing the criterion provides a proxy for minimizing the out-of-sample error. The proposed adaptive criterion does not require the knowledge of the noise distribution or of the covariance of the design. Simulated data confirms the theoretical findings, regarding both the distribution of the residuals and the success of the criterion as a proxy of the out-of-sample error. Finally our results reveal new relationships between the derivatives of $\hatβ(y,X)$ and the effective degrees of freedom of the M-estimator, which are of independent interest.
△ Less
Submitted 11 July, 2021;
originally announced July 2021.
-
Asymptotic normality of robust $M$-estimators with convex penalty
Authors:
Pierre C Bellec,
Yiwei Shen,
Cun-Hui Zhang
Abstract:
This paper develops asymptotic normality results for individual coordinates of robust M-estimators with convex penalty in high-dimensions, where the dimension $p$ is at most of the same order as the sample size $n$, i.e, $p/n\leγ$ for some fixed constant $γ>0$. The asymptotic normality requires a bias correction and holds for most coordinates of the M-estimator for a large class of loss functions…
▽ More
This paper develops asymptotic normality results for individual coordinates of robust M-estimators with convex penalty in high-dimensions, where the dimension $p$ is at most of the same order as the sample size $n$, i.e, $p/n\leγ$ for some fixed constant $γ>0$. The asymptotic normality requires a bias correction and holds for most coordinates of the M-estimator for a large class of loss functions including the Huber loss and its smoothed versions regularized with a strongly convex penalty.
The asymptotic variance that characterizes the width of the resulting confidence intervals is estimated with data-driven quantities. This estimate of the variance adapts automatically to low ($p/n\to0)$ or high ($p/n \le γ$) dimensions and does not involve the proximal operators seen in previous works on asymptotic normality of M-estimators. For the Huber loss, the estimated variance has a simple expression involving an effective degrees-of-freedom as well as an effective sample size. The case of the Huber loss with Elastic-Net penalty is studied in details and a simulation study confirms the theoretical findings. The asymptotic normality results follow from Stein formulae for high-dimensional random vectors on the sphere developed in the paper which are of independent interest.
△ Less
Submitted 8 July, 2021;
originally announced July 2021.
-
The Effect of the Prior and the Experimental Design on the Inference of the Precision Matrix in Gaussian Chain Graph Models
Authors:
Yunyi Shen,
Claudia Solis-Lemus
Abstract:
Here, we investigate whether (and how) experimental design could aid in the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment and prior knowledge about the effect. Estimation of the precision matrix is a fundamental task to infer biological graphical structures like microbial networks. We compare the margin…
▽ More
Here, we investigate whether (and how) experimental design could aid in the estimation of the precision matrix in a Gaussian chain graph model, especially the interplay between the design, the effect of the experiment and prior knowledge about the effect. Estimation of the precision matrix is a fundamental task to infer biological graphical structures like microbial networks. We compare the marginal posterior precision of the precision matrix under four priors: flat, conjugate Normal-Wishart, Normal-MGIG and a general independent. Under the flat and conjugate priors, the Laplace-approximated posterior precision is not a function of the design matrix rendering useless any efforts to find an optimal experimental design to infer the precision matrix. In contrast, the Normal-MGIG and general independent priors do allow for the search of optimal experimental designs, yet there is a sharp upper bound on the information that can be extracted from a given experiment. We confirm our theoretical findings via a simulation study comparing i) the KL divergence between prior and posterior and ii) the Stein's loss difference of MAPs between random and no experiment. Our findings provide practical advice for domain scientists conducting experiments to better infer the precision matrix as a representation of a biological network.
△ Less
Submitted 29 November, 2023; v1 submitted 2 July, 2021;
originally announced July 2021.
-
Online Learning with Uncertain Feedback Graphs
Authors:
Pouya M Ghari,
Yanning Shen
Abstract:
Online learning with expert advice is widely used in various machine learning tasks. It considers the problem where a learner chooses one from a set of experts to take advice and make a decision. In many learning problems, experts may be related, henceforth the learner can observe the losses associated with a subset of experts that are related to the chosen one. In this context, the relationship a…
▽ More
Online learning with expert advice is widely used in various machine learning tasks. It considers the problem where a learner chooses one from a set of experts to take advice and make a decision. In many learning problems, experts may be related, henceforth the learner can observe the losses associated with a subset of experts that are related to the chosen one. In this context, the relationship among experts can be captured by a feedback graph, which can be used to assist the learner's decision making. However, in practice, the nominal feedback graph often entails uncertainties, which renders it impossible to reveal the actual relationship among experts. To cope with this challenge, the present work studies various cases of potential uncertainties, and develops novel online learning algorithms to deal with uncertainties while making use of the uncertain feedback graph. The proposed algorithms are proved to enjoy sublinear regret under mild conditions. Experiments on real datasets are presented to demonstrate the effectiveness of the novel algorithms.
△ Less
Submitted 15 June, 2021;
originally announced June 2021.
-
Invariant Information Bottleneck for Domain Generalization
Authors:
Bo Li,
Yifei Shen,
Yezhen Wang,
Wenzhen Zhu,
Colorado J. Reed,
Jun Zhang,
Dongsheng Li,
Kurt Keutzer,
Han Zhao
Abstract:
Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, the loss function is difficult to optimize for nonlinear classifiers and the original optimization objective could fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed inv…
▽ More
Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, the loss function is difficult to optimize for nonlinear classifiers and the original optimization objective could fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed invariant information bottleneck (IIB). IIB aims at minimizing invariant risks for nonlinear classifiers and simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, we first present a novel formulation for invariant causal prediction via mutual information. Then we adopt the variational formulation of the mutual information to develop a tractable loss function for nonlinear classifiers. To overcome the failure modes of IRM, we propose to minimize the mutual information between the inputs and the corresponding representations. IIB significantly outperforms IRM on synthetic datasets, where the pseudo-invariant features and geometric skews occur, showing the effectiveness of proposed formulation in overcoming failure modes of IRM. Furthermore, experiments on DomainBed show that IIB outperforms $13$ baselines by $0.9\%$ on average across $7$ real datasets.
△ Less
Submitted 21 March, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.