-
Bayesian Structured Mediation Analysis With Unobserved Confounders
Authors:
Yuliang Xu,
Shu Yang,
Jian Kang
Abstract:
We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent individual effects, which influence the structured mediators, as unobserved confounders in the outcome model, thereby potentially debiasing the mediation effects. We develop BAyesian Structured Mediation analysis with Unobserved confounders (BASMU) framework, and establish its model identifiability conditions. Theoretical analysis is conducted on the asymptotic bias of the Natural Indirect Effect (NIE) and the Natural Direct Effect (NDE) when the unobserved confounders are omitted in mediation analysis. For BASMU, we propose a two-stage estimation algorithm to mitigate the impact of these unobserved confounders on estimating the mediation effect. Extensive simulations demonstrate that BASMU substantially reduces the bias in various scenarios. We apply BASMU to the analysis of fMRI data in the Adolescent Brain Cognitive Development (ABCD) study, focusing on four brain regions previously reported to exhibit meaningful mediation effects. Compared with the existing image mediation analysis method, BASMU identifies two to four times more voxels that have significant mediation effects, with the NIE increased by 41%, and the NDE decreased by 26%.
Submitted 4 July, 2024;
originally announced July 2024.
-
Spatially Structured Regression for Non-conformable Spaces: Integrating Pathology Imaging and Genomics Data in Cancer
Authors:
Nathaniel Osher,
Jian Kang,
Arvind Rao,
Veerabhadran Baladandayuthapani
Abstract:
The spatial composition and cellular heterogeneity of the tumor microenvironment play a critical role in cancer development and progression. High-definition pathology imaging of tumor biopsies provides a high-resolution view of the spatial organization of different types of cells. This allows for systematic assessment of intra- and inter-patient spatial cellular interactions and heterogeneity by integrating accompanying patient-level genomics data. However, joint modeling across tumor biopsies presents unique challenges due to non-conformability (lack of a common spatial domain across biopsies) as well as high-dimensionality. To address this problem, we propose the Dual random effect and main effect selection model for Spatially structured regression model (DreameSpase). DreameSpase employs a Bayesian variable selection framework that facilitates the assessment of spatial heterogeneity with respect to covariates both within (through fixed effects) and between spaces (through spatial random effects) for non-conformable spatial domains. We demonstrate the efficacy of DreameSpase via simulations and integrative analyses of pathology imaging and gene expression data obtained from $335$ melanoma biopsies. Our findings confirm several existing relationships, e.g. neutrophil genes being associated with both inter- and intra-patient spatial heterogeneity, and reveal novel associations. We also provide freely available and computationally efficient software for implementing DreameSpase.
Submitted 24 June, 2024;
originally announced June 2024.
-
Flat Posterior Does Matter For Bayesian Transfer Learning
Authors:
Sungjun Lim,
Jeyoon Yeom,
Sooyon Kim,
Hoyoon Byun,
Jinho Kang,
Yohan Jung,
Jiyoung Jung,
Kyungwoo Song
Abstract:
The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning for BNNs has not been widely investigated and shows limited improvement. We hypothesize that this issue arises from the inability to find flat minima, which is crucial for generalization performance. To address this, we evaluate the sharpness of BNNs in various settings, revealing their insufficiency in seeking flat minima and the influence of flatness on BMA performance. Therefore, we propose Sharpness-aware Bayesian Model Averaging (SA-BMA), a Bayesian-fitting flat posterior seeking optimizer integrated with Bayesian transfer learning. SA-BMA calculates the divergence between posteriors in the parameter space, aligning with the nature of BNNs, and serves as a generalized version of existing sharpness-aware optimizers. We validate that SA-BMA improves generalization performance in few-shot classification and distribution shift scenarios by ensuring flatness.
Submitted 21 June, 2024;
originally announced June 2024.
-
Adaptive Bayesian Multivariate Spline Knot Inference with Prior Specifications on Model Complexity
Authors:
Junhui He,
Ying Yang,
Jian Kang
Abstract:
In multivariate spline regression, the number and locations of knots influence the performance and interpretability significantly. However, due to non-differentiability and varying dimensions, there is no desirable frequentist method to make inference on knots. In this article, we propose a fully Bayesian approach for knot inference in multivariate spline regression. The existing Bayesian method often uses BIC to calculate the posterior, but BIC is too liberal and it will heavily overestimate the knot number when the candidate model space is large. We specify a new prior on the knot number to take into account the complexity of the model space and derive an analytic formula in the normal model. In the non-normal cases, we utilize the extended Bayesian information criterion to approximate the posterior density. The samples are simulated in the space with differing dimensions via reversible jump Markov chain Monte Carlo. We apply the proposed method in knot inference and manifold denoising. Experiments demonstrate the splendid capability of the algorithm, especially in function fitting with jumping discontinuity.
Submitted 22 May, 2024;
originally announced May 2024.
-
Scalable Bayesian inference for heat kernel Gaussian processes on manifolds
Authors:
Junhui He,
Guoxuan Ma,
Jian Kang,
Ying Yang
Abstract:
We develop scalable manifold learning methods and theory, motivated by the problem of estimating manifold of fMRI activation in the Human Connectome Project (HCP). We propose the Fast Graph Laplacian Estimation for Heat Kernel Gaussian Processes (FLGP) in the natural exponential family model. FLGP handles large sample sizes $ n $, preserves the intrinsic geometry of data, and significantly reduces computational complexity from $ \mathcal{O}(n^3) $ to $ \mathcal{O}(n) $ via a novel reduced-rank approximation of the graph Laplacian's transition matrix and truncated Singular Value Decomposition for eigenpair computation. Our numerical experiments demonstrate FLGP's scalability and improved accuracy for manifold learning from large-scale complex data.
Submitted 22 May, 2024;
originally announced May 2024.
-
Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis
Authors:
Yuliang Xu,
Timothy D. Johnson,
Thomas E. Nichols,
Jian Kang
Abstract:
Bayesian Image-on-Scalar Regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as found in the UK Biobank, is hindered by the computational demands of traditional posterior computation methods, as well as the challenge of individual-specific brain masks that deviate from the common mask typically used in standard ISR approaches. To address these challenges, we introduce a novel Bayesian ISR model that is scalable and accommodates inconsistent brain masks across subjects in large-scale imaging studies. Our model leverages Gaussian process priors and integrates salience area indicators to facilitate ISR. We develop a cutting-edge scalable posterior computation algorithm that employs stochastic gradient Langevin dynamics coupled with memory mapping techniques, ensuring that computation time scales linearly with subsample size and memory usage is constrained only by the batch size. Our approach uniquely enables direct spatial posterior inferences on brain activation regions. The efficacy of our method is demonstrated through simulations and analysis of the UK Biobank task fMRI data, encompassing 38,639 subjects and over 120,000 voxels per image, showing that it can achieve a speed increase of 4 to 11 times and enhance statistical power by 8% to 18% compared to traditional Gibbs sampling with zero-imputation in various simulation scenarios.
Submitted 15 June, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Scalable Scalar-on-Image Cortical Surface Regression with a Relaxed-Thresholded Gaussian Process Prior
Authors:
Anna Menacher,
Thomas E. Nichols,
Timothy D. Johnson,
Jian Kang
Abstract:
In addressing the challenge of analysing the large-scale Adolescent Brain Cognition Development (ABCD) fMRI dataset, involving over 5,000 subjects and extensive neuroimaging data, we propose a scalable Bayesian scalar-on-image regression model for computational feasibility and efficiency. Our model employs a relaxed-thresholded Gaussian process (RTGP), integrating piecewise-smooth, sparse, and continuous functions capable of both hard- and soft-thresholding. This approach introduces additional flexibility in feature selection in scalar-on-image regression and leads to scalable posterior computation by adopting a variational approximation and utilising the Karhunen-Loève expansion for Gaussian processes. This advancement substantially reduces the computational costs in vertex-wise analysis of cortical surface data in large-scale Bayesian spatial models. The model's parameter estimation and prediction accuracy and feature selection performance are validated through extensive simulation studies and an application to the ABCD study. Here, we perform regression analysis correlating intelligence scores with task-based functional MRI data, taking into account confounding factors including age, sex, and parental education level. This validation highlights our model's capability to handle large-scale neuroimaging data while maintaining computational feasibility and accuracy.
Submitted 20 March, 2024;
originally announced March 2024.
-
Bayesian Signal Matching for Transfer Learning in ERP-Based Brain Computer Interface
Authors:
Tianwen Ma,
Jane E. Huggins,
Jian Kang
Abstract:
An Event-Related Potential (ERP)-based Brain-Computer Interface (BCI) Speller System helps people with disabilities communicate by decoding electroencephalogram (EEG) signals. A P300-ERP embedded in EEG signals arises in response to a rare, but relevant event (target) among a series of irrelevant events (non-target). Different machine learning methods have constructed binary classifiers to detect target events, known as calibration. The existing calibration strategy uses only data from participants themselves with lengthy training time, causing biased P300 estimation and decreasing prediction accuracy. To resolve this issue, we propose a Bayesian signal matching (BSM) framework for calibrating the EEG signals from a new participant using data from source participants. BSM specifies the joint distribution of stimulus-specific EEG signals among source participants via a Bayesian hierarchical mixture model. We apply the following inference strategy: if source and new participants are similar, they share the same set of model parameters; otherwise, they keep their own sets of model parameters. We predict on the testing data using parameters of the baseline cluster directly. Our hierarchical framework can be generalized to other base classifiers with clear likelihood specifications. We demonstrate the advantages of BSM using simulations and focus on the real data analysis among participants with neuro-degenerative diseases.
Submitted 13 January, 2024;
originally announced January 2024.
-
Empirical Bayes Covariance Decomposition, and a solution to the Multiple Tuning Problem in Sparse PCA
Authors:
Joonsuk Kang,
Matthew Stephens
Abstract:
Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.
Submitted 5 December, 2023;
originally announced December 2023.
-
Bayesian Functional Analysis for Untargeted Metabolomics Data with Matching Uncertainty and Small Sample Sizes
Authors:
Guoxuan Ma,
Jian Kang,
Tianwei Yu
Abstract:
Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application given its ability to depict the global metabolic pattern in biological samples. However, the data is noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while the truth can only be one-to-one matches. Some existing methods attempt to reduce the matching uncertainty, but are far from being able to remove the uncertainty for most features. The existence of the uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) to integrate previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection, and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.
Submitted 23 March, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Bayesian Image-on-Image Regression via Deep Kernel Learning based Gaussian Processes
Authors:
Guoxuan Ma,
Bangyao Zhao,
Hasan Abu-Amara,
Jian Kang
Abstract:
In neuroimaging studies, it becomes increasingly important to study associations between different imaging modalities using image-on-image regression (IIR), which faces challenges in interpretation, statistical inference, and prediction. Our motivating problem is how to predict task-evoked fMRI activity using resting-state fMRI data in the Human Connectome Project (HCP). The main difficulty lies in effectively combining different types of imaging predictors with varying resolutions and spatial domains in IIR. To address these issues, we develop Bayesian Image-on-image Regression via Deep Kernel Learning Gaussian Processes (BIRD-GP) and develop efficient posterior computation methods through Stein variational gradient descent. We demonstrate the advantages of BIRD-GP over state-of-the-art IIR methods using simulations. For HCP data analysis using BIRD-GP, we combine the voxel-wise fALFF maps and region-wise connectivity matrices to predict fMRI contrast maps for language and social recognition tasks. We show that fALFF is less predictive than the connectivity matrix for both tasks, but combining both yields improved results. Angular Gyrus Right emerges as the most predictable region for the language task (75.9% predictable voxels), while Superior Parietal Gyrus Right tops for the social recognition task (48.9% predictable voxels). Additionally, we identify features from the resting-state fMRI data that are important for task fMRI prediction.
Submitted 7 November, 2023;
originally announced November 2023.
-
Robust Bayesian Graphical Regression Models for Assessing Tumor Heterogeneity in Proteomic Networks
Authors:
Tsung-Hung Yao,
Yang Ni,
Anindya Bhadra,
Jian Kang,
Veerabhadran Baladandayuthapani
Abstract:
Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of the two canonical assumptions: (i) a homogeneous graph with a common network for all subjects; or (ii) an assumption of normality especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail to hold in certain applications such as proteomic networks in cancer. To this end, we propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality through random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity through graphical regression techniques. We formulate a new characterization of edge dependencies in such models called conditional sign independence with covariates along with an efficient posterior sampling algorithm. In simulation studies, we demonstrate that rBGR outperforms existing graphical regression models for data generated under various levels of non-normality in both edge and covariate selection. We use rBGR to assess proteomic networks across two cancers: lung and ovarian, to systematically investigate the effects of immunogenic heterogeneity within tumors. Our analyses reveal several important protein-protein interactions that are differentially impacted by the immune cell abundance; some corroborate existing biological knowledge whereas others are novel findings.
Submitted 27 October, 2023;
originally announced October 2023.
-
Bayesian Image Mediation Analysis
Authors:
Yuliang Xu,
Jian Kang
Abstract:
Mediation analysis aims to separate the indirect effect through mediators from the direct effect of the exposure on the outcome. It is challenging to perform mediation analysis with neuroimaging data which involves high dimensionality, complex spatial correlations, sparse activation patterns and relatively low signal-to-noise ratio. To address these issues, we develop a new spatially varying coefficient structural equation model for Bayesian Image Mediation Analysis (BIMA). We define spatially varying mediation effects within the potential outcome framework, employing the soft-thresholded Gaussian process prior for functional parameters. We establish the posterior consistency for spatially varying mediation effects along with selection consistency on important regions that contribute to the mediation effects. We develop an efficient posterior computation algorithm scalable to analysis of large-scale imaging data. Through extensive simulations, we show that BIMA can improve the estimation accuracy and computational efficiency for high-dimensional mediation analysis over the existing methods. We apply BIMA to analyze the behavioral and fMRI data in the Adolescent Brain Cognitive Development (ABCD) study with a focus on inferring the mediation effects of the parental education level on the children's general cognitive ability that are mediated through the working memory brain activities.
Submitted 24 October, 2023;
originally announced October 2023.
-
Deceptive Fairness Attacks on Graphs via Meta Learning
Authors:
Jian Kang,
Yinglong Xia,
Ross Maciejewski,
Jiebo Luo,
Hanghang Tong
Abstract:
We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as wel…
▽ More
We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.
Submitted 24 October, 2023;
originally announced October 2023.
-
On the Temperature of Bayesian Graph Neural Networks for Conformal Prediction
Authors:
Seohyeon Cha,
Honggu Kang,
Joonhyuk Kang
Abstract:
Accurate uncertainty quantification in graph neural networks (GNNs) is essential, especially in high-stakes domains where GNNs are frequently employed. Conformal prediction (CP) offers a promising framework for quantifying uncertainty by providing $\textit{valid}$ prediction sets for any black-box model. CP ensures formal probabilistic guarantees that a prediction set contains a true label with a desired probability. However, the size of prediction sets, known as $\textit{inefficiency}$, is influenced by the underlying model and data generating process. On the other hand, Bayesian learning also provides a credible region based on the estimated posterior distribution, but this region is $\textit{well-calibrated}$ only when the model is correctly specified. Building on a recent work that introduced a scaling parameter for constructing valid credible regions from posterior estimate, our study explores the advantages of incorporating a temperature parameter into Bayesian GNNs within CP framework. We empirically demonstrate the existence of temperatures that result in more efficient prediction sets. Furthermore, we conduct an analysis to identify the factors contributing to inefficiency and offer valuable insights into the relationship between CP performance and model calibration.
Submitted 3 December, 2023; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression
Authors:
Hyunjun Lee,
Junhyun Lee,
Taehwa Choi,
Jaewoo Kang,
Sangbum Choi
Abstract:
Time-to-event analysis, also known as survival analysis, aims to predict the time of occurrence of an event, given a set of features. One of the major challenges in this area is dealing with censored data, which can make learning algorithms more complex. Traditional methods such as Cox's proportional hazards model and the accelerated failure time (AFT) model have been popular in this field, but they often require assumptions such as proportional hazards and linearity. In particular, the AFT models often require pre-specified parametric distributional assumptions. To improve predictive performance and alleviate strict assumptions, there have been many deep learning approaches for hazard-based models in recent years. However, representation learning for AFT has not been widely explored in the neural network literature, despite its simplicity and interpretability in comparison to hazard-focused methods. In this work, we introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART). This model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning. On top of eliminating the requirement to establish a baseline event time distribution, DART retains the advantages of directly predicting event time in standard AFT models. The proposed method is a semiparametric approach to AFT modeling that does not impose any distributional assumptions on the survival time distribution. This also eliminates the need for additional hyperparameters or complex model architectures, unlike existing neural network-based AFT models. Through quantitative analysis on various benchmark datasets, we have shown that DART has significant potential for modeling high-throughput censored time-to-event data.
Submitted 22 July, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
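The rank-based objective mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' DART implementation; it only assumes the standard Gehan loss for AFT residuals, e_i = log T_i - f(x_i), averaged over all ordered pairs. The function name `gehan_loss` and its arguments are hypothetical.

```python
import numpy as np

def gehan_loss(log_time, event, pred):
    """Gehan rank loss on AFT residuals e_i = log_time_i - pred_i.

    Assumed form: (1 / n^2) * sum_i sum_j event_i * max(e_j - e_i, 0),
    where event_i = 1 for an observed event and 0 for a censored time.
    """
    log_time = np.asarray(log_time, dtype=float)
    event = np.asarray(event, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = log_time - pred
    # diff[i, j] = e_j - e_i; a pair contributes only if subject i had an event
    diff = resid[None, :] - resid[:, None]
    n = resid.shape[0]
    return float(np.sum(event[:, None] * np.maximum(diff, 0.0)) / n**2)

# Two subjects: the first has an observed event, the second is censored.
loss = gehan_loss(log_time=[1.0, 2.0], event=[1, 0], pred=[0.0, 0.0])
```

In a neural-network setting, `pred` would be the network output f(x_i) and the loss would be minimized by gradient descent; no baseline distribution for the residuals is specified anywhere in the objective.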
-
Flexible Bayesian Modeling for Longitudinal Binary and Ordinal Responses
Authors:
Jizhou Kang,
Athanasios Kottas
Abstract:
Longitudinal studies with binary or ordinal responses are widely encountered in various disciplines, where the primary focus is on the temporal evolution of the probability of each response category. Traditional approaches build from the generalized mixed effects modeling framework. Even amplified with nonparametric priors placed on the fixed or random effects, such models are restrictive due to the implied assumptions on the marginal expectation and covariance structure of the responses. We tackle the problem from a functional data analysis perspective, treating the observations for each subject as realizations from subject-specific stochastic processes at the measured times. We develop the methodology focusing initially on binary responses, for which we assume the stochastic processes have Binomial marginal distributions. Leveraging the logits representation, we model the discrete space processes through sequences of continuous space processes. We utilize a hierarchical framework to model the mean and covariance kernel of the continuous space processes nonparametrically and simultaneously through a Gaussian process prior and an Inverse-Wishart process prior, respectively. The prior structure results in flexible inference for the evolution and correlation of binary responses, while allowing for borrowing of strength across all subjects. The modeling approach can be naturally extended to ordinal responses. Here, the continuation-ratio logits factorization of the multinomial distribution is key for efficient modeling and inference, including a practical way of dealing with unbalanced longitudinal data. The methodology is illustrated with synthetic data examples and an analysis of college students' mental health status data.
Submitted 1 July, 2023;
originally announced July 2023.
-
Latent Subgroup Identification in Image-on-scalar Regression
Authors:
Zikai Lin,
Yajuan Si,
Jian Kang
Abstract:
Image-on-scalar regression has been a popular approach to modeling the association between brain activities and scalar characteristics in neuroimaging research. The associations could be heterogeneous across individuals in the population, as indicated by recent large-scale neuroimaging studies, e.g., the Adolescent Brain Cognitive Development (ABCD) study. The ABCD data can inform our understanding of heterogeneous associations and how to leverage the heterogeneity and tailor interventions to increase the number of youths who benefit. It is of great interest to identify subgroups of individuals from the population such that: 1) within each subgroup the brain activities have homogeneous associations with the clinical measures; 2) across subgroups the associations are heterogeneous; and 3) the group allocation depends on individual characteristics. Existing image-on-scalar regression methods and clustering methods cannot directly achieve this goal. We propose a latent subgroup image-on-scalar regression model (LASIR) to analyze large-scale, multi-site neuroimaging data with diverse sociodemographics. LASIR introduces the latent subgroup for each individual and group-specific, spatially varying effects, with an efficient stochastic expectation maximization algorithm for inferences. We demonstrate that LASIR outperforms existing alternatives for subgroup identification of brain activation patterns with functional magnetic resonance imaging data via comprehensive simulations and applications to the ABCD study. We have released our reproducible codes for public use with the software package available on Github: https://github.com/zikaiLin/lasir.
Submitted 30 June, 2023;
originally announced July 2023.
-
Mediation with External Summary Statistic Information (MESSI)
Authors:
Jonathan Boss,
Wei Hao,
Amber Cathey,
Barrett M. Welch,
Kelly K. Ferguson,
John D. Meeker,
Jian Kang,
Bhramar Mukherjee
Abstract:
Environmental health studies are increasingly measuring endogenous omics data ($\boldsymbol{M}$) to study intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently carried out to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $\boldsymbol{A}$ on $\boldsymbol{Y}$ given $\boldsymbol{C}$. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect improves estimation efficiency of the natural direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $R^2$ between the outcome ($\boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C}$) and total effect ($\boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C}$) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We robustify our estimation procedure to incongenial external information by assuming the total effect follows a random distribution. This framework allows shrinkage towards the external information if the total effects in the internal and external populations agree. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where Cytochrome p450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.
Submitted 28 June, 2023;
originally announced June 2023.
-
Bayesian inference for group-level cortical surface image-on-scalar-regression with Gaussian process priors
Authors:
Andrew S. Whiteman,
Timothy D. Johnson,
Jian Kang
Abstract:
In regression-based analyses of group-level neuroimage data researchers typically fit a series of marginal general linear models to image outcomes at each spatially-referenced pixel. Spatial regularization of effects of interest is usually induced indirectly by applying spatial smoothing to the data during preprocessing. While this procedure often works well, resulting inference can be poorly calibrated. Spatial modeling of effects of interest leads to more powerful analyses, however the number of locations in a typical neuroimage can preclude standard computation with explicitly spatial models. Here we contribute a Bayesian spatial regression model for group-level neuroimaging analyses. We induce regularization of spatially varying regression coefficient functions through Gaussian process priors. When combined with a simple nonstationary model for the error process, our prior hierarchy can lead to more data-adaptive smoothing than standard methods. We achieve computational tractability through Vecchia approximation of our prior which, critically, can be constructed for a wide class of spatial correlation functions and results in prior models that retain full spatial rank. We outline several ways to work with our model in practice and compare performance against standard vertex-wise analyses. Finally we illustrate our method in an analysis of cortical surface fMRI task contrast data from a large cohort of children enrolled in the Adolescent Brain Cognitive Development study.
Submitted 6 June, 2023;
originally announced June 2023.
-
Sequential Best-Arm Identification with Application to Brain-Computer Interface
Authors:
Xin Zhou,
Botao Hao,
Jian Kang,
Tor Lattimore,
Lexin Li
Abstract:
A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system. It allows individuals to interact with the device using only their thoughts, and holds immense potential for a wide range of applications in medicine, rehabilitation, and human augmentation. An electroencephalogram (EEG) and event-related potential (ERP)-based speller system is a type of BCI that allows users to spell words without using a physical keyboard, but instead by recording and interpreting brain signals under different stimulus presentation paradigms. Conventional non-adaptive paradigms treat each word selection independently, leading to a lengthy learning process. To improve the sampling efficiency, we cast the problem as a sequence of best-arm identification tasks in multi-armed bandits. Leveraging pre-trained large language models (LLMs), we utilize the prior knowledge learned from previous tasks to inform and facilitate subsequent tasks. To do so in a coherent way, we propose a sequential top-two Thompson sampling (STTS) algorithm under the fixed-confidence setting and the fixed-budget setting. We study the theoretical property of the proposed algorithm, and demonstrate its substantial empirical improvement through both synthetic data analysis as well as a P300 BCI speller simulator example.
Submitted 17 May, 2023;
originally announced May 2023.
-
Bayesian Inference on Brain-Computer Interfaces via GLASS
Authors:
Bangyao Zhao,
Jane E. Huggins,
Jian Kang
Abstract:
Brain-computer interfaces (BCIs), particularly the P300 BCI, facilitate direct communication between the brain and computers. The fundamental statistical problem in P300 BCIs lies in classifying target and non-target stimuli based on electroencephalogram (EEG) signals. However, the low signal-to-noise ratio (SNR) and complex spatial/temporal correlations of EEG signals present challenges in modeling and computation, especially for individuals with severe physical disabilities, who are the primary users of BCIs. To address these challenges, we introduce a novel Gaussian Latent channel model with Sparse time-varying effects (GLASS) under a fully Bayesian framework. GLASS is built upon a constrained multinomial logistic regression particularly designed for the imbalanced target and non-target stimuli. The novel latent channel decomposition efficiently alleviates strong spatial correlations between EEG channels, while the soft-thresholded Gaussian process (STGP) prior ensures sparse and smooth time-varying effects. We demonstrate that GLASS substantially improves BCI performance in participants with amyotrophic lateral sclerosis (ALS) and identifies important EEG channels (PO8, Oz, PO7, and Pz) in parietal and occipital regions that align with existing literature. For broader accessibility, we develop an efficient gradient-based variational inference (GBVI) algorithm for posterior computation and provide a user-friendly Python module available at https://github.com/BangyaoZhao/GLASS.
Submitted 14 February, 2024; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients
Authors:
Yuming Sun,
Jian Kang,
Chinmay Haridas,
Nicholas R. Mayne,
Alexandra L. Potter,
Chi-Fu Jeffrey Yang,
David C. Christiani,
Yi Li
Abstract:
Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) employed computed tomography texture analysis, which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially linear Cox models have gained popularity for survival analysis by dissecting the hazard function into parametric and nonparametric components, allowing for the effective incorporation of both well-established risk factors (such as age and clinical variables) and emerging risk factors (e.g., image features) within a unified framework. However, when the dimension of parametric components exceeds the sample size, the task of model fitting becomes formidable, while nonparametric modeling grapples with the curse of dimensionality. We propose a novel Penalized Deep Partially Linear Cox Model (Penalized DPLC), which incorporates the SCAD penalty to select important texture features and employs a deep neural network to estimate the nonparametric component of the model. We prove the convergence and asymptotic properties of the estimator and compare it to other methods through extensive simulation studies, evaluating its performance in risk prediction and feature selection. The proposed method is applied to the NLST study dataset to uncover the effects of key clinical and imaging risk factors on patients' survival. Our findings provide valuable insights into the relationship between these factors and survival outcomes.
Submitted 29 September, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Statistical inferences for complex dependence of multimodal imaging data
Authors:
Jinyuan Chang,
Jing He,
Jian Kang,
Mingcong Wu
Abstract:
Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures for making inferences on the complex dependence of multimodal imaging data. Motivated by the analysis of multi-task fMRI data in the Human Connectome Project (HCP) study, we particularly address three hypothesis testing problems: (a) testing independence among imaging modalities over brain regions, (b) testing independence between brain regions within imaging modalities, and (c) testing independence between brain regions across different modalities. Considering a general form for all the three tests, we develop a global testing procedure and a multiple testing procedure controlling the false discovery rate. We study theoretical properties of the proposed tests and develop a computationally efficient distributed algorithm. The proposed methods and theory are general and relevant for many statistical problems of testing independence structure among the components of high-dimensional random vectors with arbitrary dependence structures. We also illustrate our proposed methods via extensive simulations and analysis of five task fMRI contrast maps in the HCP study.
Submitted 6 March, 2023;
originally announced March 2023.
-
Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression
Authors:
Jizhou Kang,
Athanasios Kottas
Abstract:
We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors placed directly on the discrete distribution of the ordinal responses. The prior probability models are built from a structured mixture of multinomial distributions. We leverage a continuation-ratio logits representation to formulate the mixture kernel, with mixture weights defined through the logit stick-breaking process that incorporates the covariates through a linear function. The implied regression functions for the response probabilities can be expressed as weighted sums of parametric regression functions, with covariate-dependent weights. Thus, the modeling approach achieves flexible ordinal regression relationships, avoiding linearity or additivity assumptions in the covariate effects. Model flexibility is formally explored through the Kullback-Leibler support of the prior probability model. A key model feature is that the parameters for both the mixture kernel and the mixture weights can be associated with a continuation-ratio logits regression structure. Hence, an efficient and relatively easy to implement posterior simulation method can be designed, using Pólya-Gamma data augmentation. Moreover, the model is built from a conditional independence structure for category-specific parameters, which results in additional computational efficiency gains through partial parallel sampling. In addition to the general mixture structure, we study simplified model versions that incorporate covariate dependence only in the mixture kernel parameters or only in the mixture weights. For all proposed models, we discuss approaches to prior specification and develop Markov chain Monte Carlo methods for posterior simulation. The methodology is illustrated with several synthetic and real data examples.
Submitted 22 March, 2024; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment
Authors:
Qinmengge Li,
Matthew T. Patrick,
Haihan Zhang,
Chachrit Khunsriraksakul,
Philip E. Stuart,
Johann E. Gudjonsson,
Rajan Nair,
James T. Elder,
Dajiang J. Liu,
Jian Kang,
Lam C. Tsoi,
Kevin He
Abstract:
Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Caucasian model for the minority population also has limited performance. In addition, due to data privacy, the individual genotype data is not accessible for either the Caucasian population or the minority population. To address these challenges, we propose a Bregman divergence-based estimation procedure to measure and optimally balance the information from different populations. The proposed method only requires the use of encrypted summary statistics and improves the PRS performance for ethnic minority groups by incorporating additional information. We provide the asymptotic consistency and weak oracle property for the proposed method. Simulations and real data analyses also show its advantages in prediction and variable selection.
Submitted 12 October, 2022;
originally announced October 2022.
-
Understanding the dynamic impact of COVID-19 through competing risk modeling with bivariate varying coefficients
Authors:
Wenbo Wu,
John D. Kalbfleisch,
Jeremy M. G. Taylor,
Jian Kang,
Kevin He
Abstract:
The coronavirus disease 2019 (COVID-19) pandemic has exerted a profound impact on patients with end-stage renal disease relying on kidney dialysis to sustain their lives. Motivated by a request by the U.S. Centers for Medicare & Medicaid Services, our analysis of their postdischarge hospital readmissions and deaths in 2020 revealed that the COVID-19 effect has varied significantly with postdischarge time and time since the onset of the pandemic. However, the complex dynamics of the COVID-19 effect trajectories cannot be characterized by existing varying coefficient models. To address this issue, we propose a bivariate varying coefficient model for competing risks within a cause-specific hazard framework, where tensor-product B-splines are used to estimate the surface of the COVID-19 effect. An efficient proximal Newton algorithm is developed to facilitate the fitting of the new model to the massive Medicare data for dialysis patients. Difference-based anisotropic penalization is introduced to mitigate model overfitting and the wiggliness of the estimated trajectories; various cross-validation methods are considered in the determination of optimal tuning parameters. Hypothesis testing procedures are designed to examine whether the COVID-19 effect varies significantly with postdischarge time and the time since pandemic onset, either jointly or separately. Simulation experiments are conducted to evaluate the estimation accuracy, type I error rate, statistical power, and model selection procedures. Applications to Medicare dialysis patients demonstrate the real-world performance of the proposed methods.
Submitted 31 August, 2022;
originally announced September 2022.
-
Composite Scores for Transplant Center Evaluation: A New Individualized Empirical Null Method
Authors:
Nicholas Hartman,
Joseph M. Messana,
Jian Kang,
Abhijit S. Naik,
Tempie H. Shearon,
Kevin He
Abstract:
Risk-adjusted quality measures are used to evaluate healthcare providers while controlling for factors beyond their control. Existing healthcare provider profiling approaches typically assume that the risk adjustment is perfect and the between-provider variation in quality measures is entirely due to the quality of care. However, in practice, even with very good models for risk adjustment, some between-provider variation will be due to incomplete risk adjustment, which should be recognized in assessing and monitoring providers. Otherwise, conventional methods disproportionately identify larger providers as outliers, even though their provider effects need not be "extreme." Motivated by efforts to evaluate the quality of care provided by transplant centers, we develop a composite evaluation score based on a novel individualized empirical null method, which robustly accounts for overdispersion due to unobserved risk factors, models the marginal variance of standardized scores as a function of the effective center size, and only requires the use of publicly available center-level statistics. The evaluations of United States kidney transplant centers based on the proposed composite score are substantially different from those based on conventional methods. Simulations show that the proposed empirical null approach more accurately classifies centers in terms of quality of care, compared to existing methods.
Submitted 23 July, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks
Authors:
Daiwei Zhang,
Tianci Liu,
Jian Kang
Abstract:
Deep neural network (DNN) models have achieved state-of-the-art predictive accuracy in a wide range of supervised learning applications. However, accurately quantifying the uncertainty in DNN predictions remains a challenging task. For continuous outcome variables, an even more difficult problem is to estimate the predictive density function, which not only provides a natural quantification of the predictive uncertainty, but also fully captures the random variation in the outcome. In this work, we propose the Bayesian Deep Noise Neural Network (B-DeepNoise), which generalizes standard Bayesian DNNs by extending the random noise variable from the output layer to all hidden layers. The latent random noise equips B-DeepNoise with the flexibility to approximate highly complex predictive distributions and accurately quantify predictive uncertainty. For posterior computation, the unique structure of B-DeepNoise leads to a closed-form Gibbs sampling algorithm that iteratively simulates from the posterior full conditional distributions of the model parameters, circumventing computationally intensive Metropolis-Hastings methods. A theoretical analysis of B-DeepNoise establishes a recursive representation of the predictive distribution and decomposes the predictive variance with respect to the latent parameters. We evaluate B-DeepNoise against existing methods on benchmark regression datasets, demonstrating its superior performance in terms of prediction accuracy, uncertainty quantification accuracy, and uncertainty quantification efficiency. To illustrate our method's usefulness in scientific studies, we apply B-DeepNoise to predict general intelligence from neuroimaging features in the Adolescent Brain Cognitive Development (ABCD) project.
Submitted 11 June, 2022;
originally announced June 2022.
-
A Soft-Thresholding Operator for Sparse Time-Varying Effects in Survival Models
Authors:
Yuan Yang,
Jian Kang,
Yi Li
Abstract:
We consider a class of Cox models with time-dependent effects that may be zero over certain unknown time regions or, in short, sparse time-varying effects. The model is particularly useful for biomedical studies as it conveniently depicts the gradual evolution of effects of risk factors on survival. Statistically, estimating and drawing inference on infinite dimensional functional parameters with sparsity (e.g., time-varying effects with zero-effect time intervals) present enormous challenges. To address them, we propose a new soft-thresholding operator for modeling sparse, piecewise smooth and continuous time-varying coefficients in a Cox time-varying effects model. Unlike the common regularized methods, our approach enables one to estimate non-zero time-varying effects and detect zero regions simultaneously, and construct a new type of sparse confidence intervals that accommodate zero regions. This leads to a more interpretable model with a straightforward inference procedure. We develop an efficient algorithm for inference in the target functional space, show that the proposed method enjoys desired theoretical properties, and present its finite sample performance by way of simulations. We apply the proposed method to analyze the data of the Boston Lung Cancer Survivor Cohort, an epidemiological cohort study investigating the impacts of risk factors on lung cancer survival, and obtain clinically useful results.
Submitted 31 May, 2022;
originally announced June 2022.
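To illustrate how soft-thresholding can turn a smooth curve into a sparse, continuous time-varying effect, here is a minimal sketch. It assumes the textbook soft-thresholding function, sign(x) * max(|x| - lam, 0); the operator proposed in the paper above may differ in its details, and the curve `theta` and threshold `lam` are made-up illustrations.

```python
import numpy as np

def soft_threshold(x, lam):
    """Textbook soft-thresholding: shrink toward zero by lam, exactly zero inside [-lam, lam]."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Hypothetical smooth latent curve on a time grid
t = np.linspace(0.0, 2.0 * np.pi, 200)
theta = np.sin(t)

# Thresholded effect: continuous in t, exactly zero wherever |theta(t)| <= lam
beta = soft_threshold(theta, lam=0.5)
zero_region = t[beta == 0.0]
```

Because the thresholded curve is exactly zero on whole intervals rather than merely small, the zero regions and the non-zero effect are obtained from the same function, which is the behavior the abstract describes.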
-
Individualized Risk Assessment of Preoperative Opioid Use by Interpretable Neural Network Regression
Authors:
Yuming Sun,
Jian Kang,
Chad Brummett,
Yi Li
Abstract:
Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In the field of machine learning, the deep neural network (DNN) has emerged as a powerful means for risk assessment because of its superb prediction power; however, the black-box algorithms may make the results less interpretable than statistical models. Bridging the gap between the statistical and machine learning fields, we propose a novel Interpretable Neural Network Regression (INNER), which combines the strengths of statistical and DNN models. We use the proposed INNER to conduct individualized risk assessment of preoperative opioid use. Intensive simulations and an analysis of 34,186 patients expecting surgery in the Analgesic Outcomes Study (AOS) show that the proposed INNER not only can accurately predict the preoperative opioid use using preoperative characteristics as DNN, but also can estimate the patient-specific odds of opioid use without pain and the odds ratio of opioid use for a unit increase in the reported overall body pain, leading to more straightforward interpretations of the tendency to use opioids than DNN. Our results identify the patient characteristics that are strongly associated with opioid use and are largely consistent with previous findings, providing evidence that INNER is a useful tool for individualized risk assessment of preoperative opioid use.
Submitted 6 May, 2022;
originally announced May 2022.
-
Bayesian learning of COVID-19 Vaccine safety while incorporating Adverse Events ontology
Authors:
Bangyao Zhao,
Yuan Zhong,
Jian Kang,
Lili Zhao
Abstract:
While vaccines are crucial to end the COVID-19 pandemic, public confidence in vaccine safety has always been vulnerable. Many statistical methods have been applied to the VAERS (Vaccine Adverse Event Reporting System) database to study the safety of COVID-19 vaccines. However, all these methods ignore the adverse event (AE) ontology. AEs are naturally related; for example, events of retching, dysphagia, and reflux are all related to an abnormal digestive system. Explicitly bringing AE relationships into the model can aid in the detection of true AE signals amid the noise while reducing false positives. We propose a Bayesian graphical model to estimate all AEs while incorporating the AE ontology simultaneously. We propose strategies to construct conjugate forms leading to an efficient Gibbs sampler. Building upon the posterior distributions, we propose a negative control approach to mitigate reporting bias and an enrichment approach to detect AE groups of concern. The proposed methods are evaluated using simulation studies and are further illustrated by studying the safety of COVID-19 vaccines. The proposed methods are implemented in the R package BGrass, and the source code is available at https://github.com/BangyaoZhao/BGrass.
Submitted 10 February, 2022;
originally announced February 2022.
-
Learning Multi-Objective Curricula for Robotic Policy Learning
Authors:
Jikun Kang,
Miao Liu,
Abhinav Gupta,
Chris Pal,
Xue Liu,
Jie Fu
Abstract:
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment generation, or initial state generation. However, prior work only considers curriculum learning following one of the aforementioned predefined paradigms. It is unclear which of these paradigms are complementary, and how the combination of them can be learned from interactions with the environment. Therefore, in this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula that are generated by a set of parametric curriculum modules. Each curriculum module is instantiated as a neural network and is responsible for generating a particular curriculum. In order to coordinate those potentially conflicting modules in unified parameter space, we propose a multi-task hyper-net learning framework that uses a single hyper-net to parameterize all those curriculum modules. In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum, which may otherwise be difficult to design manually. We evaluate our method on a series of robotic manipulation tasks and demonstrate its superiority over other state-of-the-art ACL methods in terms of sample efficiency and final performance.
Submitted 19 October, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis
Authors:
Bingyuan Liu,
Qi Zhang,
Lingzhou Xue,
Peter X. K. Song,
Jian Kang
Abstract:
It is of importance to develop statistical techniques to analyze high-dimensional data in the presence of both complex dependence and possible outliers in real-world applications such as imaging data analyses. We propose a new robust high-dimensional regression with coefficient thresholding, in which an efficient nonconvex estimation procedure is proposed through a thresholding function and the robust Huber loss. The proposed regularization method accounts for complex dependence structures in predictors and is robust against outliers in outcomes. Theoretically, we analyze rigorously the landscape of the population and empirical risk functions for the proposed method. The fine landscape enables us to establish both statistical consistency and computational convergence under the high-dimensional setting. The finite-sample properties of the proposed method are examined by extensive simulation studies. An illustration of real-world application concerns a scalar-on-image regression analysis for an association of psychiatric disorder measured by the general factor of psychopathology with features extracted from the task functional magnetic resonance imaging data in the Adolescent Brain Cognitive Development study.
Submitted 30 September, 2021;
originally announced September 2021.
-
A gradient-based variable selection for binary classification in reproducing kernel Hilbert space
Authors:
Jongkyeong Kang,
Seung Jun Shin
Abstract:
Variable selection is essential in high-dimensional data analysis. Although various variable selection methods have been developed, most rely on the linear model assumption. This article proposes a nonparametric variable selection method for the large-margin classifier defined in a reproducing kernel Hilbert space (RKHS). We propose a gradient-based representation of the large-margin classifier and then regularize the gradient functions by the group-lasso penalty to obtain sparse gradients that naturally lead to variable selection. The groupwise-majorization-descent algorithm (GMD, Yang and Zou, 2015) is proposed to efficiently solve the proposed problem with a large number of parameters. We employ the strong sequential rule (Tibshirani et al., 2012) to facilitate the tuning procedure. The selection consistency of the proposed method is established by obtaining the risk bound of the estimated classifier and its gradient. Finally, we demonstrate the promising performance of the proposed method through simulations and real data illustration.
Submitted 29 September, 2021;
originally announced September 2021.
-
InfoFair: Information-Theoretic Intersectional Fairness
Authors:
Jian Kang,
Tiankai Xie,
Xintao Wu,
Ross Maciejewski,
Hanghang Tong
Abstract:
Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notion is group fairness. The vast majority of the existing works on group fairness, with a few exceptions, primarily focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race, marital status, etc.) in the real world is commonplace. As such, methods that can ensure a fair learning outcome with respect to all sensitive attributes of concern simultaneously need to be developed. In this paper, we study the problem of information-theoretic intersectional fairness (InfoFair), where statistical parity, a representative group fairness measure, is guaranteed among demographic groups formed by multiple sensitive attributes of interest. We formulate it as a mutual information minimization problem and propose a generic end-to-end algorithmic framework to solve it. The key idea is to leverage a variational representation of mutual information, which considers the variational distribution between learning outcomes and sensitive attributes, as well as the density ratio between the variational and the original distributions. Our proposed framework is generalizable to many different settings, including other statistical notions of fairness, and could handle any type of learning task equipped with a gradient-based optimizer. Empirical evaluations in the fair classification task on three real-world datasets demonstrate that our proposed framework can effectively debias the classification results with minimal impact on the classification accuracy.
Submitted 31 December, 2022; v1 submitted 23 May, 2021;
originally announced May 2021.
-
Deep Neural Networks Guided Ensemble Learning for Point Estimation
Authors:
Tianyu Zhan,
Haoda Fu,
Jian Kang
Abstract:
In modern statistics, interest has shifted from pursuing the uniformly minimum variance unbiased estimator to reducing mean squared error (MSE) or residual squared error. Shrinkage-based estimation and regression methods offer better prediction accuracy and improved interpretation. However, the characterization of such optimal statistics in terms of minimizing MSE remains open and challenging in many problems, for example estimating treatment effect in adaptive clinical trials with pre-planned modifications to design aspects based on accumulated data. From an alternative perspective, we propose a deep neural network based automatic method to construct an improved estimator from existing ones. Theoretical properties are studied to provide guidance on applicability of our estimator to seek potential improvement. Simulation studies demonstrate that the proposed method has considerable finite-sample efficiency gain as compared with several common estimators. In the Adaptive COVID-19 Treatment Trial (ACTT) as an important application, our ensemble estimator essentially contributes to a more ethical and efficient adaptive clinical trial with fewer patients enrolled. The proposed framework can be generally applied to various statistical problems, and can serve as a reference measure to guide statistical research.
Submitted 2 October, 2023; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Bayesian Variational Federated Learning and Unlearning in Decentralized Networks
Authors:
Jinu Gong,
Osvaldo Simeone,
Joonhyuk Kang
Abstract:
Federated Bayesian learning offers a principled framework for the definition of collaborative training algorithms that are able to quantify epistemic uncertainty and to produce trustworthy decisions. Upon the completion of collaborative training, an agent may decide to exercise her legal "right to be forgotten", which calls for her contribution to the jointly trained model to be deleted and discarded. This paper studies federated learning and unlearning in a decentralized network within a Bayesian framework. It specifically develops federated variational inference (VI) solutions based on the decentralized solution of local free energy minimization problems within exponential-family models and on local gossip-driven communication. The proposed protocols are demonstrated to yield efficient unlearning mechanisms.
Submitted 8 April, 2021;
originally announced April 2021.
-
Bayesian Inference for Brain Activity from Functional Magnetic Resonance Imaging Collected at Two Spatial Resolutions
Authors:
Andrew S. Whiteman,
Andreas J. Bartsch,
Jian Kang,
Timothy D. Johnson
Abstract:
Neuroradiologists and neurosurgeons increasingly opt to use functional magnetic resonance imaging (fMRI) to map functionally relevant brain regions for noninvasive presurgical planning and intraoperative neuronavigation. This application requires a high degree of spatial accuracy, but the fMRI signal-to-noise ratio (SNR) decreases as spatial resolution increases. In practice, fMRI scans can be collected at multiple spatial resolutions, and it is of interest to make more accurate inference on brain activity by combining data with different resolutions. To this end, we develop a new Bayesian model to leverage both better anatomical precision in high resolution fMRI and higher SNR in standard resolution fMRI. We assign a Gaussian process prior to the mean intensity function and develop an efficient, scalable posterior computation algorithm to integrate both sources of data. We draw posterior samples using an algorithm analogous to Riemann manifold Hamiltonian Monte Carlo in an expanded parameter space. We illustrate our method in analysis of presurgical fMRI data, and show in simulation that it infers the mean intensity more accurately than alternatives that use either the high or standard resolution fMRI data alone.
Submitted 6 June, 2023; v1 submitted 24 March, 2021;
originally announced March 2021.
-
Group Inverse-Gamma Gamma Shrinkage for Sparse Regression with Block-Correlated Predictors
Authors:
Jonathan Boss,
Jyotishka Datta,
Xin Wang,
Sung Kyun Park,
Jian Kang,
Bhramar Mukherjee
Abstract:
Heavy-tailed continuous shrinkage priors, such as the horseshoe prior, are widely used for sparse estimation problems. However, there is limited work extending these priors to predictors with grouping structures. Of particular interest in this article is regression coefficient estimation where pockets of high collinearity in the covariate space are contained within known covariate groupings. To assuage variance inflation due to multicollinearity, we propose the group inverse-gamma gamma (GIGG) prior, a heavy-tailed prior that can trade off between local and group shrinkage in a data-adaptive fashion. A special case of the GIGG prior is the group horseshoe prior, whose shrinkage profile is correlated within-group such that the regression coefficients marginally have exact horseshoe regularization. We show posterior consistency for regression coefficients in linear regression models and posterior concentration results for mean parameters in sparse normal means models. The full conditional distributions corresponding to GIGG regression can be derived in closed form, leading to straightforward posterior computation. We show that GIGG regression results in low mean-squared error across a wide range of correlation structures and within-group signal densities via simulation. We apply GIGG regression to data from the National Health and Nutrition Examination Survey for associating environmental exposures with liver functionality.
Submitted 21 February, 2021;
originally announced February 2021.
-
RDIS: Random Drop Imputation with Self-Training for Incomplete Time Series Data
Authors:
Tae-Min Choi,
Ji-Su Kang,
Jong-Hwan Kim
Abstract:
Time-series data with missing values are commonly encountered in many fields, such as healthcare, meteorology, and robotics. Imputation aims to fill the missing values with valid values. Most imputation methods train the models implicitly because missing values have no ground truth. In this paper, we propose Random Drop Imputation with Self-training (RDIS), a novel training method for time-series data imputation models. In RDIS, we generate extra missing values by applying a random drop on the observed values in incomplete data. We can explicitly train the imputation models by filling in the randomly dropped values. In addition, we adopt self-training with pseudo values to exploit the original missing values. To improve the quality of pseudo values, we set a threshold and filter them by calculating the entropy. To verify the effectiveness of RDIS on time series imputation, we apply RDIS to various imputation models and achieve competitive results on two real-world datasets.
Submitted 25 January, 2023; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Bayesian Hierarchical Models for High-Dimensional Mediation Analysis with Coordinated Selection of Correlated Mediators
Authors:
Yanyi Song,
Xiang Zhou,
Jian Kang,
Max T. Aung,
Min Zhang,
Wei Zhao,
Belinda L. Needham,
Sharon L. R. Kardia,
Yongmei Liu,
John D. Meeker,
Jennifer A. Smith,
Bhramar Mukherjee
Abstract:
We consider Bayesian high-dimensional mediation analysis to identify among a large set of correlated potential mediators the active ones that mediate the effect from an exposure variable to an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven by gene networks in genome data and correlated exposure data from the same source. When correlations are present among active mediators, mediation analysis that fails to account for such correlation can be sub-optimal and may lead to a loss of power in identifying active mediators. Building upon a recent high-dimensional mediation analysis framework, we propose two Bayesian hierarchical models, one with a Gaussian mixture prior that enables correlated mediator selection and the other with a Potts mixture prior that accounts for the correlation among active mediators in mediation analysis. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods enable effective identification of correlated active mediators, which could be missed by using existing methods that assume prior independence among active mediators. The proposed methods are applied to the LIFECODES birth cohort and the Multi-Ethnic Study of Atherosclerosis (MESA) and identified new active mediators with important biological implications.
Submitted 23 September, 2020;
originally announced September 2020.
-
Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error
Authors:
Xiang Lyu,
Jian Kang,
Lexin Li
Abstract:
High-dimensional vector autoregression with measurement error is frequently encountered in a large variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the high-dimensional scenario. We develop inferential procedures for both the global and simultaneous testing of the transition matrix. We first develop a new sparse expectation-maximization algorithm to estimate the model parameters, and carefully characterize their estimation precisions. We then construct a Gaussian matrix, after proper bias and variance corrections, from which we derive the test statistics. Finally, we develop the testing procedures and establish their asymptotic guarantees. We study the finite-sample performance of our tests through intensive simulations, and illustrate with a brain connectivity analysis example.
Submitted 16 September, 2020;
originally announced September 2020.
-
Error estimate for a universal function approximator of ReLU network with a local connection
Authors:
Jae-Mo Kang,
Sunghwan Moon
Abstract:
Neural networks have shown highly successful performance in a wide range of tasks, but further studies are needed to improve their performance. We analyze the approximation error of a specific neural network architecture with local connections, which has wider applicability than a fully connected one because locally connected networks can be used to explain diverse neural networks such as CNNs. Our error estimate depends on two parameters: one controlling the depth of the hidden layer, and the other, the width of the hidden layers.
Submitted 3 September, 2020;
originally announced September 2020.
-
Deep Historical Borrowing Framework to Prospectively and Simultaneously Synthesize Control Information in Confirmatory Clinical Trials with Multiple Endpoints
Authors:
Tianyu Zhan,
Yiwang Zhou,
Ziqian Geng,
Yihua Gu,
Jian Kang,
Li Wang,
Xiaohong Huang,
Elizabeth H. Slate
Abstract:
In current clinical trial development, historical information is receiving more attention as it provides utility beyond sample size calculation. Meta-analytic-predictive (MAP) priors and robust MAP priors have been proposed for prospectively borrowing historical data on a single endpoint. To simultaneously synthesize control information from multiple endpoints in confirmatory clinical trials, we propose to approximate posterior probabilities from a Bayesian hierarchical model and estimate critical values by deep learning to construct pre-specified strategies for hypothesis testing. This feature is important to ensure study integrity by establishing prospective decision functions before the trial conduct. Simulations are performed to show that our method properly controls family-wise error rate (FWER) and preserves power as compared with a typical practice of choosing constant critical values given a subset of null space. Satisfactory performance under prior-data conflict is also demonstrated. We further illustrate our method using a case study in Immunology.
Submitted 1 August, 2022; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects
Authors:
Yanyi Song,
Xiang Zhou,
Jian Kang,
Max T. Aung,
Min Zhang,
Wei Zhao,
Belinda L. Needham,
Sharon L. R. Kardia,
Yongmei Liu,
John D. Meeker,
Jennifer A. Smith,
Bhramar Mukherjee
Abstract:
Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods for an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.
Submitted 14 August, 2020;
originally announced August 2020.
-
Self-supervised Learning for Large-scale Item Recommendations
Authors:
Tiansheng Yao,
Xinyang Yi,
Derek Zhiyuan Cheng,
Felix Yu,
Ting Chen,
Aditya Menon,
Lichan Hong,
Ed H. Chi,
Steve Tjoa,
Jieqi Kang,
Evan Ettinger
Abstract:
Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the corpus, users tend to provide feedback for a very small set of them, causing a power-law distribution. This makes the feedback data for long-tail items extremely sparse.
Inspired by the recent success in self-supervised representation learning research in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning better latent relationships among item features. Specifically, SSL improves item representation learning as well as serving as additional regularization to improve generalization. Furthermore, we propose a novel data augmentation method that utilizes feature correlations within the proposed framework.
We evaluate our framework using two real-world datasets with 500M and 1B training examples respectively. Our results demonstrate the effectiveness of SSL regularization and show its superior performance over the state-of-the-art regularization techniques. We have also launched the proposed techniques in a web-scale commercial app-to-app recommendation system, with significant improvements in top-tier business metrics demonstrated in A/B experiments on live traffic. Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.
Submitted 24 February, 2021; v1 submitted 25 July, 2020;
originally announced July 2020.
-
Deep Anomaly Detection for Time-series Data in Industrial IoT: A Communication-Efficient On-device Federated Learning Approach
Authors:
Yi Liu,
Sahil Garg,
Jiangtian Nie,
Yang Zhang,
Zehui Xiong,
Jiawen Kang,
M. Shamim Hossain
Abstract:
Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in Industrial IoT (IIoT), detecting anomalies accurately and in a timely manner is becoming increasingly important. Furthermore, data collected by the edge device may contain the user's private data, which challenges current detection approaches, as user privacy has drawn growing public concern in recent years. With this focus, this paper proposes a new communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data in IIoT. Specifically, we first introduce an FL framework to enable decentralized edge devices to collaboratively train an anomaly detection model, which can improve its generalization ability. Second, we propose an Attention Mechanism-based Convolutional Neural Network-Long Short Term Memory (AMCNN-LSTM) model to accurately detect anomalies. The AMCNN-LSTM model uses attention mechanism-based CNN units to capture important fine-grained features, thereby preventing memory loss and gradient dispersion problems. Furthermore, this model retains the advantages of the LSTM unit in predicting time series data. Third, to adapt the proposed framework to the timeliness of industrial anomaly detection, we propose a gradient compression mechanism based on Top-k selection to improve communication efficiency. Extensive experimental studies on four real-world datasets demonstrate that the proposed framework can detect anomalies accurately and in a timely manner and also reduce the communication overhead by 50% compared to the federated learning framework that does not use a gradient compression scheme.
Submitted 19 July, 2020;
originally announced July 2020.
-
Image Response Regression via Deep Neural Networks
Authors:
Daiwei Zhang,
Lexin Li,
Chandra Sripada,
Jian Kang
Abstract:
Delineating the associations between images and a vector of covariates is of central interest in medical imaging studies. To tackle this problem of image response regression, we propose a novel nonparametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Compared to existing solutions, the proposed method explicitly accounts for spatial smoothness and subject heterogeneity, has straightforward interpretations, and is highly flexible and accurate in capturing complex association patterns. A key idea in our approach is to treat the image voxels as the effective samples, which not only alleviates the limited sample size issue that haunts the majority of medical imaging studies, but also leads to more robust and reproducible results. Focusing on a broad family of piecewise smooth functions, we establish the estimation and selection consistency, and derive the asymptotic error bounds. We demonstrate the efficacy of the method through intensive simulations, and further illustrate its advantages with analyses of two functional magnetic resonance imaging datasets.
Submitted 2 March, 2022; v1 submitted 17 June, 2020;
originally announced June 2020.
-
ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers
Authors:
Jung-Woo Ha,
Kihyun Nam,
Jingu Kang,
Sang-Woo Lee,
Sohee Yang,
Hyunhoon Jung,
Eunmi Kim,
Hyeji Kim,
Soojin Kim,
Hyun Ah Kim,
Kyoungtae Doh,
Chan Kyu Lee,
Nako Sung,
Sunghun Kim
Abstract:
Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. Despite the advancement of ASR, however, most publicly available call-based speech corpora such as Switchboard are old-fashioned. Also, most existing call corpora are in English and mainly focus on open domain dialog or general scenarios such as audiobooks. Here we introduce a new large-scale Korean call-based speech corpus under a goal-oriented dialog scenario from more than 11,000 people, i.e., ClovaCall corpus. ClovaCall includes approximately 60,000 pairs of a short sentence and its corresponding spoken utterance in a restaurant reservation domain. We validate the effectiveness of our dataset with intensive experiments using two standard ASR models. Furthermore, we release our ClovaCall dataset and baseline source codes to be available via https://github.com/ClovaAI/ClovaCall.
Submitted 17 May, 2020; v1 submitted 20 April, 2020;
originally announced April 2020.