Showing 1–50 of 85 results for author: Kang, J

Searching in archive stat.
  1. arXiv:2407.04142  [pdf, other]

    stat.ME

    Bayesian Structured Mediation Analysis With Unobserved Confounders

    Authors: Yuliang Xu, Shu Yang, Jian Kang

    Abstract: We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent individual effects, which influence the structured mediators, as unobserved confounders in the outcome model, thereby potentially debiasing the mediation effects.…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2406.16721  [pdf, other]

    stat.AP

    Spatially Structured Regression for Non-conformable Spaces: Integrating Pathology Imaging and Genomics Data in Cancer

    Authors: Nathaniel Osher, Jian Kang, Arvind Rao, Veerabhadran Baladandayuthapani

    Abstract: The spatial composition and cellular heterogeneity of the tumor microenvironment plays a critical role in cancer development and progression. High-definition pathology imaging of tumor biopsies provide a high-resolution view of the spatial organization of different types of cells. This allows for systematic assessment of intra- and inter-patient spatial cellular interactions and heterogeneity by i…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.15664  [pdf, other]

    stat.ML cs.LG

    Flat Posterior Does Matter For Bayesian Transfer Learning

    Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

    Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2405.13353  [pdf, other]

    stat.ME stat.ML

    Adaptive Bayesian Multivariate Spline Knot Inference with Prior Specifications on Model Complexity

    Authors: Junhui He, Ying Yang, Jian Kang

    Abstract: In multivariate spline regression, the number and locations of knots influence the performance and interpretability significantly. However, due to non-differentiability and varying dimensions, there is no desirable frequentist method to make inference on knots. In this article, we propose a fully Bayesian approach for knot inference in multivariate spline regression. The existing Bayesian method o…

    Submitted 22 May, 2024; originally announced May 2024.

  5. arXiv:2405.13342  [pdf, other]

    stat.ME math.ST

    Scalable Bayesian inference for heat kernel Gaussian processes on manifolds

    Authors: Junhui He, Guoxuan Ma, Jian Kang, Ying Yang

    Abstract: We develop scalable manifold learning methods and theory, motivated by the problem of estimating manifold of fMRI activation in the Human Connectome Project (HCP). We propose the Fast Graph Laplacian Estimation for Heat Kernel Gaussian Processes (FLGP) in the natural exponential family model. FLGP handles large sample sizes $ n $, preserves the intrinsic geometry of data, and significantly reduces…

    Submitted 22 May, 2024; originally announced May 2024.

  6. arXiv:2404.13204  [pdf, other]

    stat.AP stat.CO

    Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis

    Authors: Yuliang Xu, Timothy D. Johnson, Thomas E. Nichols, Jian Kang

    Abstract: Bayesian Image-on-Scalar Regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as found in the UK Biobank, is hindered by the computational demands of traditional posterior computation methods, as well as the challenge of individual-specific brain m…

    Submitted 15 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  7. arXiv:2403.13628  [pdf, other]

    stat.ME stat.AP

    Scalable Scalar-on-Image Cortical Surface Regression with a Relaxed-Thresholded Gaussian Process Prior

    Authors: Anna Menacher, Thomas E. Nichols, Timothy D. Johnson, Jian Kang

    Abstract: In addressing the challenge of analysing the large-scale Adolescent Brain Cognition Development (ABCD) fMRI dataset, involving over 5,000 subjects and extensive neuroimaging data, we propose a scalable Bayesian scalar-on-image regression model for computational feasibility and efficiency. Our model employs a relaxed-thresholded Gaussian process (RTGP), integrating piecewise-smooth, sparse, and con…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: For supplementary materials, see https://drive.google.com/file/d/1SNS0T6ptIGLfs67zYrZ9Bz0-DgzCIRgz/view?usp=sharing . For code, see https://github.com/annamenacher/RTGP

  8. arXiv:2401.07111  [pdf, other]

    stat.AP stat.CO

    Bayesian Signal Matching for Transfer Learning in ERP-Based Brain Computer Interface

    Authors: Tianwen Ma, Jane E. Huggins, Jian Kang

    Abstract: An Event-Related Potential (ERP)-based Brain-Computer Interface (BCI) Speller System assists people with disabilities communicate by decoding electroencephalogram (EEG) signals. A P300-ERP embedded in EEG signals arises in response to a rare, but relevant event (target) among a series of irrelevant events (non-target). Different machine learning methods have constructed binary classifiers to detec…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 34 pages, 6 figures, 4 tables

  9. arXiv:2312.03274  [pdf, other]

    stat.ME stat.ML

    Empirical Bayes Covariance Decomposition, and a solution to the Multiple Tuning Problem in Sparse PCA

    Authors: Joonsuk Kang, Matthew Stephens

    Abstract: Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introd…

    Submitted 5 December, 2023; originally announced December 2023.

  10. arXiv:2312.03257  [pdf, other]

    stat.ME stat.AP

    Bayesian Functional Analysis for Untargeted Metabolomics Data with Matching Uncertainty and Small Sample Sizes

    Authors: Guoxuan Ma, Jian Kang, Tianwei Yu

    Abstract: Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application given its ability to depict the global metabolic pattern in biological samples. However, the data is noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, while…

    Submitted 23 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  11. arXiv:2311.05649  [pdf, other]

    stat.AP stat.ME

    Bayesian Image-on-Image Regression via Deep Kernel Learning based Gaussian Processes

    Authors: Guoxuan Ma, Bangyao Zhao, Hasan Abu-Amara, Jian Kang

    Abstract: In neuroimaging studies, it becomes increasingly important to study associations between different imaging modalities using image-on-image regression (IIR), which faces challenges in interpretation, statistical inference, and prediction. Our motivating problem is how to predict task-evoked fMRI activity using resting-state fMRI data in the Human Connectome Project (HCP). The main difficulty lies i…

    Submitted 7 November, 2023; originally announced November 2023.

  12. arXiv:2310.18474  [pdf, other]

    stat.ME

    Robust Bayesian Graphical Regression Models for Assessing Tumor Heterogeneity in Proteomic Networks

    Authors: Tsung-Hung Yao, Yang Ni, Anindya Bhadra, Jian Kang, Veerabhadran Baladandayuthapani

    Abstract: Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of the two canonical assumptions: (i) a homogeneous graph with a common network for all subjects; or (ii) an assumption of normality especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail to hol…

    Submitted 27 October, 2023; originally announced October 2023.

  13. arXiv:2310.16284  [pdf, other]

    stat.ME math.ST stat.CO

    Bayesian Image Mediation Analysis

    Authors: Yuliang Xu, Jian Kang

    Abstract: Mediation analysis aims to separate the indirect effect through mediators from the direct effect of the exposure on the outcome. It is challenging to perform mediation analysis with neuroimaging data which involves high dimensionality, complex spatial correlations, sparse activation patterns and relatively low signal-to-noise ratio. To address these issues, we develop a new spatially varying coeff…

    Submitted 24 October, 2023; originally announced October 2023.

  14. arXiv:2310.15653  [pdf, other]

    cs.LG cs.SI stat.ML

    Deceptive Fairness Attacks on Graphs via Meta Learning

    Authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong

    Abstract: We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as wel…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 23 pages, 11 tables

  15. arXiv:2310.11479  [pdf, other]

    cs.LG stat.ML

    On the Temperature of Bayesian Graph Neural Networks for Conformal Prediction

    Authors: Seohyeon Cha, Honggu Kang, Joonhyuk Kang

    Abstract: Accurate uncertainty quantification in graph neural networks (GNNs) is essential, especially in high-stakes domains where GNNs are frequently employed. Conformal prediction (CP) offers a promising framework for quantifying uncertainty by providing $\textit{valid}$ prediction sets for any black-box model. CP ensures formal probabilistic guarantees that a prediction set contains a true label with a…

    Submitted 3 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  16. arXiv:2307.08044  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression

    Authors: Hyunjun Lee, Junhyun Lee, Taehwa Choi, Jaewoo Kang, Sangbum Choi

    Abstract: Time-to-event analysis, also known as survival analysis, aims to predict the time of occurrence of an event, given a set of features. One of the major challenges in this area is dealing with censored data, which can make learning algorithms more complex. Traditional methods such as Cox's proportional hazards model and the accelerated failure time (AFT) model have been popular in this field, but th…

    Submitted 22 July, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at ECAI 2023

  17. arXiv:2307.00224  [pdf, other]

    stat.ME

    Flexible Bayesian Modeling for Longitudinal Binary and Ordinal Responses

    Authors: Jizhou Kang, Athanasios Kottas

    Abstract: Longitudinal studies with binary or ordinal responses are widely encountered in various disciplines, where the primary focus is on the temporal evolution of the probability of each response category. Traditional approaches build from the generalized mixed effects modeling framework. Even amplified with nonparametric priors placed on the fixed or random effects, such models are restrictive due to t…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 46 pages, 13 figures

  18. arXiv:2307.00129  [pdf, other]

    stat.ME

    Latent Subgroup Identification in Image-on-scalar Regression

    Authors: Zikai Lin, Yajuan Si, Jian Kang

    Abstract: Image-on-scalar regression has been a popular approach to modeling the association between brain activities and scalar characteristics in neuroimaging research. The associations could be heterogeneous across individuals in the population, as indicated by recent large-scale neuroimaging studies, e.g., the Adolescent Brain Cognitive Development (ABCD) study. The ABCD data can inform our understandin…

    Submitted 30 June, 2023; originally announced July 2023.

  19. arXiv:2306.17347  [pdf, other]

    stat.ME

    Mediation with External Summary Statistic Information (MESSI)

    Authors: Jonathan Boss, Wei Hao, Amber Cathey, Barrett M. Welch, Kelly K. Ferguson, John D. Meeker, Jian Kang, Bhramar Mukherjee

    Abstract: Environmental health studies are increasingly measuring endogenous omics data ($\boldsymbol{M}$) to study intermediary biological pathways by which an exogenous exposure ($\boldsymbol{A}$) affects a health outcome ($\boldsymbol{Y}$), given confounders ($\boldsymbol{C}$). Mediation analysis is frequently carried out to understand such mechanisms. If intermediary pathways are of interest, then there…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 32 pages, 6 figures

  20. arXiv:2306.03663  [pdf, other]

    stat.ME

    Bayesian inference for group-level cortical surface image-on-scalar-regression with Gaussian process priors

    Authors: Andrew S. Whiteman, Timothy D. Johnson, Jian Kang

    Abstract: In regression-based analyses of group-level neuroimage data researchers typically fit a series of marginal general linear models to image outcomes at each spatially-referenced pixel. Spatial regularization of effects of interest is usually induced indirectly by applying spatial smoothing to the data during preprocessing. While this procedure often works well, resulting inference can be poorly cali…

    Submitted 6 June, 2023; originally announced June 2023.

  21. arXiv:2305.11908  [pdf, other]

    cs.HC cs.LG q-bio.NC stat.ML

    Sequential Best-Arm Identification with Application to Brain-Computer Interface

    Authors: Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li

    Abstract: A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system. It allows individuals to interact with the device using only their thoughts, and holds immense potential for a wide range of applications in medicine, rehabilitation, and human augmentation. An electroencephalogram (EEG) and event-related potential (ERP)-b…

    Submitted 17 May, 2023; originally announced May 2023.

  22. arXiv:2304.07401  [pdf, other]

    stat.AP stat.ML

    Bayesian Inference on Brain-Computer Interfaces via GLASS

    Authors: Bangyao Zhao, Jane E. Huggins, Jian Kang

    Abstract: Brain-computer interfaces (BCIs), particularly the P300 BCI, facilitate direct communication between the brain and computers. The fundamental statistical problem in P300 BCIs lies in classifying target and non-target stimuli based on electroencephalogram (EEG) signals. However, the low signal-to-noise ratio (SNR) and complex spatial/temporal correlations of EEG signals present challenges in modeli…

    Submitted 14 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: 32 pages, 5 figures

  23. arXiv:2303.05341  [pdf, other]

    stat.ML cs.LG eess.IV

    Penalized Deep Partially Linear Cox Models with Application to CT Scans of Lung Cancer Patients

    Authors: Yuming Sun, Jian Kang, Chinmay Haridas, Nicholas R. Mayne, Alexandra L. Potter, Chi-Fu Jeffrey Yang, David C. Christiani, Yi Li

    Abstract: Lung cancer is a leading cause of cancer mortality globally, highlighting the importance of understanding its mortality risks to design effective patient-centered therapies. The National Lung Screening Trial (NLST) employed computed tomography texture analysis, which provides objective measurements of texture patterns on CT scans, to quantify the mortality risks of lung cancer patients. Partially…

    Submitted 29 September, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  24. arXiv:2303.03582  [pdf, other]

    stat.ME math.ST stat.AP

    Statistical inferences for complex dependence of multimodal imaging data

    Authors: Jinyuan Chang, Jing He, Jian Kang, Mingcong Wu

    Abstract: Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures for making inferences on the complex dependence of multimodal imaging data. Motivated by the analysis of multi-task fMRI data in the Human Connectome Project (HC…

    Submitted 6 March, 2023; originally announced March 2023.

  25. arXiv:2211.04034  [pdf, other]

    stat.ME

    Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression

    Authors: Jizhou Kang, Athanasios Kottas

    Abstract: We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors placed directly on the discrete distribution of the ordinal responses. The prior probability models are built from a structured mixture of multinomial distributions. We leverage a continuation-ratio logits representation to formulate the mixture kernel, with mixture weights defined through the logit stick-b…

    Submitted 22 March, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 72 pages, 26 figures

  26. arXiv:2210.06025  [pdf, other]

    stat.ME math.ST

    Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment

    Authors: Qinmengge Li, Matthew T. Patrick, Haihan Zhang, Chachrit Khunsriraksakul, Philip E. Stuart, Johann E. Gudjonsson, Rajan Nair, James T. Elder, Dajiang J. Liu, Jian Kang, Lam C. Tsoi, Kevin He

    Abstract: Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Cau…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 35 pages, 6 figures

  27. arXiv:2209.00181  [pdf, other]

    stat.ME stat.AP

    Understanding the dynamic impact of COVID-19 through competing risk modeling with bivariate varying coefficients

    Authors: Wenbo Wu, John D. Kalbfleisch, Jeremy M. G. Taylor, Jian Kang, Kevin He

    Abstract: The coronavirus disease 2019 (COVID-19) pandemic has exerted a profound impact on patients with end-stage renal disease relying on kidney dialysis to sustain their lives. Motivated by a request by the U.S. Centers for Medicare & Medicaid Services, our analysis of their postdischarge hospital readmissions and deaths in 2020 revealed that the COVID-19 effect has varied significantly with postdischar…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: 40 pages, 8 figures, 1 table

  28. arXiv:2207.07602  [pdf, other]

    stat.AP

    Composite Scores for Transplant Center Evaluation: A New Individualized Empirical Null Method

    Authors: Nicholas Hartman, Joseph M. Messana, Jian Kang, Abhijit S. Naik, Tempie H. Shearon, Kevin He

    Abstract: Risk-adjusted quality measures are used to evaluate healthcare providers while controlling for factors beyond their control. Existing healthcare provider profiling approaches typically assume that the risk adjustment is perfect and the between-provider variation in quality measures is entirely due to the quality of care. However, in practice, even with very good models for risk adjustment, some be…

    Submitted 23 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

  29. arXiv:2206.05643  [pdf, other]

    cs.LG stat.ML

    Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks

    Authors: Daiwei Zhang, Tianci Liu, Jian Kang

    Abstract: Deep neural network (DNN) models have achieved state-of-the-art predictive accuracy in a wide range of supervised learning applications. However, accurately quantifying the uncertainty in DNN predictions remains a challenging task. For continuous outcome variables, an even more difficult problem is to estimate the predictive density function, which not only provides a natural quantification of the…

    Submitted 11 June, 2022; originally announced June 2022.

  30. arXiv:2206.00102  [pdf, other]

    stat.ME

    A Soft-Thresholding Operator for Sparse Time-Varying Effects in Survival Models

    Authors: Yuan Yang, Jian Kang, Yi Li

    Abstract: We consider a class of Cox models with time-dependent effects that may be zero over certain unknown time regions or, in short, sparse time-varying effects. The model is particularly useful for biomedical studies as it conveniently depicts the gradual evolution of effects of risk factors on survival. Statistically, estimating and drawing inference on infinite dimensional functional parameters with…

    Submitted 31 May, 2022; originally announced June 2022.

  31. arXiv:2205.08370  [pdf, other]

    cs.LG stat.AP

    Individualized Risk Assessment of Preoperative Opioid Use by Interpretable Neural Network Regression

    Authors: Yuming Sun, Jian Kang, Chad Brummett, Yi Li

    Abstract: Preoperative opioid use has been reported to be associated with higher preoperative opioid demand, worse postoperative outcomes, and increased postoperative healthcare utilization and expenditures. Understanding the risk of preoperative opioid use helps establish patient-centered pain management. In the field of machine learning, deep neural network (DNN) has emerged as a powerful means for risk a…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: 14 pages, 6 tables and 2 figures in main text

  32. arXiv:2202.05370  [pdf, other]

    stat.ME

    Bayesian learning of COVID-19 Vaccine safety while incorporating Adverse Events ontology

    Authors: Bangyao Zhao, Yuan Zhong, Jian Kang, Lili Zhao

    Abstract: While vaccines are crucial to end the COVID-19 pandemic, public confidence in vaccine safety has always been vulnerable. Many statistical methods have been applied to VAERS (Vaccine Adverse Event Reporting System) database to study the safety of COVID-19 vaccines. However, all these methods ignored the adverse event (AE) ontology. AEs are naturally related; for example, events of retching, dysphag…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 12 pages, 5 figures

  33. arXiv:2110.03032  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY stat.ML

    Learning Multi-Objective Curricula for Robotic Policy Learning

    Authors: Jikun Kang, Miao Liu, Abhinav Gupta, Chris Pal, Xue Liu, Jie Fu

    Abstract: Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment…

    Submitted 19 October, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: CoRL 2022; Reinforcement Learning; Meta-Reinforcement Learning; Hyper-network

  34. arXiv:2109.14856  [pdf, other]

    stat.ME math.ST stat.CO stat.ML

    Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis

    Authors: Bingyuan Liu, Qi Zhang, Lingzhou Xue, Peter X. K. Song, Jian Kang

    Abstract: It is of importance to develop statistical techniques to analyze high-dimensional data in the presence of both complex dependence and possible outliers in real-world applications such as imaging data analyses. We propose a new robust high-dimensional regression with coefficient thresholding, in which an efficient nonconvex estimation procedure is proposed through a thresholding function and the ro…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: 38 pages

  35. arXiv:2109.14282  [pdf, other]

    stat.ML cs.LG

    A gradient-based variable selection for binary classification in reproducing kernel Hilbert space

    Authors: Jongkyeong Kang, Seung Jun Shin

    Abstract: Variable selection is essential in high-dimensional data analysis. Although various variable selection methods have been developed, most rely on the linear model assumption. This article proposes a nonparametric variable selection method for the large-margin classifier defined by reproducing the kernel Hilbert space (RKHS). We propose a gradient-based representation of the large-margin classifier…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: 22 pages, 2 figures

  36. arXiv:2105.11069  [pdf, other]

    cs.LG cs.IT stat.ML

    InfoFair: Information-Theoretic Intersectional Fairness

    Authors: Jian Kang, Tiankai Xie, Xintao Wu, Ross Maciejewski, Hanghang Tong

    Abstract: Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notation is group fairness. The vast majority of the existing works on group fairness, with a few exceptions, primarily focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race,…

    Submitted 31 December, 2022; v1 submitted 23 May, 2021; originally announced May 2021.

    Comments: IEEE Big Data 2022

  37. arXiv:2105.06523  [pdf, other]

    stat.ME

    Deep Neural Networks Guided Ensemble Learning for Point Estimation

    Authors: Tianyu Zhan, Haoda Fu, Jian Kang

    Abstract: In modern statistics, interests shift from pursuing the uniformly minimum variance unbiased estimator to reducing mean squared error (MSE) or residual squared error. Shrinkage based estimation and regression methods offer better prediction accuracy and improved interpretation. However, the characterization of such optimal statistics in terms of minimizing MSE remains open and challenging in many p…

    Submitted 2 October, 2023; v1 submitted 13 May, 2021; originally announced May 2021.

  38. arXiv:2104.03834  [pdf, other]

    cs.LG cs.DC stat.ML

    Bayesian Variational Federated Learning and Unlearning in Decentralized Networks

    Authors: Jinu Gong, Osvaldo Simeone, Joonhyuk Kang

    Abstract: Federated Bayesian learning offers a principled framework for the definition of collaborative training algorithms that are able to quantify epistemic uncertainty and to produce trustworthy decisions. Upon the completion of collaborative training, an agent may decide to exercise her legal "right to be forgotten", which calls for her contribution to the jointly trained model to be deleted and discar…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: Submitted for conference publication

  39. arXiv:2103.13131  [pdf, other]

    stat.ME stat.AP

    Bayesian Inference for Brain Activity from Functional Magnetic Resonance Imaging Collected at Two Spatial Resolutions

    Authors: Andrew S. Whiteman, Andreas J. Bartsch, Jian Kang, Timothy D. Johnson

    Abstract: Neuroradiologists and neurosurgeons increasingly opt to use functional magnetic resonance imaging (fMRI) to map functionally relevant brain regions for noninvasive presurgical planning and intraoperative neuronavigation. This application requires a high degree of spatial accuracy, but the fMRI signal-to-noise ratio (SNR) decreases as spatial resolution increases. In practice, fMRI scans can be col…

    Submitted 6 June, 2023; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 37 pages, 12 figures

    Journal ref: Ann. Appl. Stat. (2022) 16(4): 2626-2647

  40. Group Inverse-Gamma Gamma Shrinkage for Sparse Regression with Block-Correlated Predictors

    Authors: Jonathan Boss, Jyotishka Datta, Xin Wang, Sung Kyun Park, Jian Kang, Bhramar Mukherjee

    Abstract: Heavy-tailed continuous shrinkage priors, such as the horseshoe prior, are widely used for sparse estimation problems. However, there is limited work extending these priors to predictors with grouping structures. Of particular interest in this article, is regression coefficient estimation where pockets of high collinearity in the covariate space are contained within known covariate groupings. To a…

    Submitted 21 February, 2021; originally announced February 2021.

    Comments: 44 pages, 4 figures

  41. arXiv:2010.10075  [pdf, other]

    cs.LG stat.ML

    RDIS: Random Drop Imputation with Self-Training for Incomplete Time Series Data

    Authors: Tae-Min Choi, Ji-Su Kang, Jong-Hwan Kim

    Abstract: Time-series data with missing values are commonly encountered in many fields, such as healthcare, meteorology, and robotics. The imputation aims to fill the missing values with valid values. Most imputation methods trained the models implicitly because missing values have no ground truth. In this paper, we propose Random Drop Imputation with Self-training (RDIS), a novel training method for time-s…

    Submitted 25 January, 2023; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 8 pages, 3 figures, 2 tables

  42. arXiv:2009.11409  [pdf, other]

    stat.AP

    Bayesian Hierarchical Models for High-Dimensional Mediation Analysis with Coordinated Selection of Correlated Mediators

    Authors: Yanyi Song, Xiang Zhou, Jian Kang, Max T. Aung, Min Zhang, Wei Zhao, Belinda L. Needham, Sharon L. R. Kardia, Yongmei Liu, John D. Meeker, Jennifer A. Smith, Bhramar Mukherjee

    Abstract: We consider Bayesian high-dimensional mediation analysis to identify among a large set of correlated potential mediators the active ones that mediate the effect from an exposure variable to an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven…

    Submitted 23 September, 2020; originally announced September 2020.

  43. arXiv:2009.08011  [pdf, other]

    stat.ME stat.AP stat.CO

    Statistical Inference for High-Dimensional Vector Autoregression with Measurement Error

    Authors: Xiang Lyu, Jian Kang, Lexin Li

    Abstract: High-dimensional vector autoregression with measurement error is frequently encountered in a large variety of scientific and business applications. In this article, we study statistical inference of the transition matrix under this model. While there has been a large body of literature studying sparse estimation of the transition matrix, there is a paucity of inference solutions, especially in the…

    Submitted 16 September, 2020; originally announced September 2020.

  44. arXiv:2009.01461  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Error estimate for a universal function approximator of ReLU network with a local connection

    Authors: Jae-Mo Kang, Sunghwan Moon

    Abstract: Neural networks have shown high successful performance in a wide range of tasks, but further studies are needed to improve its performance. We analyze the approximation error of the specific neural network architecture with a local connection and higher application than one with the full connection because the local-connected network can be used to explain diverse neural networks such as CNNs. Our…

    Submitted 3 September, 2020; originally announced September 2020.

  45. Deep Historical Borrowing Framework to Prospectively and Simultaneously Synthesize Control Information in Confirmatory Clinical Trials with Multiple Endpoints

    Authors: Tianyu Zhan, Yiwang Zhou, Ziqian Geng, Yihua Gu, Jian Kang, Li Wang, Xiaohong Huang, Elizabeth H. Slate

    Abstract: In current clinical trial development, historical information is receiving more attention as it provides utility beyond sample size calculation. Meta-analytic-predictive (MAP) priors and robust MAP priors have been proposed for prospectively borrowing historical data on a single endpoint. To simultaneously synthesize control information from multiple endpoints in confirmatory clinical trials, we p…

    Submitted 1 August, 2022; v1 submitted 28 August, 2020; originally announced August 2020.

  46. arXiv:2008.06366  [pdf, other]

    stat.AP

    Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects

    Authors: Yanyi Song, Xiang Zhou, Jian Kang, Max T. Aung, Min Zhang, Wei Zhao, Belinda L. Needham, Sharon L. R. Kardia, Yongmei Liu, John D. Meeker, Jennifer A. Smith, Bhramar Mukherjee

    Abstract: Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional me…

    Submitted 14 August, 2020; originally announced August 2020.

  47. arXiv:2007.12865  [pdf, other]

    cs.LG cs.IR stat.ML

    Self-supervised Learning for Large-scale Item Recommendations

    Authors: Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

    Abstract: Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the…

    Submitted 24 February, 2021; v1 submitted 25 July, 2020; originally announced July 2020.

  48. arXiv:2007.09712  [pdf, other]

    cs.LG cs.DC stat.ML

    Deep Anomaly Detection for Time-series Data in Industrial IoT: A Communication-Efficient On-device Federated Learning Approach

    Authors: Yi Liu, Sahil Garg, Jiangtian Nie, Yang Zhang, Zehui Xiong, Jiawen Kang, M. Shamim Hossain

    Abstract: Since edge device failures (i.e., anomalies) seriously affect the production of industrial products in Industrial IoT (IIoT), accurately and timely detecting anomalies is becoming increasingly important. Furthermore, data collected by the edge device may contain the user's private data, which is challenging the current detection approaches as user privacy is calling for the public concern in recen…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: IEEE Internet of Things Journal

  49. arXiv:2006.09911  [pdf, other]

    stat.ML cs.LG eess.IV

    Image Response Regression via Deep Neural Networks

    Authors: Daiwei Zhang, Lexin Li, Chandra Sripada, Jian Kang

    Abstract: Delineating the associations between images and a vector of covariates is of central interest in medical imaging studies. To tackle this problem of image response regression, we propose a novel nonparametric approach in the framework of spatially varying coefficient models, where the spatially varying functions are estimated through deep neural networks. Compared to existing solutions, the propose…

    Submitted 2 March, 2022; v1 submitted 17 June, 2020; originally announced June 2020.

  50. arXiv:2004.09367  [pdf, other]

    cs.LG cs.CL cs.SD stat.ML

    ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

    Authors: Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Eunmi Kim, Hyeji Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim

    Abstract: Automatic speech recognition (ASR) via call is essential for various applications, including AI for contact center (AICC) services. Despite the advancement of ASR, however, most publicly available call-based speech corpora such as Switchboard are old-fashioned. Also, most existing call corpora are in English and mainly focus on open domain dialog or general scenarios such as audiobooks. Here we in…

    Submitted 17 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: 5 pages, 2 figures, 4 tables, The first two authors equally contributed to this work