Search | arXiv e-print repository

Statistical ranking with dynamic covariates

Authors: Pinjun Dong, Ruijian Han, Binyan Jiang, Yiming Xu

Abstract: We consider a covariate-assisted ranking model grounded in the Plackett--Luce framework. Unlike existing works focusing on pure covariates or individual effects with fixed covariates, our approach integrates individual effects with dynamic covariates. This added flexibility enhances realistic ranking yet poses significant challenges for analyzing the associated estimation procedures. This paper ma… ▽ More We consider a covariate-assisted ranking model grounded in the Plackett--Luce framework. Unlike existing works focusing on pure covariates or individual effects with fixed covariates, our approach integrates individual effects with dynamic covariates. This added flexibility enhances realistic ranking yet poses significant challenges for analyzing the associated estimation procedures. This paper makes an initial attempt to address these challenges. We begin by discussing the sufficient and necessary condition for the model's identifiability. We then introduce an efficient alternating maximization algorithm to compute the maximum likelihood estimator (MLE). Under suitable assumptions on the topology of comparison graphs and dynamic covariates, we establish a quantitative uniform consistency result for the MLE with convergence rates characterized by the asymptotic graph connectivity. The proposed graph topology assumption holds for several popular random graph models under optimal leading-order sparsity conditions. A comprehensive numerical study is conducted to corroborate our theoretical findings and demonstrate the application of the proposed model to real-world datasets, including horse racing and tennis competitions. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 40 pages; 8 figures

arXiv:2406.08209 [pdf, other]

Forward-Euler time-discretization for Wasserstein gradient flows can be wrong

Authors: Yewei Xu, Qin Li

Abstract: In this note, we examine the forward-Euler discretization for simulating Wasserstein gradient flows. We provide two counter-examples showcasing the failure of this discretization even for a simple case where the energy functional is defined as the KL divergence against some nicely structured probability densities. A simple explanation of this failure is also discussed. In this note, we examine the forward-Euler discretization for simulating Wasserstein gradient flows. We provide two counter-examples showcasing the failure of this discretization even for a simple case where the energy functional is defined as the KL divergence against some nicely structured probability densities. A simple explanation of this failure is also discussed. △ Less

Submitted 12 June, 2024; originally announced June 2024.

MSC Class: 65M12

arXiv:2406.05193 [pdf, ps, other]

Probabilistic Clustering using Shared Latent Variable Model for Assessing Alzheimers Disease Biomarkers

Authors: Yizhen Xu, Scott Zeger, Zheyu Wang

Abstract: The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients' brain functions. The main challenge to early detection lies in the absence of direct observation of the disease sta… ▽ More The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients' brain functions. The main challenge to early detection lies in the absence of direct observation of the disease state and the considerable variability in both biomarkers and disease dynamics among individuals. Recent research hypothesized the existence of subgroups with distinct biomarker patterns due to co-morbidities and degrees of brain resilience. Our ability to early diagnose and intervene during the preclinical stage of neurodegenerative diseases will be enhanced by further insights into heterogeneity in the biomarker-disease relationship. In this paper, we focus on Alzheimer's disease (AD) and attempt to identify the systematic patterns within the heterogeneous AD biomarker-disease cascade. Specifically, we quantify the disease progression using a dynamic latent variable whose mixture distribution represents patient subgroups. Model estimation uses Hamiltonian Monte Carlo with the number of clusters determined by the Bayesian Information Criterion (BIC). We report simulation studies that investigate the performance of the proposed model in finite sample settings that are similar to our motivating application. We apply the proposed model to the BIOCARD data, a longitudinal study that was conducted over two decades among individuals who were initially cognitively normal. Our application yields evidence consistent with the hypothetical model of biomarker dynamics presented in Jack et al. (2013). In addition, our analysis identified two subgroups with distinct disease-onset patterns. Finally, we develop a dynamic prediction approach to improve the precision of prognoses. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03628 [pdf, other]

Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance

Authors: Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang

Abstract: Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (\textbf{O}versam\textbf{P}ling with \textbf{A}rtificial \textbf{L}LM-generated data), a systematic oversamplin… ▽ More Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (\textbf{O}versam\textbf{P}ling with \textbf{A}rtificial \textbf{L}LM-generated data), a systematic oversampling approach that leverages the capabilities of large language models (LLMs) to generate high-quality synthetic data for minority groups. Recent studies on synthetic data generation using deep generative models mostly target prediction tasks. Our proposal differs in that we focus on handling imbalanced data and spurious correlations. More importantly, we develop a novel theory that rigorously characterizes the benefits of using the synthetic data, and shows the capacity of transformers in generating high-quality synthetic data for both labels and covariates. We further conduct intensive numerical experiments to demonstrate the efficacy of our proposed approach compared to some representative alternative solutions. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 59 pages, 7 figures

arXiv:2406.00827 [pdf, other]

LaLonde (1986) after Nearly Four Decades: Lessons Learned

Authors: Guido Imbens, Yiqing Xu

Abstract: In 1986, Robert LaLonde published an article that compared nonexperimental estimates to experimental benchmarks (LaLonde 1986). He concluded that the nonexperimental methods at the time could not systematically replicate experimental benchmarks, casting doubt on the credibility of these methods. Following LaLonde's critical assessment, there have been significant methodological advances and practi… ▽ More In 1986, Robert LaLonde published an article that compared nonexperimental estimates to experimental benchmarks (LaLonde 1986). He concluded that the nonexperimental methods at the time could not systematically replicate experimental benchmarks, casting doubt on the credibility of these methods. Following LaLonde's critical assessment, there have been significant methodological advances and practical changes, including (i) an emphasis on estimators based on unconfoundedness, (ii) a focus on the importance of overlap in covariate distributions, (iii) the introduction of propensity score-based methods leading to doubly robust estimators, (iv) a greater emphasis on validation exercises to bolster research credibility, and (v) methods for estimating and exploiting treatment effect heterogeneity. To demonstrate the practical lessons from these advances, we reexamine the LaLonde data and the Imbens-Rubin-Sacerdote lottery data. We show that modern methods, when applied in contexts with sufficient covariate overlap, yield robust estimates for the adjusted differences between the treatment and control groups. However, this does not mean that these estimates are valid. To assess their credibility, validation exercises (such as placebo tests) are essential, whereas goodness of fit tests alone are inadequate. Our findings highlight the importance of closely examining the assignment process, carefully inspecting overlap, and conducting validation exercises when analyzing causal effects with nonexperimental data. △ Less

Submitted 8 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20451 [pdf, other]

Statistical Properties of Robust Satisficing

Authors: Zhiyi Li, Yunbei Xu, Ruohan Zhan

Abstract: The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills in the gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforwar… ▽ More The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills in the gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforward path to deriving statistical guarantees compared to the seminal Distributionally Robust Optimization (DRO), resulting in a richer set of results. In particular, we establish two-sided confidence intervals for the optimal loss without the need to solve a minimax optimization problem explicitly. We further provide finite-sample generalization error bounds for the RS optimizer. Importantly, our results extend to scenarios involving distribution shifts, where discrepancies exist between the sampling and target distributions. Our numerical experiments show that the RS model consistently outperforms the baseline empirical risk minimization in small-sample regimes and under distribution shifts. Furthermore, compared to the DRO model, the RS model exhibits lower sensitivity to hyperparameter tuning, highlighting its practicability for robustness considerations. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17535 [pdf, other]

Calibrated Dataset Condensation for Faster Hyperparameter Search

Authors: Mucong Ding, Yuancheng Xu, Tahseen Rabbani, Xiaoyu Liu, Brian Gravelle, Teresa Ranadive, Tai-Ching Tuan, Furong Huang

Abstract: Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generali… ▽ More Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.12838 [pdf, ps, other]

Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve qua… ▽ More We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve quadratic speed-up over the number of classical samples needed to estimate the mean $μ$, where the samples come from different random variables with mean close to $μ$. Technically, our quantum algorithms reduce bounded and sub-Gaussian random variables to the Bernoulli case, and use an uncomputation trick to overcome the challenge that direct amplitude estimation does not work with non-identical query access. Our quantum query lower bounds are established by simulating non-identical oracles by parallel oracles, and also by an adversarial method with non-identical oracles. Both results pave the way for proving quantum query lower bounds with non-identical oracles in general, which may be of independent interest. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

arXiv:2405.08886 [pdf, other]

The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

Authors: Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

Abstract: In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness th… ▽ More In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR). △ Less

Submitted 14 May, 2024; originally announced May 2024.

Comments: ICML2024

arXiv:2405.00417 [pdf, other]

Conformal Risk Control for Ordinal Classification

Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

Abstract: As a natural extension to the standard conformal prediction method, several conformal risk control methods have been recently developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we firstly formulated the ordinal classification t… ▽ More As a natural extension to the standard conformal prediction method, several conformal risk control methods have been recently developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we firstly formulated the ordinal classification task in the conformal risk control framework, and provided theoretic risk bounds of the risk control method. Then we proposed two types of loss functions specially designed for ordinal classification tasks, and developed corresponding algorithms to determine the prediction set for each case to control their risks at a desired level. We demonstrated the effectiveness of our proposed methods, and analyzed the difference between the two types of risks on three different datasets, including a simulated dataset, the UTKFace dataset and the diabetic retinopathy detection dataset. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures, 2 table; 1 supplementary page

Journal ref: In UAI 2023: The 39th Conference on Uncertainty in Artificial Intelligence

arXiv:2404.17769 [pdf, other]

Conformal Ranked Retrieval

Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

Abstract: Given the wide adoption of ranked retrieval techniques in various information systems that significantly impact our daily lives, there is an increasing need to assess and address the uncertainty inherent in their predictions. This paper introduces a novel method using the conformal risk control framework to quantitatively measure and manage risks in the context of ranked retrieval problems. Our re… ▽ More Given the wide adoption of ranked retrieval techniques in various information systems that significantly impact our daily lives, there is an increasing need to assess and address the uncertainty inherent in their predictions. This paper introduces a novel method using the conformal risk control framework to quantitatively measure and manage risks in the context of ranked retrieval problems. Our research focuses on a typical two-stage ranked retrieval problem, where the retrieval stage generates candidates for subsequent ranking. By carefully formulating the conformal risk for each stage, we have developed algorithms to effectively control these risks within their specified bounds. The efficacy of our proposed methods has been demonstrated through comprehensive experiments on three large-scale public datasets for ranked retrieval tasks, including the MSLR-WEB dataset, the Yahoo LTRC dataset and the MS MARCO dataset. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, 1 table; 7 supplementary pages, 12 supplementary figures, 2 supplementary tables

arXiv:2404.13204 [pdf, other]

Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis

Authors: Yuliang Xu, Timothy D. Johnson, Thomas E. Nichols, Jian Kang

Abstract: Bayesian Image-on-Scalar Regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as found in the UK Biobank, is hindered by the computational demands of traditional posterior computation methods, as well as the challenge of individual-specific brain m… ▽ More Bayesian Image-on-Scalar Regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as found in the UK Biobank, is hindered by the computational demands of traditional posterior computation methods, as well as the challenge of individual-specific brain masks that deviate from the common mask typically used in standard ISR approaches. To address these challenges, we introduce a novel Bayesian ISR model that is scalable and accommodates inconsistent brain masks across subjects in large-scale imaging studies. Our model leverages Gaussian process priors and integrates salience area indicators to facilitate ISR. We develop a cutting-edge scalable posterior computation algorithm that employs stochastic gradient Langevin dynamics coupled with memory map** techniques, ensuring that computation time scales linearly with subsample size and memory usage is constrained only by the batch size. Our approach uniquely enables direct spatial posterior inferences on brain activation regions. The efficacy of our method is demonstrated through simulations and analysis of the UK Biobank task fMRI data, encompassing 38,639 subjects and over 120,000 voxels per image, showing that it can achieve a speed increase of 4 to 11 times and enhance statistical power by 8% to 18% compared to traditional Gibbs sampling with zero-imputation in various simulation scenarios. △ Less

Submitted 15 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.03804 [pdf, other]

TransformerLSR: Attentive Joint Model of Longitudinal Data, Survival, and Recurrent Events with Concurrent Latent Structure

Authors: Zhiyue Zhang, Yao Zhao, Yanxun Xu

Abstract: In applications such as biomedical studies, epidemiology, and social sciences, recurrent events often co-occur with longitudinal measurements and a terminal event, such as death. Therefore, jointly modeling longitudinal measurements, recurrent events, and survival data while accounting for their dependencies is critical. While joint models for the three components exist in statistical literature,… ▽ More In applications such as biomedical studies, epidemiology, and social sciences, recurrent events often co-occur with longitudinal measurements and a terminal event, such as death. Therefore, jointly modeling longitudinal measurements, recurrent events, and survival data while accounting for their dependencies is critical. While joint models for the three components exist in statistical literature, many of these approaches are limited by heavy parametric assumptions and scalability issues. Recently, incorporating deep learning techniques into joint modeling has shown promising results. However, current methods only address joint modeling of longitudinal measurements at regularly-spaced observation times and survival events, neglecting recurrent events. In this paper, we develop TransformerLSR, a flexible transformer-based deep modeling and inference framework to jointly model all three components simultaneously. TransformerLSR integrates deep temporal point processes into the joint modeling framework, treating recurrent and terminal events as two competing processes dependent on past longitudinal measurements and recurrent event times. Additionally, TransformerLSR introduces a novel trajectory representation and model architecture to potentially incorporate a priori knowledge of known latent structures among concurrent longitudinal variables. We demonstrate the effectiveness and necessity of TransformerLSR through simulation studies and analyzing a real-world medical dataset on patients after kidney transplantation. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.01467 [pdf]

Transnational Network Dynamics of Problematic Information Diffusion

Authors: Esteban Villa-Turek, Rod Abhari, Erik C. Nisbet, Yu Xu, Ayse Deniz Lokmanoglu

Abstract: This study maps the spread of two cases of COVID-19 conspiracy theories and misinformation in Spanish and French in Latin American and French-speaking communities on Facebook, and thus contributes to understanding the dynamics, reach and consequences of emerging transnational misinformation networks. The findings show that co-sharing behavior of public Facebook groups created transnational network… ▽ More This study maps the spread of two cases of COVID-19 conspiracy theories and misinformation in Spanish and French in Latin American and French-speaking communities on Facebook, and thus contributes to understanding the dynamics, reach and consequences of emerging transnational misinformation networks. The findings show that co-sharing behavior of public Facebook groups created transnational networks by sharing videos of Medicos por la Verdad (MPV) conspiracy theories in Spanish and hydroxychloroquine-related misinformation sparked by microbiologist Didier Raoult (DR) in French, usually igniting the surge of locally led interest groups across the Global South. Using inferential methods, the study shows how these networks are enabled primarily by shared cultural and thematic attributes among Facebook groups, effectively creating very large, networked audiences. The study contributes to the understanding of how potentially harmful conspiracy theories and misinformation transcend national borders through non-English speaking online communities, further highlighting the overlooked role of transnationalism in global misinformation diffusion and the potentially disproportionate harm that it causes in vulnerable communities across the globe. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.01273 [pdf, other]

TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model

Authors: Yue Wang, Tianfan Fu, Yinlong Xu, Zihan Ma, Hongxia Xu, Yingzhou Lu, Bang Du, Honghao Gao, Jian Wu

Abstract: Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significa… ▽ More Clinical trials are indispensable for medical research and the development of new treatments. However, clinical trials often involve thousands of participants and can span several years to complete, with a high probability of failure during the process. Recently, there has been a burgeoning interest in virtual clinical trials, which simulate real-world scenarios and hold the potential to significantly enhance patient safety, expedite development, reduce costs, and contribute to the broader scientific knowledge in healthcare. Existing research often focuses on leveraging electronic health records (EHRs) to support clinical trial outcome prediction. Yet, trained with limited clinical trial outcome data, existing approaches frequently struggle to perform accurate predictions. Some research has attempted to generate EHRs to augment model development but has fallen short in personalizing the generation for individual patient profiles. Recently, the emergence of large language models has illuminated new possibilities, as their embedded comprehensive clinical knowledge has proven beneficial in addressing medical issues. In this paper, we propose a large language model-based digital twin creation approach, called TWIN-GPT. TWIN-GPT can establish cross-dataset associations of medical information given limited data, generating unique personalized digital twins for different patients, thereby preserving individual patient characteristics. Comprehensive experiments show that using digital twins created by TWIN-GPT can boost the clinical trial outcome prediction, exceeding various previous prediction approaches. △ Less

Submitted 28 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.03353 [pdf, ps, other]

Hypothesis Spaces for Deep Learning

Authors: Rui Wang, Yuesheng Xu, Mingsong Yan

Abstract: This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs). By treating a DNN as a function of two variables, the physical variable and parameter variable, we consider the primitive set of the DNNs for the parameter variable located in a set of the weight matrices and biases determined by a prescribed depth and widths of the DNNs. We then complete the linea… ▽ More This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs). By treating a DNN as a function of two variables, the physical variable and parameter variable, we consider the primitive set of the DNNs for the parameter variable located in a set of the weight matrices and biases determined by a prescribed depth and widths of the DNNs. We then complete the linear span of the primitive DNN set in a weak* topology to construct a Banach space of functions of the physical variable. We prove that the Banach space so constructed is a reproducing kernel Banach space (RKBS) and construct its reproducing kernel. We investigate two learning models, regularized learning and minimum interpolation problem in the resulting RKBS, by establishing representer theorems for solutions of the learning models. The representer theorems unfold that solutions of these learning models can be expressed as linear combination of a finite number of kernel sessions determined by given data and the reproducing kernel. △ Less

Submitted 11 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.15248 [pdf, other]

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Authors: Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoreti… ▽ More Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks in two-layer neural networks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) feature; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: To appear in AISTATS2024

arXiv:2401.11380 [pdf, other]

MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning

Authors: Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu

Abstract: Model-based offline reinforcement learning methods (RL) have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without develo** practical algorithms or rely on a restricted parametric policy space, thus not fully l… ▽ More Model-based offline reinforcement learning methods (RL) have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without develo** practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees of MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies. △ Less

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.08463 [pdf, other]

Statistical inference for pairwise comparison models

Authors: Ruijian Han, Wenlu Tang, Yiming Xu

Abstract: Pairwise comparison models have been widely used for utility evaluation and ranking across various fields. The increasing scale of problems today underscores the need to understand statistical inference in these models when the number of subjects diverges, a topic currently lacking in the literature except in a few special instances. To partially address this gap, this paper establishes a near-opt… ▽ More Pairwise comparison models have been widely used for utility evaluation and ranking across various fields. The increasing scale of problems today underscores the need to understand statistical inference in these models when the number of subjects diverges, a topic currently lacking in the literature except in a few special instances. To partially address this gap, this paper establishes a near-optimal asymptotic normality result for the maximum likelihood estimator in a broad class of pairwise comparison models, as well as a non-asymptotic convergence rate for each individual subject under comparison. The key idea lies in identifying the Fisher information matrix as a weighted graph Laplacian, which can be studied via a meticulous spectral analysis. Our findings provide a unified theory for performing statistical inference in a wide range of pairwise comparison models beyond the Bradley--Terry model, benefiting practitioners with theoretical guarantees for their use. Simulations utilizing synthetic data are conducted to validate the asymptotic normality result, followed by a hypothesis test using a tennis competition dataset. △ Less

Submitted 2 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 28 pages; include additional results

arXiv:2312.07271 [pdf]

Analyze the Robustness of Classifiers under Label Noise

Authors: Cheng Zeng, Yixuan Xu, Jiaqi Tian

Abstract: This study explores the robustness of label noise classifiers, aiming to enhance model resilience against noisy data in complex real-world scenarios. Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance. This research focuses on the increasingly pertinent issue of label noise's impact on practical applications. Addressing the p… ▽ More This study explores the robustness of label noise classifiers, aiming to enhance model resilience against noisy data in complex real-world scenarios. Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance. This research focuses on the increasingly pertinent issue of label noise's impact on practical applications. Addressing the prevalent challenge of inaccurate training data labels, we integrate adversarial machine learning (AML) and importance reweighting techniques. Our approach involves employing convolutional neural networks (CNN) as the foundational model, with an emphasis on parameter adjustment for individual training samples. This strategy is designed to heighten the model's focus on samples critically influencing performance. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: 21 pages, 11 figures

arXiv:2312.03218 [pdf, other]

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Authors: Yuanshi Liu, Hanzhen Zhao, Yang Xu, Pengyun Yue, Cong Fang

Abstract: Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/μ})$ gradient complexity for minimizing an $L$-smooth $μ$-strongly convex objective. However, an ideal algorithm would adapt to the explicit complexity of a particular objective func… ▽ More Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/μ})$ gradient complexity for minimizing an $L$-smooth $μ$-strongly convex objective. However, an ideal algorithm would adapt to the explicit complexity of a particular objective function and incur faster rates for simpler problems, triggering our reconsideration of two defeats of existing optimization modeling and analysis. (i) The worst-case optimality is neither the instance optimality nor such one in reality. (ii) Traditional $L$-smoothness condition may not be the primary abstraction/characterization for modern practical problems. In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning, including linear regression and beyond. We introduce two factors $(α, τ_α)$ to refine the description of the degenerated condition of the optimization problems based on the observation that the singular values of Hessian often drop sharply. We design adaptive algorithms that solve simpler problems without pre-known knowledge with reduced gradient or analogous oracle accesses. The algorithms also improve the state-of-art complexities for several problems in machine learning, thereby solving the open problem of how to design faster algorithms in light of the known complexity lower bounds. Specially, with the $\mathcal{O}(1)$-nuclear norm bounded, we achieve an optimal $\tilde{\mathcal{O}}(μ^{-1/3})$ (v.s. $\tilde{\mathcal{O}}(μ^{-1/2})$) gradient complexity for linear regression. We hope this work could invoke the rethinking for understanding the difficulty of modern problems in optimization. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Optimization for Machine Learning

arXiv:2311.03382 [pdf, other]

Causal Structure Representation Learning of Confounders in Latent Space for Recommendation

Authors: Hangtong Xu, Yuanbo Xu, Yongjian Yang

Abstract: Inferring user preferences from the historical feedback of users is a valuable problem in recommender systems. Conventional approaches often rely on the assumption that user preferences in the feedback data are equivalent to the real user preferences without additional noise, which simplifies the problem modeling. However, there are various confounders during user-item interactions, such as weathe… ▽ More Inferring user preferences from the historical feedback of users is a valuable problem in recommender systems. Conventional approaches often rely on the assumption that user preferences in the feedback data are equivalent to the real user preferences without additional noise, which simplifies the problem modeling. However, there are various confounders during user-item interactions, such as weather and even the recommendation system itself. Therefore, neglecting the influence of confounders will result in inaccurate user preferences and suboptimal performance of the model. Furthermore, the unobservability of confounders poses a challenge in further addressing the problem. To address these issues, we refine the problem and propose a more rational solution. Specifically, we consider the influence of confounders, disentangle them from user preferences in the latent space, and employ causal graphs to model their interdependencies without specific labels. By cleverly combining local and global causal graphs, we capture the user-specificity of confounders on user preferences. We theoretically demonstrate the identifiability of the obtained causal graph. Finally, we propose our model based on Variational Autoencoders, named Causal Structure representation learning of Confounders in latent space (CSC). We conducted extensive experiments on one synthetic dataset and five real-world datasets, demonstrating the superiority of our model. Furthermore, we demonstrate that the learned causal representations of confounders are controllable, potentially offering users fine-grained control over the objectives of their recommendation lists with the learned causal graphs. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2311.03381 [pdf, other]

Separating and Learning Latent Confounders to Enhancing User Preferences Modeling

Authors: Hangtong Xu, Yuanbo Xu, Yongjian Yang

Abstract: Recommender models aim to capture user preferences from historical feedback and then predict user-specific feedback on candidate items. However, the presence of various unmeasured confounders causes deviations between the user preferences in the historical feedback and the true preferences, resulting in models not meeting their expected performance. Existing debias models either (1) specific to so… ▽ More Recommender models aim to capture user preferences from historical feedback and then predict user-specific feedback on candidate items. However, the presence of various unmeasured confounders causes deviations between the user preferences in the historical feedback and the true preferences, resulting in models not meeting their expected performance. Existing debias models either (1) specific to solving one particular bias or (2) directly obtain auxiliary information from user historical feedback, which cannot identify whether the learned preferences are true user preferences or mixed with unmeasured confounders. Moreover, we find that the former recommender system is not only a successor to unmeasured confounders but also acts as an unmeasured confounder affecting user preference modeling, which has always been neglected in previous studies. To this end, we incorporate the effect of the former recommender system and treat it as a proxy for all unmeasured confounders. We propose a novel framework, Separating and Learning Latent Confounders For Recommendation (SLFR), which obtains the representation of unmeasured confounders to identify the counterfactual feedback by disentangling user preferences and unmeasured confounders, then guides the target model to capture the true preferences of users. Extensive experiments in five real-world datasets validate the advantages of our method. △ Less

Submitted 2 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted by DASFAA 2024

arXiv:2310.16336 [pdf, other]

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

Authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha

Abstract: Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's… ▽ More Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.16284 [pdf, other]

Bayesian Image Mediation Analysis

Authors: Yuliang Xu, Jian Kang

Abstract: Mediation analysis aims to separate the indirect effect through mediators from the direct effect of the exposure on the outcome. It is challenging to perform mediation analysis with neuroimaging data which involves high dimensionality, complex spatial correlations, sparse activation patterns and relatively low signal-to-noise ratio. To address these issues, we develop a new spatially varying coeff… ▽ More Mediation analysis aims to separate the indirect effect through mediators from the direct effect of the exposure on the outcome. It is challenging to perform mediation analysis with neuroimaging data which involves high dimensionality, complex spatial correlations, sparse activation patterns and relatively low signal-to-noise ratio. To address these issues, we develop a new spatially varying coefficient structural equation model for Bayesian Image Mediation Analysis (BIMA). We define spatially varying mediation effects within the potential outcome framework, employing the soft-thresholded Gaussian process prior for functional parameters. We establish the posterior consistency for spatially varying mediation effects along with selection consistency on important regions that contribute to the mediation effects. We develop an efficient posterior computation algorithm scalable to analysis of large-scale imaging data. Through extensive simulations, we show that BIMA can improve the estimation accuracy and computational efficiency for high-dimensional mediation analysis over the existing methods. We apply BIMA to analyze the behavioral and fMRI data in the Adolescent Brain Cognitive Development (ABCD) study with a focus on inferring the mediation effects of the parental education level on the children's general cognitive ability that are mediated through the working memory brain activities. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2309.15983 [pdf, other]

What To Do (and Not to Do) with Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study

Authors: Albert Chiu, Xingchen Lan, Ziyi Liu, Yiqing Xu

Abstract: Two-way fixed effects (TWFE) models are ubiquitous in causal panel analysis in political science. However, recent methodological discussions challenge their validity in the presence of heterogeneous treatment effects (HTE) and violations of the parallel trends assumption (PTA). This burgeoning literature has introduced multiple estimators and diagnostics, leading to confusion among empirical resea… ▽ More Two-way fixed effects (TWFE) models are ubiquitous in causal panel analysis in political science. However, recent methodological discussions challenge their validity in the presence of heterogeneous treatment effects (HTE) and violations of the parallel trends assumption (PTA). This burgeoning literature has introduced multiple estimators and diagnostics, leading to confusion among empirical researchers on two fronts: the reliability of existing results based on TWFE models and the current best practices. To address these concerns, we examined, replicated, and reanalyzed 37 articles from three leading political science journals that employed observational panel data with binary treatments. Using six newly introduced HTE-robust estimators, along with diagnostics tests and uncertainty measures that are robust to PTA violations, we find that only a small minority of studies are highly robust. Although HTE-robust estimates tend to be broadly consistent with TWFE estimates, discrepancies in point estimates, increased measures of uncertainty, and potential PTA violations call into question many results that were already on the margins of statistical significance. We offer recommendations for improving practice in empirical research based on these findings. △ Less

Submitted 14 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.08158 [pdf, other]

doi 10.1145/3583780.3614835

Deep Generative Imputation Model for Missing Not At Random Data

Authors: Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang

Abstract: Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the value missing is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the realistic scenario whereas more complex and challenging. Existing statistical methods model the MNAR mechanism by different decomposition of the joint distribution of t… ▽ More Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the value missing is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the realistic scenario whereas more complex and challenging. Existing statistical methods model the MNAR mechanism by different decomposition of the joint distribution of the complete data and the missing mask. But we empirically find that directly incorporating these statistical methods into deep generative models is sub-optimal. Specifically, it would neglect the confidence of the reconstructed mask during the MNAR imputation process, which leads to insufficient information extraction and less-guaranteed imputation quality. In this paper, we revisit the MNAR problem from a novel perspective that the complete data and missing mask are two modalities of incomplete data on an equal footing. Along with this line, we put forward a generative-model-specific joint probability decomposition method, conjunction model, to represent the distributions of two modalities in parallel and extract sufficient information from both complete data and missing mask. Taking a step further, we exploit a deep generative imputation model, namely GNR, to process the real-world missing mechanism in the latent space and concurrently impute the incomplete data and reconstruct the missing mask. The experimental results show that our GNR surpasses state-of-the-art MNAR baselines with significant margins (averagely improved from 9.9% to 18.8% in RMSE) and always gives a better mask reconstruction accuracy which makes the imputation more principle. △ Less

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2307.07675 [pdf, other]

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

Authors: Yinglun Xu, Bhuvesh Kumar, Jacob Abernethy

Abstract: Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions). Each of these challenges has been studied orthogonally in the literature; incentives have been addressed… ▽ More Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions). Each of these challenges has been studied orthogonally in the literature; incentives have been addressed by a line of work on truthful multi-armed bandit mechanisms, context has been extensively tackled by contextual bandit algorithms, while corruptions have been discussed via a recent line of work on bandits with adversarial corruptions. Since these challenges co-exist, it is important to understand the robustness of each of these approaches in addressing the other challenges, provide algorithms that can handle all simultaneously, and highlight inherent limitations in this combination. In this work, we show that the most prominent contextual bandit algorithm, $ε$-greedy can be extended to handle the challenges introduced by strategic arms in the contextual multi-arm bandit mechanism setting. We further show that $ε$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption. △ Less

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.03821 [pdf, other]

Mediation Analysis with Graph Mediator

Authors: Yixi Xu, Yi Zhao

Abstract: This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph representation. Causal estimands and assumptions are discussed under this representation. With a covariance matrix as the mediator, parametric mediation models are imposed based on matrix decomposition. Assuming Gaussian random errors, likelihood-based estimator… ▽ More This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph representation. Causal estimands and assumptions are discussed under this representation. With a covariance matrix as the mediator, parametric mediation models are imposed based on matrix decomposition. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the decomposition and causal parameters. An efficient computational algorithm is proposed and asymptotic properties of the estimators are investigated. Via simulation studies, the performance of the proposed approach is evaluated. Applying to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task. △ Less

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2306.14878 [pdf, other]

Restart Sampling for Improving Generative Processes

Authors: Yilun Xu, Mingyang Deng, Xiang Cheng, Yonglong Tian, Ziming Liu, Tommi Jaakkola

Abstract: Generative processes that involve solving differential equations, such as diffusion models, frequently necessitate balancing speed and quality. ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time. We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while… ▽ More Generative processes that involve solving differential equations, such as diffusion models, frequently necessitate balancing speed and quality. ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time. We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while stochasticity in SDE contracts accumulated errors. Based on these findings, we propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction. The sampling method alternates between adding substantial noise in additional forward steps and strictly following a backward ODE. Empirically, Restart sampler surpasses previous SDE and ODE samplers in both speed and accuracy. Restart not only outperforms the previous best SDE results, but also accelerates the sampling speed by 10-fold / 2-fold on CIFAR-10 / ImageNet $64 \times 64$. In addition, it attains significantly better sample quality than ODE samplers within comparable sampling times. Moreover, Restart better balances text-image alignment/visual quality versus diversity than previous samplers in the large-scale text-to-image Stable Diffusion model pre-trained on LAION $512 \times 512$. Code is available at https://github.com/Newbeeer/diffusion_restart_sampling △ Less

Submitted 1 November, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Code is available at https://github.com/Newbeeer/diffusion_restart_sampling

arXiv:2306.02826 [pdf, ps, other]

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

Authors: Yecheng Xue, Xiaoyu Chen, Tongyang Li, Shaofeng H. -C. Jiang

Abstract: $k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains open to find sublinear-time quantum algorithms. We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3… ▽ More $k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality $n$, it remains open to find sublinear-time quantum algorithms. We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. Our coreset reduces the input size from $n$ to $\mathrm{poly}(kε^{-1}d)$, so that existing $α$-approximation algorithms for clustering can run on top of it and yield $(1 + ε)α$-approximation. This eventually yields a quadratic speedup for various $k$-clustering approximation algorithms. We complement our algorithm with a nearly matching lower bound, that any quantum algorithm must make $Ω(\sqrt{nk})$ queries in order to achieve even $O(1)$-approximation for $k$-clustering. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: Comments: 32 pages, 0 figures, 1 table. To appear in the Fortieth International Conference on Machine Learning (ICML 2023)

arXiv:2306.02821 [pdf, other]

A unified analysis of likelihood-based estimators in the Plackett--Luce model

Authors: Ruijian Han, Yiming Xu

Abstract: The Plackett--Luce model is a popular approach for ranking data analysis, where a utility vector is employed to determine the probability of each outcome based on Luce's choice axiom. In this paper, we investigate the asymptotic theory of utility vector estimation by maximizing different types of likelihood, such as the full-, marginal-, and quasi-likelihood. We provide a rank-matching interpretat… ▽ More The Plackett--Luce model is a popular approach for ranking data analysis, where a utility vector is employed to determine the probability of each outcome based on Luce's choice axiom. In this paper, we investigate the asymptotic theory of utility vector estimation by maximizing different types of likelihood, such as the full-, marginal-, and quasi-likelihood. We provide a rank-matching interpretation for the estimating equations of these estimators and analyze their asymptotic behavior as the number of items being compared tends to infinity. In particular, we establish the uniform consistency of these estimators under conditions characterized by the topology of the underlying comparison graph sequence and demonstrate that the proposed conditions are sharp for common sampling scenarios such as the nonuniform random hypergraph model and the hypergraph stochastic block model; we also obtain the asymptotic normality of these estimators and discuss the trade-off between statistical efficiency and computational complexity for practical uncertainty quantification. Both results allow for nonuniform and inhomogeneous comparison graphs with varying edge sizes and different asymptotic orders of edge probabilities. We verify our theoretical findings by conducting detailed numerical experiments. △ Less

Submitted 20 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 42 pages, corrected typos, added the supplementary file containing all remaining proofs

arXiv:2306.01675 [pdf, other]

Bayesian Segmentation Modeling of Epidemic Growth

Authors: Tejasv Bedi, Yanxun Xu, Qiwei Li

Abstract: Tracking the spread of infectious disease during a pandemic has posed a great challenge to the governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compar… ▽ More Tracking the spread of infectious disease during a pandemic has posed a great challenge to the governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compartmental models, have been proposed in the literature. These models assume that an epidemic would last over a short duration and the observed cases/deaths would attain a single peak. However, some infectious diseases, such as COVID-19, extend over a longer duration than expected. Moreover, time-varying disease transmission rates due to government interventions have made the observed data multi-modal. To address these challenges, this work proposes stochastic epidemiological models under a unified Bayesian framework augmented by a change-point detection mechanism to account for multiple peaks. The Bayesian framework allows us to incorporate prior knowledge, such as dates of influential policy changes, to predict the change-point locations precisely. We develop a trans-dimensional reversible jump Markov chain Monte Carlo algorithm to sample the posterior distributions of epidemiological parameters while estimating the number of change points and the resulting parameters. The proposed method is evaluated and compared to alternative methods in terms of change-point detection, parameter estimation, and long-term forecasting accuracy on both simulated and COVID-19 data of several major states in the United States. △ Less

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.17083 [pdf, other]

A Policy Gradient Method for Confounded POMDPs

Authors: Mao Hong, Zhengling Qi, Yanxun Xu

Abstract: In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of c… ▽ More In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, length of horizon, concentratability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting. △ Less

Submitted 30 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 95 pages, 3 figures

arXiv:2305.16536 [pdf, ps, other]

Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

Authors: Yihao Xue, Siddharth Joshi, Eric Gan, Pin-Yu Chen, Baharan Mirzasoleiman

Abstract: Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder class-relevant features by focusing on learning easy class-irrelevant features; both significantly comprom… ▽ More Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder class-relevant features by focusing on learning easy class-irrelevant features; both significantly compromise representation quality. Yet, there is no theoretical understanding of \textit{class collapse} or \textit{feature suppression} at \textit{test} time. We provide the first unified theoretically rigorous framework to determine \textit{which} features are learnt by CL. Our analysis indicate that, perhaps surprisingly, bias of (stochastic) gradient descent towards finding simpler solutions is a key factor in collapsing subclass representations and suppressing harder class-relevant features. Moreover, we present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions to {feature suppression}. We also provide the first theoretical explanation for why employing supervised and unsupervised CL together yields higher-quality representations, even when using commonly-used stochastic gradient methods. △ Less

Submitted 28 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: to appear at ICML 2023

arXiv:2304.14549 [pdf, other]

Evaluating Racialized Economic Segregation in the Presence of Spatial Autocorrelation

Authors: Yang Xu, Loni Philip Tabb

Abstract: Research on residential segregation has been active since the 1950s and originated in a desire to quantify the level of racial/ethnic segregation in the United States. The Index of Concentration at the Extremes (ICE), an operationalization of racialized economic segregation that simultaneously captures spatial, racial, and income polarization, has been a popular topic in public health research, wi… ▽ More Research on residential segregation has been active since the 1950s and originated in a desire to quantify the level of racial/ethnic segregation in the United States. The Index of Concentration at the Extremes (ICE), an operationalization of racialized economic segregation that simultaneously captures spatial, racial, and income polarization, has been a popular topic in public health research, with a particular focus on social epidemiology. However, the construction of the ICE metric usually ignores the spatial autocorrelation that may be present in the data, and it is usually presented without indicating its degree of statistical and spatial uncertainty. To address these issues, we propose reformulating the ICE metric using Bayesian modeling methodologies. We use a simulation study to evaluate the performance of each method by considering various segregation scenarios. The application is based on racialized economic segregation in Georgia, and the proposed modeling approach will help determine whether racialized economic segregation has changed over two non-overlap** time points. △ Less

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2303.13700 [pdf]

Bayesian Reconstruction of Magnetic Resonance Images using Gaussian Processes

Authors: Yihong Xu, Chad W. Farris, Stephan W. Anderson, Xin Zhang, Keith A. Brown

Abstract: A central goal of modern magnetic resonance imaging (MRI) is to reduce the time required to produce high-quality images. Efforts have included hardware and software innovations such as parallel imaging, compressed sensing, and deep learning-based reconstruction. Here, we propose and demonstrate a Bayesian method to build statistical libraries of magnetic resonance (MR) images in k-space and use th… ▽ More A central goal of modern magnetic resonance imaging (MRI) is to reduce the time required to produce high-quality images. Efforts have included hardware and software innovations such as parallel imaging, compressed sensing, and deep learning-based reconstruction. Here, we propose and demonstrate a Bayesian method to build statistical libraries of magnetic resonance (MR) images in k-space and use these libraries to identify optimal subsampling paths and reconstruction processes. Specifically, we compute a multivariate normal distribution based upon Gaussian processes using a publicly available library of T1-weighted images of healthy brains. We combine this library with physics-informed envelope functions to only retain meaningful correlations in k-space. This covariance function is then used to select a series of ring-shaped subsampling paths using Bayesian optimization such that they optimally explore space while remaining practically realizable in commercial MRI systems. Combining optimized subsampling paths found for a range of images, we compute a generalized sampling path that, when used for novel images, produces superlative structural similarity and error in comparison to previously reported reconstruction processes (i.e. 96.3% structural similarity and <0.003 normalized mean squared error from sampling only 12.5% of the k-space data). Finally, we use this reconstruction process on pathological data without retraining to show that reconstructed images are clinically useful for stroke identification. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.11399 [pdf, other]

How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on Over 60 Replicated Studies

Authors: Apoorva Lal, Mac Lockhart, Yiqing Xu, Ziwen Zu

Abstract: Instrumental variable (IV) strategies are widely used in political science to establish causal relationships. However, the identifying assumptions required by an IV design are demanding, and it remains challenging for researchers to assess their validity. In this paper, we replicate 67 papers published in three top journals in political science during 2010-2022 and identify several troubling patte… ▽ More Instrumental variable (IV) strategies are widely used in political science to establish causal relationships. However, the identifying assumptions required by an IV design are demanding, and it remains challenging for researchers to assess their validity. In this paper, we replicate 67 papers published in three top journals in political science during 2010-2022 and identify several troubling patterns. First, researchers often overestimate the strength of their IVs due to non-i.i.d. errors, such as a clustering structure. Second, the most commonly used t-test for the two-stage-least-squares (2SLS) estimates often severely underestimates uncertainty. Using more robust inferential methods, we find that around 19-30% of the 2SLS estimates in our sample are underpowered. Third, in the majority of the replicated studies, the 2SLS estimates are much larger than the ordinary-least-squares estimates, and their ratio is negatively correlated with the strength of the IVs in studies where the IVs are not experimentally generated, suggesting potential violations of unconfoundedness or the exclusion restriction. To help researchers avoid these pitfalls, we provide a checklist for better practice. △ Less

Submitted 7 November, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: Forthcoming in Political Analysis. Appendix (supp.pdf) in archived zip

arXiv:2303.10908 [pdf, other]

A Two-stage Bayesian Model for Assessing the Geography of Racialized Economic Segregation and Premature Mortality Across US Counties

Authors: Yang Xu, Loni Philip Tabb

Abstract: Racialized economic segregation, a key metric that simultaneously accounts for spatial, social and income polarization, has been linked to adverse health outcomes, including morbidity and mortality; however, statistical methods for measuring the association between racialized economic segregation and health outcomes are not well-developed and are usually studied at the individual level. In this pa… ▽ More Racialized economic segregation, a key metric that simultaneously accounts for spatial, social and income polarization, has been linked to adverse health outcomes, including morbidity and mortality; however, statistical methods for measuring the association between racialized economic segregation and health outcomes are not well-developed and are usually studied at the individual level. In this paper we propose a two-stage Bayesian statistical framework that provides a broad, flexible approach to studying the spatially varying association between premature mortality and racialized economic segregation, while accounting for neighborhood-level latent health factors across US counties. We apply our method by using data from three sources: (1) the CDC WONDER, (2) the County Health Rankings, and (3) the Public Health Disparities Geocoding Project. Findings from our study show that the posterior estimates of latent health factors clearly demonstrate geographical patterning across US counties. Additionally, our results highlight the importance of accounting for the presence of spatial autocorrelation in racialized economic segregation measures, in health equity focused settings. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 30 pages, 5 figures

arXiv:2303.06422 [pdf, other]

An approximate control variates approach to multifidelity distribution estimation

Authors: Ruijian Han, Boris Kramer, Dong** Lee, Akil Narayan, Yiming Xu

Abstract: Forward simulation-based uncertainty quantification that studies the distribution of quantities of interest (QoI) is a crucial component for computationally robust engineering design and prediction. There is a large body of literature devoted to accurately assessing statistics of QoIs, and in particular, multilevel or multifidelity approaches are known to be effective, leveraging cost-accuracy tra… ▽ More Forward simulation-based uncertainty quantification that studies the distribution of quantities of interest (QoI) is a crucial component for computationally robust engineering design and prediction. There is a large body of literature devoted to accurately assessing statistics of QoIs, and in particular, multilevel or multifidelity approaches are known to be effective, leveraging cost-accuracy tradeoffs between a given ensemble of models. However, effective algorithms that can estimate the full distribution of QoIs are still under active development. In this paper, we introduce a general multifidelity framework for estimating the cumulative distribution function (CDF) of a vector-valued QoI associated with a high-fidelity model under a budget constraint. Given a family of appropriate control variates obtained from lower-fidelity surrogates, our framework involves identifying the most cost-effective model subset and then using it to build an approximate control variates estimator for the target CDF. We instantiate the framework by constructing a family of control variates using intermediate linear approximators and rigorously analyze the corresponding algorithm. Our analysis reveals that the resulting CDF estimator is uniformly consistent and asymptotically optimal as the budget tends to infinity, with only mild moment and regularity assumptions on the joint distribution of QoIs. The approach provides a robust multifidelity CDF estimator that is adaptive to the available budget, does not require \textit{a priori} knowledge of cross-model statistics or model hierarchy, and applies to multiple dimensions. We demonstrate the efficiency and robustness of the approach using test examples of parametric PDEs and stochastic differential equations including both academic instances and more challenging engineering problems. △ Less

Submitted 5 July, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

Comments: 41 pages, added additional numerical experiments

arXiv:2303.02201 [pdf, other]

Causal Inference using Multivariate Generalized Linear Mixed-Effects Models with Longitudinal Data

Authors: Yizhen Xu, Jisoo Kim, Laura K. Hummers, Ami A. Shah, Scott Zeger

Abstract: Dynamic prediction of causal effects under different treatment regimes conditional on an individual's characteristics and longitudinal history is an essential problem in precision medicine. This is challenging in practice because outcomes and treatment assignment mechanisms are unknown in observational studies, an individual's treatment efficacy is a counterfactual, and the existence of selection… ▽ More Dynamic prediction of causal effects under different treatment regimes conditional on an individual's characteristics and longitudinal history is an essential problem in precision medicine. This is challenging in practice because outcomes and treatment assignment mechanisms are unknown in observational studies, an individual's treatment efficacy is a counterfactual, and the existence of selection bias is often unavoidable. We propose a Bayesian framework for identifying subgroup counterfactual benefits of dynamic treatment regimes by adapting Bayesian g-computation algorithm (J. Robins, 1986; Zhou, Elliott, & Little, 2019) to incorporate multivariate generalized linear mixed-effects models. Unmeasured time-invariant factors are identified as subject-specific random effects in the assumed joint distribution of outcomes, time-varying confounders, and treatment assignments. Existing methods mostly assume no unmeasured confounding and focus on balancing the observed confounder distributions between different treatments, while our method allows the presence of time-invariant unmeasured confounding. We propose a sequential ignorability assumption based on treatment assignment heterogeneity, which is analogous to balancing the latent tendency toward each treatment due to unmeasured time-invariant factors beyond the observables. We use simulation studies to assess the sensitivity of the proposed method's performance to various model assumptions. The method is applied to observational clinical data to investigate the efficacy of continuously using mycophenolate in different subgroups of scleroderma patients who were treated with the drug. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: 19 pages, 5 figures

arXiv:2302.14427 [pdf, other]

Federated Covariate Shift Adaptation for Missing Target Output Values

Authors: Yaqian Xu, Wenquan Cui, Jianjun Xu, Haoyang Cheng

Abstract: The most recent multi-source covariate shift algorithm is an efficient hyperparameter optimization algorithm for missing target output. In this paper, we extend this algorithm to the framework of federated learning. For data islands in federated learning and covariate shift adaptation, we propose the federated domain adaptation estimate of the target risk which is asymptotically unbiased with a de… ▽ More The most recent multi-source covariate shift algorithm is an efficient hyperparameter optimization algorithm for missing target output. In this paper, we extend this algorithm to the framework of federated learning. For data islands in federated learning and covariate shift adaptation, we propose the federated domain adaptation estimate of the target risk which is asymptotically unbiased with a desirable asymptotic variance property. We construct a weighted model for the target task and propose the federated covariate shift adaptation algorithm which works preferably in our setting. The efficacy of our method is justified both theoretically and empirically. △ Less

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.11647 [pdf, other]

Patient stratification in multi-arm trials: a two-stage procedure with Bayesian profile regression

Authors: Yuejia Xu, Angela M. Wood, Brian D. M. Tom

Abstract: Precision medicine is an emerging field that takes into account individual heterogeneity to inform better clinical practice. In clinical trials, the evaluation of treatment effect heterogeneity is an important component, and recently, many statistical methods have been proposed for stratifying patients into different subgroups based on such heterogeneity. However, the majority of existing methods… ▽ More Precision medicine is an emerging field that takes into account individual heterogeneity to inform better clinical practice. In clinical trials, the evaluation of treatment effect heterogeneity is an important component, and recently, many statistical methods have been proposed for stratifying patients into different subgroups based on such heterogeneity. However, the majority of existing methods developed for this purpose focus on the case with a dichotomous treatment and are not directly applicable to multi-arm trials. In this paper, we consider the problem of patient stratification in multi-arm trial settings and propose a two-stage procedure within the Bayesian nonparametric framework. Specifically, we first use Bayesian additive regression trees (BART) to predict potential outcomes (treatment responses) under different treatment options for each patient, and then we leverage Bayesian profile regression to cluster patients into subgroups according to their baseline characteristics and predicted potential outcomes. We further embed a variable selection procedure into our proposed framework to identify the patient characteristics that actively "drive" the clustering structure. We conduct simulation studies to examine the performance of our proposed method and demonstrate the method by applying it to a UK-based multi-arm blood donation trial, wherein our method uncovers five clinically meaningful donor subgroups. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.11638 [pdf, other]

Sequential Re-estimation Learning of Optimal Individualized Treatment Rules Among Ordinal Treatments with Application to Recommended Intervals Between Blood Donations

Authors: Yuejia Xu, Angela M. Wood, David J. Roberts, Brian D. M. Tom

Abstract: Personalized medicine has gained much popularity recently as a way of providing better healthcare by tailoring treatments to suit individuals. Our research, motivated by the UK INTERVAL blood donation trial, focuses on estimating the optimal individualized treatment rule (ITR) in the ordinal treatment-arms setting. Restrictions on minimum lengths between whole blood donations exist to safeguard do… ▽ More Personalized medicine has gained much popularity recently as a way of providing better healthcare by tailoring treatments to suit individuals. Our research, motivated by the UK INTERVAL blood donation trial, focuses on estimating the optimal individualized treatment rule (ITR) in the ordinal treatment-arms setting. Restrictions on minimum lengths between whole blood donations exist to safeguard donor health and quality of blood received. However, the evidence-base for these limits is lacking. Moreover, in England, the blood service is interested in making blood donation both safe and sustainable by integrating multi-marker data from INTERVAL and develo** personalized donation strategies. As the three inter-donation interval options in INTERVAL have clear orderings, we propose a sequential re-estimation learning method that effectively incorporates "treatment" orderings when identifying optimal ITRs. Furthermore, we incorporate variable selection into our method for both linear and nonlinear decision rules to handle situations with (noise) covariates irrelevant for decision-making. Simulations demonstrate its superior performance over existing methods that assume multiple nominal treatments by achieving smaller misclassification rates and larger value functions. Application to a much-in-demand donor subgroup shows that the estimated optimal ITR achieves both the highest utilities and largest proportions of donors assigned to the safest inter-donation interval option in INTERVAL. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.10796 [pdf, ps, other]

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

Authors: Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

Abstract: While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision pr… ▽ More While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with $S$ states, $A$ actions, and horizon $H$, and establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret for it, where $T$ is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation and prove that it enjoys $\mathcal{O}(\mathrm{poly}(d, H, \log T))$ regret. Our algorithms are variants of UCRL/UCRL-VTR algorithms in classical RL, which also leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This is the key to breaking the $Ω(\sqrt{T})$-regret barrier in classical RL. To the best of our knowledge, this is the first work studying the online exploration in quantum RL with provable logarithmic worst-case regret. △ Less

Submitted 13 June, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: ICML 2024

arXiv:2302.00816 [pdf, other]

Dynamic Atomic Column Detection in Transmission Electron Microscopy Videos via Ridge Estimation

Authors: Yuchen Xu, Andrew M. Thomas, Peter A. Crozier, David S. Matteson

Abstract: Ridge detection is a classical tool to extract curvilinear features in image processing. As such, it has great promise in applications to material science problems; specifically, for trend filtering relatively stable atom-shaped objects in image sequences, such as Transmission Electron Microscopy (TEM) videos. Standard analysis of TEM videos is limited to frame-by-frame object recognition. We inst… ▽ More Ridge detection is a classical tool to extract curvilinear features in image processing. As such, it has great promise in applications to material science problems; specifically, for trend filtering relatively stable atom-shaped objects in image sequences, such as Transmission Electron Microscopy (TEM) videos. Standard analysis of TEM videos is limited to frame-by-frame object recognition. We instead harness temporal correlation across frames through simultaneous analysis of long image sequences, specified as a spatio-temporal image tensor. We define new ridge detection algorithms to non-parametrically estimate explicit trajectories of atomic-level object locations as a continuous function of time. Our approach is specially tailored to handle temporal analysis of objects that seemingly stochastically disappear and subsequently reappear throughout a sequence. We demonstrate that the proposed method is highly effective and efficient in simulation scenarios, and delivers notable performance improvements in TEM experiments compared to other material science benchmarks. △ Less

Submitted 1 February, 2023; originally announced February 2023.

Comments: 27 pages, 11 figures

arXiv:2301.09300 [pdf, other]

A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference

Authors: Jianwen Xie, Yaxuan Zhu, Yifei Xu, Dingcheng Li, ** Li

Abstract: We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from… ▽ More We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, so that the parameters of the normalizing flow prior and the generator can be updated with the inferred latent variables. We show that, under the scenario of non-convergent short-run MCMC, the finite step Langevin dynamics is a flow-like approximate inference model and the learning objective actually follows the perturbation of the maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow and the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model from MLE such that the short-run Langevin flow inference is close to the true posterior. Empirical results of extensive experiments validate the effectiveness of the proposed latent space normalizing flow model in the tasks of image generation, image reconstruction, anomaly detection, supervised image inpainting and unsupervised image recovery. △ Less

Submitted 23 January, 2023; originally announced January 2023.

Comments: The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI) 2023

arXiv:2301.09231 [pdf, other]

GP-NAS-ensemble: a model for NAS Performance Prediction

Authors: Kunlong Chen, Liu Yang, Yitian Chen, Kun** Chen, Yidan Xu, Lujun Li

Abstract: It is of great significance to estimate the performance of a given model architecture without training in the application of Neural Architecture Search (NAS) as it may take a lot of time to evaluate the performance of an architecture. In this paper, a novel NAS framework called GP-NAS-ensemble is proposed to predict the performance of a neural network architecture with a small training dataset. We… ▽ More It is of great significance to estimate the performance of a given model architecture without training in the application of Neural Architecture Search (NAS) as it may take a lot of time to evaluate the performance of an architecture. In this paper, a novel NAS framework called GP-NAS-ensemble is proposed to predict the performance of a neural network architecture with a small training dataset. We make several improvements on the GP-NAS model to make it share the advantage of ensemble learning methods. Our method ranks second in the CVPR2022 second lightweight NAS challenge performance prediction track. △ Less

Submitted 22 January, 2023; originally announced January 2023.

arXiv:2301.02651 [pdf, other]

A Robust Data-driven Process Modeling Applied to Time-series Stochastic Power Flow

Authors: Pooja Algikar, Yijun Xu, Somayeh Yarahmadi, Lamine Mili

Abstract: In this paper, we propose a robust data-driven process model whose hyperparameters are robustly estimated using the Schweppe-type generalized maximum likelihood estimator. The proposed model is trained on recorded time-series data of voltage phasors and power injections to perform a time-series stochastic power flow calculation. Power system data are often corrupted with outliers caused by large e… ▽ More In this paper, we propose a robust data-driven process model whose hyperparameters are robustly estimated using the Schweppe-type generalized maximum likelihood estimator. The proposed model is trained on recorded time-series data of voltage phasors and power injections to perform a time-series stochastic power flow calculation. Power system data are often corrupted with outliers caused by large errors, fault conditions, power outages, and extreme weather, to name a few. The proposed model downweights vertical outliers and bad leverage points in the measurements of the training dataset. The weights used to bound the influence of the outliers are calculated using projection statistics, which are a robust version of Mahalanobis distances of the time series data points. The proposed method is demonstrated on the IEEE 33-Bus power distribution system and a real-world unbalanced 240-bus power distribution system heavily integrated with renewable energy sources. Our simulation results show that the proposed robust model can handle up to 25% of outliers in the training data set. △ Less

Submitted 6 January, 2023; originally announced January 2023.

Comments: Submitted to the IEEE Transactions on Power Systems

arXiv:2212.14468 [pdf, other]

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Authors: Yang Xu, ** Zhu, Chengchun Shi, Shikai Luo, Rui Song

Abstract: Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable… ▽ More Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform. △ Less

Submitted 2 February, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Showing 1–50 of 349 results for author: Xu, Y