
CLEAR: Can Language Models Really Understand Causal Graphs?

Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we develop a framework to define causal graph understanding, by assessing language models' behaviors through four practical criteria derived from diverse disciplines (e.g., philosophy and psychology). We then develop CLEAR, a novel benchmark that defines three complexity levels and encompasses 20 causal graph-based tasks across these levels. Finally, based on our framework and benchmark, we conduct extensive experiments on six leading language models and summarize five empirical findings. Our results indicate that while language models demonstrate a preliminary understanding of causal graphs, significant potential for improvement remains. Our project website is at https://github.com/OpenCausaLab/CLEAR.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.03681

Multiscale Tests for Point Processes and Longitudinal Networks

Authors: Youmeng Jiang, Min Xu

Abstract: We propose a new testing framework applicable to both the two-sample problem on point processes and the community detection problem on rectangular arrays of point processes, which we refer to as longitudinal networks; the latter problem is useful in situations where we observe interactions among a group of individuals over time. Our framework is based on a multiscale discretization scheme that considers not just the global null but also a collection of nulls local to small regions in the domain; in the two-sample problem, the local rejections tell us where the intensity functions differ, and in the longitudinal network problem, the local rejections tell us when the community structure is most salient. We provide theoretical analysis for the two-sample problem and show that our method has minimax optimal power under a Hölder continuity condition. We provide extensive simulation and real data analysis demonstrating the practicality of our proposed method.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 59 pages, 9 figures

MSC Class: 62Mxx
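As a rough illustration of what "local nulls" on a multiscale discretization can look like, the sketch below bins event times on [0, 1) into dyadic intervals at several resolutions and runs one test per bin. It is not the authors' procedure. It assumes two independent Poisson processes observed with equal effort, so that under a local null the count from the first sample in a bin, given the bin total, is Binomial(total, 1/2). It also uses a plain Bonferroni correction. The function names and the `max_level` and `alpha` parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def binom_two_sided_pvalue(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total mass of outcomes no more likely than k."""
    if n == 0:
        return 1.0  # an empty bin carries no evidence either way

    def log_pmf(j: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                + j * math.log(p) + (n - j) * math.log(1 - p))

    probs = [math.exp(log_pmf(j)) for j in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-12)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def multiscale_two_sample(
    a: Sequence[float], b: Sequence[float], max_level: int = 3, alpha: float = 0.05
) -> List[Tuple[int, int, float]]:
    """Test every dyadic bin [j/2^l, (j+1)/2^l) for l = 0..max_level.

    Returns (level, index, p-value) for the bins rejected at the Bonferroni
    level alpha / (number of bins). Event times are assumed to lie in [0, 1).
    """
    bins = [(level, j) for level in range(max_level + 1) for j in range(2**level)]
    rejected = []
    for level, j in bins:
        lo, hi = j / 2**level, (j + 1) / 2**level
        ka = sum(1 for t in a if lo <= t < hi)  # events from sample a in this bin
        kb = sum(1 for t in b if lo <= t < hi)  # events from sample b in this bin
        pval = binom_two_sided_pvalue(ka, ka + kb)
        if pval <= alpha / len(bins):
            rejected.append((level, j, pval))
    return rejected
```

Rejections at coarse levels flag a global difference. Rejections at fine levels localize where the two intensities appear to differ, which loosely mirrors the role the abstract gives to local rejections.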

arXiv:2403.16688

Optimal convex $M$-estimation via score matching

Authors: Oliver Y. Feng, Yu-Chun Kao, Min Xu, Richard J. Samworth

Abstract: In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that our procedure attains the minimal asymptotic covariance among all convex $M$-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments confirm the practical merits of our proposal.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 69 pages, 12 figures and 4 tables

arXiv:2401.17504

CaMU: Disentangling Causal Effects in Deep Model Unlearning

Authors: Shaofei Shen, Chenhao Zhang, Alina Bialkowski, Weitong Chen, Miao Xu

Abstract: Machine unlearning requires removing the information of forgetting data while keeping the necessary information of remaining data. Despite recent advancements in this area, existing methodologies mainly focus on the effect of removing forgetting data without considering the negative impact this can have on the information of the remaining data, resulting in significant performance degradation after data removal. Although some methods try to repair the performance of remaining data after removal, the forgotten information can also return after repair. Such an issue is due to the intricate intertwining of the forgetting and remaining data. Without adequately differentiating the influence of these two kinds of data on the model, existing algorithms take the risk of either inadequate removal of the forgetting data or unnecessary loss of valuable information from the remaining data. To address this shortcoming, the present study undertakes a causal analysis of unlearning and introduces a novel framework termed Causal Machine Unlearning (CaMU). This framework adds an intervention on the information of remaining data to disentangle the causal effects between forgetting data and remaining data. CaMU then eliminates the causal impact associated with forgetting data while concurrently preserving the causal relevance of the remaining data. Comprehensive empirical results on various datasets and models suggest that CaMU enhances performance on the remaining data and effectively minimizes the influence of forgetting data. Notably, this work is the first to interpret deep model unlearning tasks from the perspective of causality and provide a solution based on causal analysis, which opens up new possibilities for future research in deep model unlearning.

Submitted 30 January, 2024; originally announced January 2024.

Comments: Full version of the paper accepted for the SDM 24 conference

arXiv:2312.15469

Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products

Authors: Gan Yuan, Mingyue Xu, Samory Kpotufe, Daniel Hsu

Abstract: We consider the problem of sufficient dimension reduction (SDR) for multi-index models. The estimators of the central mean subspace in prior works either have slow (non-parametric) convergence rates or rely on stringent distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$ being elliptically symmetric). In this paper, we show that a fast parametric convergence rate of the form $C_d \cdot n^{-1/2}$ is achievable via estimating the expected smoothed gradient outer product, for a general class of covariate distributions $P_{\mathbf{X}}$ that includes Gaussian and heavier-tailed distributions. When the link function is a polynomial of degree at most $r$ and $P_{\mathbf{X}}$ is the standard Gaussian, we show that the prefactor depends on the ambient dimension $d$ as $C_d \propto d^r$.

Submitted 24 December, 2023; originally announced December 2023.

MSC Class: 62B05; 62G08

arXiv:2311.08254

Identifiable and interpretable nonparametric factor analysis

Authors: Maoran Xu, Amy H. Herring, David B. Dunson

Abstract: Factor models have been widely used to summarize the variability of high-dimensional data through a set of factors with much lower dimensionality. Gaussian linear factor models have been particularly popular due to their interpretability and ease of computation. However, in practice, data often violate the multivariate Gaussian assumption. To characterize higher-order dependence and nonlinearity, models that include factors as predictors in flexible multivariate regression are popular, with GP-LVMs using Gaussian process (GP) priors for the regression function and VAEs using deep neural networks. Unfortunately, such approaches lack identifiability and interpretability and tend to produce brittle and non-reproducible results. To address these problems by simplifying the nonparametric factor model while maintaining flexibility, we propose the NIFTY framework, which parsimoniously transforms uniform latent variables using one-dimensional nonlinear mappings and then applies a linear generative model. The induced multivariate distribution falls into a flexible class while maintaining simple computation and interpretation. We prove that this model is identifiable and empirically study NIFTY using simulated data, observing good performance in density estimation and data visualization. We then apply NIFTY to bird song data in an environmental monitoring application.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 50 pages, 17 figures
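The generative structure described above can be made concrete with a sampler of the same shape. The sketch below draws independent Uniform(0, 1) latent variables and passes each through its own one-dimensional nonlinear map. It then applies a linear map and adds Gaussian noise. The particular maps, loadings, and noise level are hypothetical choices for illustration. Nothing here reflects NIFTY's priors or inference procedure.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_uniform_factor_model(
    n: int,
    maps: Sequence[Callable[[float], float]],
    loadings: Sequence[Sequence[float]],
    noise_sd: float,
    seed: int = 0,
) -> List[List[float]]:
    """Draw n rows of x = Lambda @ g(u) + eps, with u_k ~ Uniform(0, 1) i.i.d.

    maps[k] is the 1-D nonlinear map g_k applied to latent u_k;
    loadings is the p x K matrix Lambda (one row per observed coordinate);
    eps has i.i.d. N(0, noise_sd^2) entries.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = [rng.random() for _ in maps]            # uniform latent variables
        z = [g(u_k) for g, u_k in zip(maps, u)]     # one-dimensional nonlinear maps
        x = [sum(l * z_k for l, z_k in zip(row, z)) + rng.gauss(0.0, noise_sd)
             for row in loadings]                    # linear generative model plus noise
        rows.append(x)
    return rows


# Hypothetical example: two latent factors, three observed coordinates.
example_maps = [lambda u: math.cos(math.pi * u), lambda u: u ** 3]
example_loadings = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
data = sample_uniform_factor_model(200, example_maps, example_loadings, noise_sd=0.1)
```

Suppose every map were instead the inverse standard normal CDF. The latent factors would then be Gaussian, and the construction would reduce to an ordinary Gaussian linear factor model. This is one way to read the family as a nonlinear generalization of that model.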

arXiv:2310.20030

Scaling Riemannian Diffusion Models

Authors: Aaron Lou, Minkai Xu, Stefano Ermon

Abstract: Riemannian diffusion models draw inspiration from standard Euclidean space diffusion models to learn distributions on general manifolds. Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements. Our key observation is that most relevant manifolds are symmetric spaces, which are much more amenable to computation. By leveraging and combining various ansätze, we can quickly compute relevant quantities to high precision. On low dimensional datasets, our correction produces a noticeable improvement, allowing diffusion to compete with other methods. Additionally, we show that our method enables us to scale to high dimensional tasks on nontrivial manifolds. In particular, we model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.

Submitted 30 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2309.04594

A Comparison between Markov Switching Zero-inflated and Hurdle Models for Spatio-temporal Infectious Disease Counts

Authors: Mingchi Xu, Dirk Douwes-Schultz, Alexandra M. Schmidt

Abstract: In epidemiological studies, zero-inflated and hurdle models are commonly used to handle excess zeros in reported infectious disease cases. However, they cannot model the persistence (from presence to presence) and reemergence (from absence to presence) of a disease separately. Covariates can sometimes have different effects on the reemergence and persistence of a disease. Recently, a zero-inflated Markov switching negative binomial model was proposed to accommodate this issue. We present a Markov switching negative binomial hurdle model as a competitor to that approach, as hurdle models are often also used as alternatives to zero-inflated models for accommodating excess zeros. We begin the comparison by inspecting the underlying assumptions made by both models. Hurdle models assume perfect detection of the disease cases, while zero-inflated models implicitly assume the case counts can be under-reported; thus, we investigate when a negative binomial distribution can approximate the true distribution of reported counts. A comparison of the fit of the two types of Markov switching models is undertaken on chikungunya cases across the neighborhoods of Rio de Janeiro. We find that, among the fitted models, the Markov switching negative binomial zero-inflated model produces the best predictions, and both Markov switching models produce markedly better predictions than more traditional negative binomial hurdle and zero-inflated models.

Submitted 8 September, 2023; originally announced September 2023.
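The two models make different assumptions about where zeros come from, and this shows up directly in their emission distributions. The sketch below writes out a zero-inflated negative binomial pmf and a negative binomial hurdle pmf. The negative binomial is parameterized by its mean `mu` and size `r`; these parameter names are ours. The Markov switching layer, which is what lets the models treat persistence and reemergence separately, is deliberately left out.

```python
import math


def nb_pmf(y: int, mu: float, r: float) -> float:
    """Negative binomial pmf with mean mu > 0 and size (dispersion) r > 0."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, r: float, pi: float) -> float:
    """Zero-inflated NB: an extra point mass pi at zero mixed with an NB count."""
    nb_part = (1.0 - pi) * nb_pmf(y, mu, r)
    return pi + nb_part if y == 0 else nb_part  # zeros can come from either component


def hurdle_nb_pmf(y: int, mu: float, r: float, pi_zero: float) -> float:
    """NB hurdle: P(Y = 0) = pi_zero; positive counts follow a zero-truncated NB."""
    if y == 0:
        return pi_zero  # every zero comes from the binary part
    return (1.0 - pi_zero) * nb_pmf(y, mu, r) / (1.0 - nb_pmf(0, mu, r))
```

In the zero-inflated form, a zero can arise either from the extra point mass or from the count distribution itself. That is one way to read the abstract's remark about under-reporting. In the hurdle form, every zero comes from the binary part.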

arXiv:2309.04268

Optimal Rate of Kernel Regression in Large Dimensions

Authors: Weihao Lu, Haobo Zhang, Yicheng Li, Manyun Xu, Qian Lin

Abstract: We perform a study on kernel regression for large-dimensional data (where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n \asymp d^{\gamma}$ for some $\gamma > 0$). We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$, respectively. When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \ldots$. We then further determine the optimal rate of the excess risk of kernel regression for all $\gamma > 0$ and find that the curve of the optimal rate as a function of $\gamma$ exhibits several new phenomena, including multiple descent behavior and periodic plateau behavior. As an application, for the neural tangent kernel (NTK), we also provide a similar explicit description of the curve of the optimal rate. As a direct corollary, these claims hold for wide neural networks as well.

Submitted 28 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

MSC Class: 62G08; 46E22; 68T07

arXiv:2308.08046

Regret Lower Bounds in Multi-agent Multi-armed Bandit

Authors: Mengfan Xu, Diego Klabjan

Abstract: The multi-armed bandit problem has motivated methods with provable upper bounds on regret, and the corresponding lower bounds have also been extensively studied in this context. Recently, the multi-agent multi-armed bandit has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is the overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for adversarial settings, which, however, leaves a gap with the known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate lower bounds of order $\log T$ for instance-dependent bounds and $\sqrt{T}$ for mean-gap independent bounds, both of which are tight. Assuming adversarial rewards, we establish a lower bound of order $T^{2/3}$ for connected graphs, thereby bridging the gap between the lower and upper bounds in the prior work. We also show a linear regret lower bound when the graph is disconnected. While previous works have explored these settings with upper bounds, we provide a thorough study on tight lower bounds.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 10 pages

arXiv:2306.10395

Distributed Semi-Supervised Sparse Statistical Inference

Authors: Jiyuan Tu, Weidong Liu, Xiaojun Mao, Mingyue Xu

Abstract: The debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters. However, constructing such an estimator involves estimating the high-dimensional inverse Hessian matrix, incurring significant computational costs. This challenge becomes particularly acute in distributed setups, where traditional methods necessitate computing a debiased estimator on every machine. This becomes unwieldy, especially with a large number of machines. In this paper, we delve into semi-supervised sparse statistical inference in a distributed setup. An efficient multi-round distributed debiased estimator, which integrates both labeled and unlabeled data, is developed. We show that the additional unlabeled data helps to improve the statistical rate of each round of iteration. Our approach offers tailored debiasing methods for $M$-estimation and generalized linear models according to the specific form of the loss function. Our method also applies to non-smooth losses such as the absolute deviation loss. Furthermore, our algorithm is computationally efficient since it requires only one estimation of a high-dimensional inverse covariance matrix. We demonstrate the effectiveness of our method by presenting simulation studies and real data applications that highlight the benefits of incorporating unlabeled data.

Submitted 15 December, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: IEEE Transactions on Information Theory, 2023

arXiv:2306.05579

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients, and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional double stochasticity assumption, and only require knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-Gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.

Submitted 17 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 58 pages, to appear at Advances in Neural Information Processing Systems (NeurIPS 2023 Spotlight)

arXiv:2304.02127

A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

Authors: Mingwei Xu, Samuel W. K. Wong, Peijun Sang

Abstract: Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these methods can be hindered by estimation error, especially when only sparse time-course observations are available. We present a Bayesian collocation framework that operates on the integrated form of the ODEs and also avoids the expensive use of numerical solvers. Our methodology has the capability to handle general nonlinear ODE systems. We demonstrate the accuracy of the proposed method through simulation studies, where the estimated parameters and recovered system trajectories are compared with other recent methods. A real data example is also provided.

Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.
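The appeal of the integrated form is that $x(t) = x(t_0) + \int_{t_0}^{t} f(x(s), \theta)\,ds$ involves no derivatives of the trajectory. The sketch below shows that idea in its simplest frequentist form; it is not the authors' Bayesian collocation method. It uses the hypothetical linear ODE $dx/dt = -\theta x$ and approximates the integral of the observed trajectory with the trapezoidal rule. It then solves the resulting linear least-squares problem for $\theta$ without calling any ODE solver. It assumes dense, noise-free observations.

```python
import math
from typing import List, Sequence


def cumulative_trapezoid(t: Sequence[float], x: Sequence[float]) -> List[float]:
    """Running trapezoidal approximation of the integral of x from t[0] to t[k]."""
    out = [0.0]
    for k in range(1, len(t)):
        out.append(out[-1] + 0.5 * (t[k] - t[k - 1]) * (x[k] + x[k - 1]))
    return out


def estimate_decay_rate(t: Sequence[float], x: Sequence[float]) -> float:
    """Least-squares theta for dx/dt = -theta * x via its integrated form.

    x(t_k) - x(t_0) = -theta * integral_{t_0}^{t_k} x(s) ds is linear in theta,
    so theta_hat = sum(y_k * z_k) / sum(z_k^2), with y_k = x(t_k) - x(t_0)
    and z_k = -integral.
    """
    integrals = cumulative_trapezoid(t, x)
    y = [xk - x[0] for xk in x]
    z = [-ik for ik in integrals]
    return sum(yk * zk for yk, zk in zip(y, z)) / sum(zk * zk for zk in z)


# Usage: noise-free samples of x(t) = exp(-1.5 t) on 41 grid points in [0, 2].
times = [0.05 * k for k in range(41)]
obs = [math.exp(-1.5 * tk) for tk in times]
theta_hat = estimate_decay_rate(times, obs)  # close to 1.5
```

With sparse or noisy data, the quadrature error and noise in the integral term both matter. Handling these is where a full method such as the one in the abstract does its real work.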

arXiv:2303.01992

Choosing the $p$ in $L_p$ loss: rate adaptivity on the symmetric location problem

Authors: Yu-Chun Kao, Min Xu, Cun-Hui Zhang

Abstract: Given univariate random variables $Y_1, \ldots, Y_n$ with the $\text{Uniform}(\theta_0 - 1, \theta_0 + 1)$ distribution, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ is the MLE for $\theta_0$ and estimates $\theta_0$ with error of order $1/n$, which is much smaller than the $1/\sqrt{n}$ error rate of the usual sample mean estimator. However, the sample midrange performs poorly when the data have, say, the Gaussian $N(\theta_0, 1)$ distribution, with an error rate of $1/\sqrt{\log n}$. In this paper, we propose an estimator of the location $\theta_0$ with a rate of convergence that can, in many settings, adapt to the underlying distribution, which we assume to be symmetric around $\theta_0$ but is otherwise unknown. When the underlying distribution is compactly supported, we show that our estimator attains a rate of convergence of $n^{-1/\alpha}$ up to polylog factors, where the rate parameter $\alpha$ can take on any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is formed by the $\ell^{\gamma}$-center of the data, for a $\gamma \geq 2$ chosen in a data-driven way, namely by minimizing a criterion motivated by the asymptotic variance. Our approach can be directly applied to the regression setting where $\theta_0$ is a function of observed features and motivates the use of the $\ell^{\gamma}$ loss function for $\gamma > 2$ in certain settings.

Submitted 16 August, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 60 pages; 7 figures

MSC Class: 62F10
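The $\ell^{\gamma}$-center in the abstract is the minimizer of $\sum_i |Y_i - \theta|^{\gamma}$. The sketch below assumes $\gamma > 1$, so the objective is strictly convex, and finds the minimizer by bisection on the derivative. Setting $\gamma = 2$ recovers the sample mean, and large $\gamma$ approaches the midrange, which are the two endpoints the abstract contrasts. The paper's data-driven choice of $\gamma$ is not implemented here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def ell_gamma_center(y: Sequence[float], gamma: float, iters: int = 200) -> float:
    """argmin over theta of sum_i |y_i - theta|**gamma, for gamma > 1.

    Up to the factor gamma, the derivative is
        s(theta) = sum_i sign(theta - y_i) * |theta - y_i|**(gamma - 1),
    which is strictly increasing in theta. Its root is therefore found by
    bisection on [min(y), max(y)], where the minimizer must lie.
    """
    if gamma <= 1:
        raise ValueError("this sketch assumes gamma > 1")

    def slope(theta: float) -> float:
        return sum(math.copysign(abs(theta - yi) ** (gamma - 1), theta - yi) for yi in y)

    lo, hi = min(y), max(y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid  # objective still decreasing, so the minimizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [0.0, 1.0, 5.0]
mean_like = ell_gamma_center(sample, 2.0)       # the sample mean, 2.0
midrange_like = ell_gamma_center(sample, 50.0)  # near the midrange, 2.5
```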

arXiv:2302.05933

Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Authors: Jianfa Lai, Manyun Xu, Rui Chen, Qian Lin

Abstract: We perform a study on the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We first establish some spectral properties of the neural tangent kernel (NTK): $a)$ $K_{d}$, the NTK defined on $\mathbb{R}^{d}$, is positive definite; $b)$ $\lambda_{i}(K_{1})$, the $i$-th largest eigenvalue of $K_{1}$, is proportional to $i^{-2}$. We then show that: $i)$ when the width $m\rightarrow\infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated with $K_{1}$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv)$ if one trains the neural network until it overfits the data, the resulting neural network cannot generalize well. Finally, we provide an explanation to reconcile our theory with the widely observed "benign overfitting" phenomenon.

Submitted 12 February, 2023; originally announced February 2023.

Comments: 47 pages, 4 figures

MSC Class: 62G08 (Primary); 68T07 (secondary); 46E22

ACM Class: G.3

arXiv:2302.05549

Balancing Approach for Causal Inference at Scale

Authors: Sicheng Lin, Meng Xu, Xi Zhang, Shih-Kang Chao, Ying-Kai Huang, Xiaolin Shi

Abstract: With modern software and online platforms collecting massive amounts of data, there is an increasing demand for applying causal inference methods at large scale when randomized experimentation is not viable. Weighting methods that directly incorporate covariate balancing have recently gained popularity for estimating causal effects in observational studies. These methods reduce the manual effort required by researchers to iterate between propensity score modeling and balance checking until satisfactory covariate balance is achieved. However, conventional solvers for determining weights lack the scalability to apply such methods on large-scale datasets in companies like Snap Inc. To address these limitations and improve computational efficiency, in this paper we present scalable algorithms, DistEB and DistMS, for two balancing approaches: entropy balancing and MicroSynth. The solvers have linear time complexity and can be conveniently implemented in distributed computing frameworks such as Spark, Hive, etc. We study the properties of balancing approaches at different scales, up to 1 million treated units and 487 covariates. We find that with larger sample sizes, both bias and variance in the causal effect estimation are significantly reduced. The results emphasize the importance of applying balancing approaches on large-scale datasets. We combine the balancing approach with a synthetic control framework and deploy an end-to-end system for causal impact estimation at Snap Inc.

Submitted 3 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
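Entropy balancing chooses control-unit weights of the form $w_i \propto \exp(\lambda^\top c_i)$. The weighted control covariate moments then match the treated moments, while the weights stay as close as possible, in Kullback-Leibler divergence, to uniform base weights. The sketch below covers only the textbook case of a single covariate with its mean balanced, solved by bisection on the scalar dual variable. It is not DistEB, and the function name is hypothetical. Apart from a stabilizing shift, each step needs only the sums $\sum_i e^{\lambda x_i}$ and $\sum_i x_i e^{\lambda x_i}$ over units. In principle, such sums can be accumulated partition by partition.

```python
import math
from typing import List, Sequence


def entropy_balance_1d(x_control: Sequence[float], target_mean: float) -> List[float]:
    """Weights w_i proportional to exp(lam * x_i), summing to 1, with sum_i w_i x_i = target_mean.

    Assumes uniform base weights, a single covariate, and only the first moment
    balanced; requires min(x_control) < target_mean < max(x_control).
    """
    if not min(x_control) < target_mean < max(x_control):
        raise ValueError("target mean must lie strictly inside the range of control values")

    def weights(lam: float) -> List[float]:
        shift = max(lam * xi for xi in x_control)  # avoid overflow in exp
        raw = [math.exp(lam * xi - shift) for xi in x_control]
        total = sum(raw)
        return [r / total for r in raw]

    def gap(lam: float) -> float:
        # Weighted control mean minus target; increasing in lam (its derivative is a weighted variance).
        return sum(w * xi for w, xi in zip(weights(lam), x_control)) - target_mean

    lo, hi = -1.0, 1.0
    while gap(lo) > 0:  # expand the bracket until it contains the root
        lo *= 2
    while gap(hi) < 0:
        hi *= 2
    for _ in range(200):  # fixed number of bisection steps
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))
```

With several covariates and higher moments, $\lambda$ becomes a vector. The scalar bisection would then be replaced by a convex optimizer on the dual, for example Newton's method.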

arXiv:2212.00884

Pareto Regret Analyses in Multi-objective Multi-armed Bandit

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study Pareto optimality in the multi-objective multi-armed bandit by providing a formulation of the adversarial multi-objective multi-armed bandit and defining its Pareto regrets, which can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization function and, unlike scalarized regrets, directly reflect Pareto optimality. We also present new algorithms for settings both with and without prior information about the multi-objective multi-armed bandit. Our upper and lower bounds on Pareto regrets show that the algorithms are simultaneously optimal in adversarial settings and nearly optimal, up to a logarithmic factor, in stochastic settings. Moreover, the lower bound analyses show that the new regrets are consistent with the existing Pareto regret for stochastic settings and extend an adversarial attack mechanism from bandits to the multi-objective setting.

Submitted 30 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: 19 pages; accepted at ICML 2023 and to be published in Proceedings of Machine Learning Research (PMLR)
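For readers unfamiliar with Pareto regret, the following sketch computes the Pareto front and the usual stochastic Pareto suboptimality gap (the smallest uniform boost that makes an arm's mean vector non-dominated), then sums gaps over pulls. This is the standard stochastic notion, assumed here for illustration; the paper's adversarial Pareto regret definitions may differ, and all function names are hypothetical.

```python
def dominates(a, b):
    """True if vector a Pareto-dominates b (>= everywhere, > somewhere)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(means):
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [i for i, mu in enumerate(means)
            if not any(dominates(nu, mu) for j, nu in enumerate(means) if j != i)]

def pareto_gaps(means):
    """Smallest eps >= 0 such that means[i] + eps (in every objective) is not
    dominated by any other arm; zero for arms on the Pareto front."""
    gaps = []
    for i, mu in enumerate(means):
        gap = 0.0
        for j, nu in enumerate(means):
            if j != i:
                # nu dominates mu + eps only while eps < min_d (nu[d] - mu[d])
                gap = max(gap, min(a - b for a, b in zip(nu, mu)))
        gaps.append(gap)
    return gaps

def pareto_regret(pulls, means):
    """Cumulative stochastic Pareto regret of a sequence of pulled arm indices."""
    gaps = pareto_gaps(means)
    return sum(gaps[i] for i in pulls)

# Two objectives: arms 0-2 are Pareto optimal, arm 3 is dominated by arm 2.
means = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4)]
```

Note that no scalarization is involved: any arm on the front incurs zero regret, which matches the abstract's point that these regrets reflect Pareto optimality directly.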

arXiv:2210.13843 [pdf, ps, other]

GLS under Monotone Heteroskedasticity

Authors: Yoichi Arai, Taisuke Otsu, Mengshan Xu

Abstract: Generalized least squares (GLS) is one of the most basic tools in regression analysis. A major issue in implementing GLS is the estimation of the conditional variance function of the error term, which typically requires a restrictive functional form assumption for parametric estimation or smoothing parameters for nonparametric estimation. In this paper, we propose an alternative approach to estimating the conditional variance function under nonparametric monotonicity constraints by utilizing the isotonic regression method. Our GLS estimator is shown to be asymptotically equivalent to the infeasible GLS estimator with knowledge of the conditional error variance, and involves only some tuning to trim boundary observations, not only for point estimation but also for interval estimation or hypothesis testing. Our analysis extends the scope of the isotonic regression method by showing that the isotonic estimates, possibly with generated variables, can be employed as first-stage estimates to be plugged in for semiparametric objects. Simulation studies illustrate the excellent finite-sample performance of the proposed method. As an empirical example, we revisit Acemoglu and Restrepo's (2017) study on the relationship between an aging population and economic growth to illustrate how our GLS estimator effectively reduces estimation errors.

Submitted 22 January, 2024; v1 submitted 25 October, 2022; originally announced October 2022.
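The isotonic regression step can be illustrated with the pool-adjacent-violators algorithm (PAVA). The sketch below assumes the squared first-stage residuals are already sorted by the regressor along which the variance is monotone, fits a non-decreasing variance function, and turns it into GLS weights 1/sigma^2. The `floor` guard and both function names are hypothetical, and the paper's boundary trimming is not reproduced.

```python
def isotonic_fit(y, w=None):
    """Weighted least-squares fit of a non-decreasing sequence to y using the
    pool-adjacent-violators algorithm; y is assumed ordered by the regressor."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # pool backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

def variance_weights(resid_sq, floor=1e-8):
    """Hypothetical helper: GLS weights 1 / sigma^2 from an isotonic fit of
    squared residuals (floored to avoid division by zero)."""
    return [1.0 / max(f, floor) for f in isotonic_fit(resid_sq)]
```

For example, `isotonic_fit([1.0, 3.0, 2.0, 4.0])` pools the violating pair into `[1.0, 2.5, 2.5, 4.0]`. No bandwidth or functional form is chosen, which is the appeal of the monotone approach described in the abstract.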

arXiv:2210.08461 [pdf, other]

Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization

Authors: Jonathan Wilton, Abigail M. Y. Koay, Ryan K. L. Ko, Miao Xu, Nan Ye

Abstract: The need to learn from positive and unlabeled data, or PU learning, arises in many applications and has attracted increasing interest. While random forests are known to perform well on many tasks with positive and negative data, recent PU algorithms are generally based on deep neural networks, and the potential of tree-based PU learning is under-explored. In this paper, we propose new random forest algorithms for PU learning. Key to our approach is a new interpretation of decision tree algorithms for positive and negative data as recursive greedy risk minimization algorithms. We extend this perspective to the PU setting to develop new decision tree learning algorithms that directly minimize PU-data-based estimators for the expected risk. This allows us to develop an efficient PU random forest algorithm, PU extra trees. Our approach features three desirable properties: it is robust to the choice of the loss function in the sense that various loss functions lead to the same decision trees; it requires little hyperparameter tuning as compared to neural network based PU learning; it supports a feature importance that directly measures a feature's contribution to risk minimization. Our algorithms demonstrate strong performance on several datasets. Our code is available at https://github.com/puetpaper/PUExtraTrees.

Submitted 16 October, 2022; originally announced October 2022.

Comments: Accepted at NeurIPS 2022
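One common PU-data-based risk estimator is the unbiased PU (uPU) risk, which rewrites the negative-class term using unlabeled data and the class prior pi. The sketch below assumes this estimator with a 0-1 loss purely for illustration; it is not taken from the PU extra trees code, and the names `upu_risk` and `zero_one` are hypothetical.

```python
def zero_one(pred, label):
    """0-1 loss for labels in {+1, -1}."""
    return 0.0 if pred == label else 1.0

def upu_risk(pred_pos, pred_unl, prior, loss=zero_one):
    """Unbiased PU estimate of the classification risk
        R = pi * E_P[loss(f, +1)] + E_U[loss(f, -1)] - pi * E_P[loss(f, -1)],
    using (1 - pi) * E_N[.] = E_U[.] - pi * E_P[.].
    pred_pos / pred_unl are predictions on labeled-positive / unlabeled points."""
    n_p, n_u = len(pred_pos), len(pred_unl)
    pos_as_pos = sum(loss(p, +1) for p in pred_pos) / n_p
    pos_as_neg = sum(loss(p, -1) for p in pred_pos) / n_p
    unl_as_neg = sum(loss(p, -1) for p in pred_unl) / n_u
    return prior * pos_as_pos + unl_as_neg - prior * pos_as_neg
```

As a sanity check, a classifier that predicts +1 everywhere gets an estimated risk of 1 - pi, which is exactly the true negative-class probability it misclassifies. A tree learner in this spirit would pick splits that reduce such an estimate in each child node.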

arXiv:2209.07306 [pdf, other]

Statistical Modeling of Data Breach Risks: Time to Identification and Notification

Authors: Maochao Xu, Quynh Nhu Nguyen

Abstract: It is very challenging to predict the cost of a cyber incident owing to the complex nature of cyber risk. However, such prediction is unavoidable for insurance companies that offer cyber insurance policies. The time to identify an incident and the time to notify the affected individuals are two important components in determining the cost of a cyber incident. In this work, we initiate the study of these two metrics via statistical modeling approaches. In particular, we propose a novel approach to imputing the missing data, and further develop a dependence model to capture the complex pattern exhibited by these two metrics. The empirical study shows that the proposed approach has satisfactory predictive performance and is superior to other commonly used models.

Submitted 24 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

arXiv:2207.08868 [pdf, ps, other]

Isotonic propensity score matching

Authors: Mengshan Xu, Taisuke Otsu

Abstract: We propose a one-to-many matching estimator of the average treatment effect based on propensity scores estimated by isotonic regression. The method relies on the monotonicity assumption on the propensity score function, which can be justified in many applications in economics. We show that the nature of the isotonic estimator can help us to fix many problems of existing matching methods, including efficiency, choice of the number of matches, choice of tuning parameters, robustness to propensity score misspecification, and bootstrap validity. As a by-product, a uniformly consistent isotonic estimator is developed for our proposed matching method.

Submitted 18 July, 2022; originally announced July 2022.
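The matching step can be sketched as a one-to-many nearest-neighbour estimator of the average treatment effect on a given propensity score. Here the scores are taken as input (in the paper they would come from isotonic regression), `m` is a hypothetical fixed number of matches, and the function name is illustrative only.

```python
def matching_ate(scores, treated, y, m=1):
    """One-to-many nearest-neighbour matching estimator of the ATE on a
    pre-estimated propensity score. Each unit's missing potential outcome is
    imputed by the mean outcome of its m closest units in the opposite group."""
    n = len(y)
    effects = []
    for i in range(n):
        pool = [j for j in range(n) if treated[j] != treated[i]]
        pool.sort(key=lambda j: abs(scores[j] - scores[i]))
        nearest = pool[:m]
        imputed = sum(y[j] for j in nearest) / len(nearest)
        # treated unit: observed Y(1) minus imputed Y(0); control unit: the reverse
        effects.append(y[i] - imputed if treated[i] else imputed - y[i])
    return sum(effects) / n
```

With outcomes that depend on the score plus a constant effect of 2 and exact score matches across groups, this returns 2.0. An isotonic score estimate is a step function, so many units share identical scores, which plausibly interacts with the choice of the number of matches discussed in the abstract.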

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2204.14184 [pdf]

doi 10.1080/21680566.2022.2108522

Modeling Ride-Sourcing Matching and Pickup Processes based on Additive Gaussian Process Models

Authors: Zheng Zhu, Meng Xu, Yining Di, Xiqun Chen, **gru Yu

Abstract: Matching and pickup processes are core features of ride-sourcing services. Previous studies have adopted abundant analytical models to depict the two processes and obtain operational insights, while the goodness of fit between models and data was overlooked. To simultaneously consider the fit between models and data and analytically tractable formulations, we propose a data-driven approach based on the additive Gaussian Process Model (AGPM) for ride-sourcing market modeling. The framework is tested on real-world data collected in Hangzhou, China. We fit analytical models, machine learning models, and AGPMs, in which the numbers of matches or pickups are used as outputs and spatial, temporal, demand, and supply covariates are used as inputs. The results demonstrate the advantages of AGPMs in recovering the two processes in terms of estimation accuracy. Furthermore, we illustrate the modeling power of AGPM by using the trained model to design and estimate idle vehicle relocation strategies.

Submitted 29 April, 2022; originally announced April 2022.

Comments: 30 pages, 8 figures, 4 tables. Submitted and under review in Transportmetrica B: Transport Dynamics

arXiv:2203.12789 [pdf, ps, other]

doi 10.1007/s42519-023-00339-2

Random Matrix Time Series

Authors: Peiyuan Teng, Min Xu

Abstract: In this paper, a time series model with coefficients that take values from random matrix ensembles is proposed. Formal definitions, theoretical solutions, and statistical properties are derived. Estimation and forecast methodologies for random matrix time series are discussed with examples. Random matrix differential equations and potential applications of the time series model are suggested at the end.

Submitted 23 March, 2022; originally announced March 2022.

Comments: 15 pages

Journal ref: Journal of Statistical Theory and Practice, 17, Article number: 42 (2023)
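To give a feel for a time series whose coefficients are drawn from a random matrix ensemble, here is a minimal simulation sketch. It assumes a vector AR(1) form x_t = (scale / sqrt(d)) A_t x_{t-1} + e_t with A_t drawn i.i.d. from the Gaussian Orthogonal Ensemble (GOE); the paper's exact model, scaling, and ensemble choices may differ, and `scale`, `noise`, and both function names are hypothetical.

```python
import math
import random

def goe_matrix(d, rng):
    """Symmetric d x d matrix with N(0, 1) diagonal and N(0, 1/2) off-diagonal
    entries (one common GOE convention, density proportional to exp(-tr(H^2)/2))."""
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        a[i][i] = rng.gauss(0.0, 1.0)
        for j in range(i + 1, d):
            a[i][j] = a[j][i] = rng.gauss(0.0, math.sqrt(0.5))
    return a

def simulate_rm_ar1(d, steps, scale=0.3, noise=1.0, seed=0):
    """Simulate x_t = (scale / sqrt(d)) * A_t x_{t-1} + e_t, A_t i.i.d. GOE,
    e_t i.i.d. N(0, noise^2 I), starting from x_0 = 0."""
    rng = random.Random(seed)
    c = scale / math.sqrt(d)
    x = [0.0] * d
    path = [x]
    for _ in range(steps):
        a = goe_matrix(d, rng)  # fresh random coefficient matrix at every step
        x = [c * sum(a[i][k] * x[k] for k in range(d)) + rng.gauss(0.0, noise)
             for i in range(d)]
        path.append(x)
    return path
```

Unlike a classical VAR(1) with a fixed coefficient matrix, every step here draws a new coefficient matrix, so the dynamics are driven by products of random matrices; the small `scale` is an assumption meant to keep simulated paths from exploding.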

arXiv:2203.10975 [pdf, other]

GCF: Generalized Causal Forest for Heterogeneous Treatment Effect Estimation in Online Marketplace

Authors: Shu Wan, Chen Zheng, Zhonggen Sun, Mengfan Xu, Xiaoqing Yang, Hongtu Zhu, Jiecheng Guo

Abstract: Uplift modeling is a rapidly growing approach that utilizes causal inference and machine learning methods to directly estimate heterogeneous treatment effects, and it has been widely applied in various online marketplaces to assist large-scale decision-making in recent years. Existing popular models, like causal forest (CF), are limited to either discrete treatments or imposing parametric assumptions on the outcome-treatment relationship that may suffer from model misspecification. However, continuous treatments (e.g., price, duration) often arise in marketplaces. To alleviate these restrictions, we use a kernel-based doubly robust estimator to recover the non-parametric dose-response functions that can flexibly model continuous treatment effects. Moreover, we propose a generic distance-based splitting criterion to capture the heterogeneity for the continuous treatments. We call the proposed algorithm generalized causal forest (GCF) as it generalizes the use case of CF to a much broader setting. We show the effectiveness of GCF by deriving the asymptotic property of the estimator and comparing it to popular uplift modeling methods on both synthetic and real-world datasets. We implement GCF on Spark and successfully deploy it into a large-scale online pricing system at a leading ride-sharing company. Online A/B testing results further validate the superiority of GCF.

Submitted 23 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2111.07052

Distribution and Determinants of Correlation between PM2.5 and O3 in China Mainland: Dynamitic simil-Hu Lines

Authors: Chenru Chen, Miaoqing Xu, Shuyi Liu, Dehai Zhu, Jianyu Yang, Bingbo Gao, Ziyue Chen

Abstract: In recent years, China has made great efforts to control air pollution. During this process, it has been found that fine particulate matter (PM2.5) and ozone (O3) follow the same trend in some areas and opposite trends in others, which makes it difficult to plan control measures. Therefore, this study used multi-year, large-scale air quality data to explore the distribution of the correlation between PM2.5 and O3, and proposed a concept called dynamic simil-Hu lines to replace the single fixed division used in previous research. Furthermore, this study quantitatively analyzed the causes of the distribution patterns with the geographical detector and random forest. The causes include both natural and anthropogenic factors, which can be divided into three groups according to their spatial distribution: those varying broadly with longitude, those varying with latitude, and those with local characteristics. Overall, regions with denser population, higher GDP, lower altitude, higher humidity, higher atmospheric pressure, higher surface temperature, fewer sunshine hours, and more accumulated precipitation tend to show a positive correlation coefficient between PM2.5 and O3 in every season, whereas regions with the opposite conditions generally show a negative correlation coefficient. Moreover, humidity, global surface temperature, air temperature, and accumulated precipitation are the four decisive factors shaping the distribution of the correlation between PM2.5 and O3.
In general, collaborative governance of atmospheric pollutants should take into account the specific temporal and spatial context, and should also be based on local socio-economic conditions, geography and geomorphology, climate and meteorology, and other factors.

Submitted 30 September, 2022; v1 submitted 13 November, 2021; originally announced November 2021.

Comments: Our research group has decided to withdraw this preprint

arXiv:2108.04851 [pdf, other]

Bayesian Inference using the Proximal Mapping: Uncertainty Quantification under Varying Dimensionality

Authors: Maoran Xu, Hua Zhou, Yujie Hu, Leo L. Duan

Abstract: In statistical applications, it is common to encounter parameters supported on a varying or unknown dimensional space. Examples include the fused lasso regression, the matrix recovery under an unknown low rank, etc. Despite the ease of obtaining a point estimate via optimization, it is much more challenging to quantify their uncertainty. In the Bayesian framework, a major difficulty is that if one assigns a prior associated with a $p$-dimensional measure, then there is zero posterior probability on any lower-dimensional subset with dimension $d<p$; to avoid this caveat, one needs to choose another dimension-selection prior on $d$, which often involves a highly combinatorial problem. To significantly reduce the modeling burden, we propose a new generative process for the prior: starting from a continuous random variable such as a multivariate Gaussian, we transform it into a varying-dimensional space using the proximal mapping. This leads to a large class of new Bayesian models that can directly exploit the popular frequentist regularizations and their algorithms, such as the nuclear norm penalty and the alternating direction method of multipliers, while providing principled probabilistic uncertainty estimation. We show that this framework is well justified in geometric measure theory, and enjoys convenient posterior computation via standard Hamiltonian Monte Carlo. We demonstrate its use in the analysis of dynamic flow network data.

Submitted 2 October, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 26 pages, 4 figures
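The key generative idea, pushing a continuous Gaussian draw through a proximal mapping so that the result lands on a lower-dimensional set with positive probability, can be seen with the simplest case. The sketch below assumes the L1 penalty, whose proximal mapping is soft-thresholding; the paper covers more general penalties (e.g. fused lasso, nuclear norm), and the names `soft_threshold` and `sample_sparse_prior` are hypothetical.

```python
import math
import random

def soft_threshold(z, lam):
    """Proximal mapping of lam * ||.||_1:
    argmin_x 0.5 * ||x - z||^2 + lam * ||x||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

def sample_sparse_prior(p, lam, seed=0):
    """Draw z ~ N(0, I_p) and map it through the proximal operator; any
    coordinate with |z_j| <= lam becomes exactly zero."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return soft_threshold(z, lam)
```

For instance, `soft_threshold([3.0, -0.5, -2.0], 1.0)` gives `[2.0, 0.0, -1.0]`. The number of nonzero coordinates is random, so the prior places positive mass on subsets of every dimension d < p without a separate combinatorial dimension-selection prior.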

arXiv:2107.00153 [pdf, other]

Root and community inference on the latent growth process of a network

Authors: Harry Crane, Min Xu

Abstract: Many existing statistical models for networks overlook the fact that many real world networks are formed through a growth process. To address this, we introduce the PAPER (Preferential Attachment Plus Erdős--Rényi) model for random networks, where we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős--Rényi (ER) random edges. The PA tree component captures the underlying growth/recruitment process of a network where vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the early history, in particular the root node, of the unobserved growth process; the root node can be patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks with millions of nodes and provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to provide a new approach to community detection.

Submitted 7 February, 2023; v1 submitted 30 June, 2021; originally announced July 2021.

Comments: 69 pages; 29 figures

MSC Class: 62M99; 05C80
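The generative side of the PAPER model can be sketched in a few lines: grow a preferential attachment tree from a root, then overlay independent Erdős–Rényi noise edges. This sketch assumes linear (degree-proportional) attachment and an edge probability `p`; the paper's attachment kernel may be more general, the Gibbs-sampling inference for the root is not shown, and `sample_paper` is a hypothetical name.

```python
import random

def sample_paper(n, p, seed=0):
    """Sample a PAPER-style graph on nodes 0..n-1: a linear preferential-
    attachment tree T rooted at node 0, plus ER noise edges added independently
    with probability p. Returns (tree_edges, all_edges); edges are (u, v), u < v."""
    rng = random.Random(seed)
    tree = set()
    endpoints = []  # node v appears deg(v) times, so uniform draws are degree-proportional
    for v in range(1, n):
        u = 0 if not endpoints else rng.choice(endpoints)
        tree.add((u, v))
        endpoints.extend([u, v])
    noise = {(u, v) for u in range(n) for v in range(u + 1, n)
             if (u, v) not in tree and rng.random() < p}
    return tree, tree | noise
```

In the inference problem only the union G is observed; the tree edges (and hence which node was added first) are hidden among the noise edges, which is why the root can only be recovered up to a confidence set.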

arXiv:2103.08450 [pdf, other]

Modeling Multivariate Cyber Risks: Deep Learning Dating Extreme Value Theory

Authors: Mingyue Zhang Wu, **zhu Luo, Xing Fang, Maochao Xu, Peng Zhao

Abstract: Modeling cyber risks is an important but challenging task in cyber security, mainly because of the high dimensionality and heavy tails of risk patterns. These obstacles have hindered the development of statistical models for multivariate cyber risks. In this work, we propose a novel approach to modeling multivariate cyber risks that relies on deep learning and extreme value theory. The proposed model not only delivers highly accurate point predictions via deep learning but also provides satisfactory high-quantile predictions via extreme value theory. A simulation study shows that the proposed model captures multivariate cyber risks well and yields satisfactory predictive performance. Empirical evidence based on real honeypot attack data also shows that the proposed model has very satisfactory predictive performance.

Submitted 15 March, 2021; originally announced March 2021.

Comments: 25 pages

arXiv:2102.03895 [pdf, other]

Functional optimal transport: map estimation and domain adaptation for functional data

Authors: Jiacheng Zhu, Aritra Guha, Dat Do, Mengdi Xu, XuanLong Nguyen, Ding Zhao

Abstract: We introduce a formulation of the optimal transport problem for distributions on function spaces, where the stochastic map between functional domains can be partially represented in terms of an (infinite-dimensional) Hilbert-Schmidt operator mapping a Hilbert space of functions to another. For numerous machine learning tasks, data can be naturally viewed as samples drawn from spaces of functions, such as curves and surfaces, in high dimensions. Optimal transport for functional data analysis provides a useful framework of treatment for such domains. Since probability measures in infinite-dimensional spaces generally lack absolute continuity (that is, with respect to non-degenerate Gaussian measures), the Monge map in the standard optimal transport theory for finite-dimensional spaces may not exist. Our approach to the optimal transport problem in infinite dimensions is a suitable regularization technique: we restrict the class of transport maps to be a Hilbert-Schmidt space of operators. To this end, we develop an efficient algorithm for finding the stochastic transport map between functional domains and provide theoretical guarantees on the existence, uniqueness, and consistency of our estimate for the Hilbert-Schmidt operator. We validate our method on synthetic datasets and examine the functional properties of the transport map. Experiments on real-world datasets of robot arm trajectories further demonstrate the effectiveness of our method on applications in domain adaptation.

Submitted 28 August, 2023; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: 48 pages, 10 figures, 3 tables

arXiv:2012.03420 [pdf, other]

Towards Generalized Implementation of Wasserstein Distance in GANs

Authors: Minkai Xu, Zhiming Zhou, Guansong Lu, Jian Tang, Weinan Zhang, Yong Yu

Abstract: Wasserstein GANs (WGANs), built upon the Kantorovich-Rubinstein (KR) duality of the Wasserstein distance, are among the most theoretically sound GAN models. However, in practice they do not always outperform other GAN variants. This is mostly due to the imperfect implementation of the Lipschitz condition required by the KR duality. Extensive work in the community has explored different implementations of the Lipschitz constraint, but the constraint remains hard to satisfy perfectly in practice. In this paper, we argue that the strong Lipschitz constraint might be unnecessary for optimization. Instead, we take a step back and try to relax the Lipschitz constraint. Theoretically, we first demonstrate a more general dual form of the Wasserstein distance called the Sobolev duality, which relaxes the Lipschitz constraint but still maintains the favorable gradient property of the Wasserstein distance. Moreover, we show that the KR duality is actually a special case of the Sobolev duality. Based on the relaxed duality, we further propose a generalized WGAN training scheme named Sobolev Wasserstein GAN (SWGAN), and empirically demonstrate the improvement of SWGAN over existing methods with extensive experiments.

Submitted 12 January, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

Comments: Accepted by AAAI 2021

arXiv:2011.14437 [pdf, other]

How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments

Authors: Yuxiang Xie, Meng Xu, Evan Chow, Xiaolin Shi

Abstract: Effectively measuring, understanding, and improving mobile app performance is of paramount importance for mobile app developers. Across the mobile Internet landscape, companies run online controlled experiments (A/B tests) with thousands of performance metrics in order to understand how app performance causally impacts user retention and to guard against service or app regressions that degrade user experiences. To capture certain characteristics particular to performance metrics, such as enormous observation volume and high skewness in distribution, an industry-standard practice is to construct a performance metric as a quantile over all performance events in control or treatment buckets in A/B tests. In our experience with thousands of A/B tests provided by Snap, we have discovered some pitfalls in this industry-standard way of calculating performance metrics that can lead to unexplained movements in performance metrics and unexpected misalignment with user engagement metrics. In this paper, we discuss two major pitfalls in this industry-standard practice of measuring performance for mobile apps. One arises from strong heterogeneity in both mobile devices and user engagement, and the other arises from self-selection bias caused by post-treatment user engagement changes. To remedy these two pitfalls, we introduce several scalable methods including user-level performance metric calculation and imputation and matching for missing metric values.
We have extensively evaluated these methods on both simulation data and real A/B tests, and have deployed them into Snap's in-house experimentation platform.

Submitted 29 November, 2020; originally announced November 2020.

Comments: WSDM '21: Proceedings of the 14th International Conference on Web Search and Data Mining
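The heterogeneity pitfall can be illustrated by contrasting a pooled quantile over all events in a bucket with a user-level metric that computes each user's quantile first and then averages across users. The nearest-rank quantile rule, the unweighted mean across users, and all function names are assumptions made for this sketch; the paper's exact user-level definition and its imputation and matching remedies are not reproduced.

```python
import math

def quantile(values, q):
    """Nearest-rank quantile (one of several possible interpolation rules)."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]

def pooled_quantile(events_by_user, q):
    """Event-level metric: a single quantile over all events in the bucket."""
    return quantile([x for evs in events_by_user.values() for x in evs], q)

def user_level_quantile(events_by_user, q):
    """User-level metric: each user's own quantile, then an unweighted mean."""
    per_user = [quantile(evs, q) for evs in events_by_user.values() if evs]
    return sum(per_user) / len(per_user)

# One heavy user on a fast device and one light user on a slow device (latency in ms).
bucket = {"heavy": [100.0] * 100, "light": [1000.0]}
```

Here the pooled median is 100.0 ms because the heavy user contributes almost all events, while the user-level median is 550.0 ms. A treatment that merely changes how many events heavy users generate can therefore move the pooled metric even if no individual user's experience changes.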

arXiv:2010.01875 [pdf, other]

Pointwise Binary Classification with Pairwise Confidence Comparisons

Authors: Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama

Abstract: To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed. Among them, some consider using pairwise but not pointwise labels, when pointwise labels are not accessible due to privacy, confidentiality, or security reasons. However, as a pairwise label denotes whether or not two data points share a pointwise label, it cannot be easily collected if either point is equally likely to be positive or negative. Thus, in this paper, we propose a novel setting called pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other. Firstly, we give a Pcomp data generation process, derive an unbiased risk estimator (URE) with theoretical guarantee, and further improve URE using correction functions. Secondly, we link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization. Finally, we demonstrate by experiments the effectiveness of our methods, which suggests Pcomp is a valuable and practically useful type of pairwise supervision besides the pairwise label.

Submitted 13 January, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to ICML 2021

arXiv:2010.01819 [pdf, other]

Bigeminal Priors Variational auto-encoder

Authors: Xuming Ran, Mingkun Xu, Qi Xu, Huihui Zhou, Quanying Liu

Abstract: Variational auto-encoders (VAEs) are an influential and widely used class of likelihood-based generative models in unsupervised learning. Likelihood-based generative models have been reported to be highly robust to out-of-distribution (OOD) inputs and can serve as OOD detectors, under the assumption that the model assigns higher likelihoods to samples from the in-distribution (ID) dataset than to samples from an OOD dataset. However, recent works reported a phenomenon in which a VAE recognizes some OOD samples as ID by assigning a higher likelihood to the OOD inputs than to the ID inputs. In this work, we introduce a new model, namely the Bigeminal Priors Variational auto-encoder (BPVAE), to address this phenomenon. The BPVAE aims to enhance the robustness of VAEs by combining the power of the VAE with two independent priors, one belonging to the training dataset and one to a simple dataset whose complexity is lower than that of the training dataset. BPVAE learns the features of both datasets, assigning a higher likelihood to the training dataset than to the simple dataset. In this way, we can use BPVAE's density estimate to detect OOD samples. Quantitative experimental results suggest that our model has better generalization capability and stronger robustness than standard VAEs, demonstrating the effectiveness of the proposed approach of hybrid learning by collaborative priors. Overall, this work paves a new avenue to potentially overcome the OOD problem via multiple latent priors modeling.

Submitted 5 October, 2020; originally announced October 2020.

arXiv:2009.09538 [pdf, other]

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Authors: Mengfan Xu, Diego Klabjan

Abstract: We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, a setting driven by real-world scenarios and differing from existing work. Past works in reinforcement learning either assume costly interactions with an environment or propose algorithms that find potentially low-quality local maxima. Motivated by EXP-type methods that integrate multiple agents (experts) for exploration in bandits under the assumption that rewards are bounded, we propose new algorithms, namely EXP4.P and EXP4-RL, for exploration in the unbounded reward case, and demonstrate their effectiveness in these new settings. Unbounded rewards introduce challenges because the regret cannot be limited by the number of trials, and selecting suboptimal arms may lead to infinite regret. Specifically, we establish EXP4.P's regret upper bounds in both bounded and unbounded linear and stochastic contextual bandits. Surprisingly, we also find that by including one sufficiently competent expert, EXP4.P can achieve global optimality in the linear case. This unbounded reward result is also applicable to a revised version of EXP3.P in the multi-armed bandit scenario. In EXP4-RL, we extend EXP4.P from bandit scenarios to reinforcement learning to incentivize exploration by multiple agents, including one high-performing agent, for both efficiency and excellence. This algorithm has been tested on difficult-to-explore games and shows significant improvements in exploration compared to the state of the art.

Submitted 3 May, 2024; v1 submitted 20 September, 2020; originally announced September 2020.

Comments: 40 pages, 8 figures
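EXP4.P and EXP4-RL build on the exponential-weights family of bandit algorithms. As a minimal, hedged illustration of that family (not of EXP4.P or EXP4-RL themselves), the sketch below implements the classic EXP3 algorithm of Auer et al. for K arms with rewards in [0, 1], i.e. the bounded-reward setting this paper moves beyond. The function name `exp3`, the exploration rate `gamma`, and the toy reward function are illustrative assumptions.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run EXP3 for n_rounds; reward_fn(t, arm) must return a reward in [0, 1].

    Returns the final arm-selection probabilities and the per-arm pull counts.
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights avoid overflow over long horizons
    counts = [0] * n_arms

    def probabilities():
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Mix the exponential weights with uniform exploration at rate gamma.
        return [(1 - gamma) * wi / total + gamma / n_arms for wi in w]

    for t in range(n_rounds):
        p = probabilities()
        arm = rng.choices(range(n_arms), weights=p)[0]
        counts[arm] += 1
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: unbiased for the chosen arm's reward.
        x_hat = reward / p[arm]
        log_w[arm] += gamma * x_hat / n_arms
    return probabilities(), counts


# Hypothetical toy problem: arm 2 always pays 1, the other arms pay 0.
final_p, pulls = exp3(lambda t, a: 1.0 if a == 2 else 0.0, n_arms=3, n_rounds=2000)
print([round(q, 3) for q in final_p], pulls)
```

Storing log-weights and subtracting the maximum before exponentiating only guards against overflow; it leaves the selection probabilities unchanged. The regret analysis of EXP3 relies on rewards being bounded, which is exactly the assumption the abstract relaxes.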

arXiv:2008.05095 [pdf, ps, other]

Experimental Analysis of Legendre Decomposition in Machine Learning

Authors: Jianye Pang, Kai Yi, Wanguang Yin, Min Xu

Abstract: In this technical report, we analyze Legendre decomposition for non-negative tensors in theory and application. In theory, the properties of dual parameters and dually flat manifolds in Legendre decomposition are reviewed, and the process of tensor projection and parameter updating is analyzed. In application, a series of verification experiments and clustering experiments with parameters on the submanifold were carried out with the aim of finding an effective lower-dimensional representation of the input tensor. The experimental results show that the parameters on the submanifold cannot be directly used as low-rank representations. Combined with this analysis, we connect Legendre decomposition with neural networks and low-rank representation applications, and put forward some promising prospects.

Submitted 21 September, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

arXiv:2007.15840 [pdf]

A Survey on Concept Factorization: From Shallow to Deep Representation Learning

Authors: Zhao Zhang, Yan Zhang, Mingliang Xu, Li Zhang, Yi Yang, Shuicheng Yan

Abstract: The quality of the features learned by representation learning determines the performance of learning algorithms and of related application tasks (such as high-dimensional data clustering). As a relatively new paradigm for representation learning, Concept Factorization (CF) has attracted a great deal of interest in machine learning and data mining for over a decade. Many effective CF-based methods have been proposed from different perspectives and with different properties, but it remains difficult to grasp the essential connections and identify the underlying explanatory factors from existing studies. In this paper, we therefore survey recent advances in CF methodologies and the potential benchmarks by categorizing and summarizing the current methods. Specifically, we first review the root CF method, and then explore the advancement of CF-based representation learning ranging from shallow to deep/multilayer cases. We also introduce the potential application areas of CF-based methods. Finally, we point out some future directions for studying CF-based representation learning. Overall, this survey provides an insightful overview of both the theoretical basis and current developments in the field of CF, which can also help interested researchers understand the current trends of CF and find the most appropriate CF techniques for particular applications.

Submitted 31 January, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

Comments: Please cite this work as: Zhao Zhang, Yan Zhang, Mingliang Xu, Li Zhang, Yi Yang and Shuicheng Yan, "A Survey on Concept Factorization: From Shallow to Deep Representation Learning," Information Processing and Management (IPM), Jan 2021
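The root CF method approximates a data matrix by combinations of the data points themselves, X ≈ X W Vᵀ. A minimal NumPy sketch follows, assuming the multiplicative update rules of Xu and Gong's original CF formulation (nonnegative X, kernel K = XᵀX); the function name `concept_factorization`, the iteration count and the toy data are hypothetical, and this is not code from the survey.

```python
import numpy as np


def concept_factorization(X, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize nonnegative X (m features x n samples) as X ~= X @ W @ V.T.

    W (n x k) expresses each concept as a combination of data points;
    V (n x k) expresses each data point in terms of the k concepts.
    Returns W, V and the reconstruction error after every iteration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X  # n x n kernel; nonnegative when X is nonnegative
    W = rng.random((n, k))
    V = rng.random((n, k))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and V nonnegative; eps avoids division by zero.
        W *= (K @ V) / (K @ W @ V.T @ V + eps)
        V *= (K @ W) / (V @ W.T @ K @ W + eps)
        errors.append(np.linalg.norm(X - X @ W @ V.T))
    return W, V, errors


X = np.random.default_rng(1).random((20, 30))  # toy nonnegative data
W, V, errors = concept_factorization(X, k=3)
print(round(errors[0], 3), round(errors[-1], 3))
```

Because both updates touch X only through K = XᵀX, the same loop runs unchanged on any nonnegative kernel matrix, which is one reason CF is often described as kernelizable.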

arXiv:2007.08929 [pdf, other]

Provably Consistent Partial-Label Learning

Authors: Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama

Abstract: Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, a theoretical understanding of the consistency of those methods is lacking: none of the PLL methods hitherto possesses a generation process of candidate label sets, so it is still unclear why such a method works on a specific dataset and when it may fail given a different dataset. In this paper, we propose the first generation model of candidate label sets, and develop two novel PLL methods that are guaranteed to be provably consistent: one is risk-consistent and the other is classifier-consistent. Our methods are advantageous, since they are compatible with any deep network or stochastic optimizer. Furthermore, thanks to the generation model, we are able to answer the two questions above by testing whether the generation model matches given candidate label sets. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed generation model and two PLL methods.

Submitted 23 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: NeurIPS 2020 camera-ready version

arXiv:2007.08128 [pdf, other]

Detecting Out-of-distribution Samples via Variational Auto-encoder with Reliable Uncertainty Estimation

Authors: Xuming Ran, Mingkun Xu, Lingrui Mei, Qi Xu, Quanying Liu

Abstract: Variational autoencoders (VAEs) are influential generative models with rich representation capabilities derived from deep neural network architectures and Bayesian methods. However, VAE models have a weakness: they assign a higher likelihood to out-of-distribution (OOD) inputs than to in-distribution (ID) inputs. To address this problem, reliable uncertainty estimation is considered critical for an in-depth understanding of OOD inputs. In this study, we propose an improved noise contrastive prior (INCP) that can be integrated into the encoder of VAEs, yielding a model called INCPVAE. INCP is scalable, trainable and compatible with VAEs, and it also adopts the merits from the INCP for uncertainty estimation. Experiments on various datasets demonstrate that, compared to standard VAEs, our model is superior in uncertainty estimation for OOD data and is robust in anomaly detection tasks. The INCPVAE model obtains reliable uncertainty estimation for OOD inputs and solves the OOD problem in VAE models.

Submitted 1 November, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

arXiv:2007.00454 [pdf, other]

Pricing cyber insurance for a large-scale network

Authors: Lei Hua, Maochao Xu

Abstract: Facing the lack of cyber insurance loss data, we propose an innovative approach for pricing cyber insurance for a large-scale network based on synthetic data. The synthetic data are generated by the proposed risk spreading and recovering algorithm, which allows infection and recovery events to occur sequentially and allows dependence among the random waiting times to infection of different nodes. The scale-free network framework is adopted to account for the topology uncertainty of the random large-scale network. Extensive simulation studies are conducted to understand the risk spreading and recovering mechanism, and to uncover the most important underwriting risk factors. A case study is also presented to demonstrate that the proposed approach and algorithm can be adapted accordingly to provide a reference for cyber insurance pricing.

Submitted 29 June, 2020; originally announced July 2020.

arXiv:2006.16723 [pdf, other]

Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification

Authors: Hongyuan Mei, Guanghui Qin, Minjie Xu, Jason Eisner

Abstract: Learning how to predict future events from patterns of past events is difficult when the set of possible event types is large. Training an unrestricted neural model might overfit to spurious patterns. To exploit domain-specific knowledge of how past events might affect an event's present probability, we propose using a temporal deductive database to track structured facts over time. Rules serve to prove facts from other facts and from past events. Each fact has a time-varying state---a vector computed by a neural net whose topology is determined by the fact's provenance, including its experience of past events. The possible event types at any time are given by special facts, whose probabilities are neurally modeled alongside their states. In both synthetic and real-world domains, we show that neural probabilistic models derived from concise Datalog programs improve prediction by encoding appropriate domain knowledge in their architecture.

Submitted 16 August, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

Comments: ICML 2020 camera-ready (new Appendix A.3, rewritten Appendix F)

arXiv:2006.16312 [pdf, other]

Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising

Authors: Xiaotian Hao, Zhaoqing Peng, Yi Ma, Guan Wang, Junqi **, Jianye Hao, Shan Chen, Rongquan Bai, Mingzhou Xie, Miao Xu, Zhenzhe Zheng, Chuan Yu, Han Li, Jian Xu, Kun Gai

Abstract: In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertiser's cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, and thus usually fall into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization problem while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.

Submitted 29 June, 2020; originally announced June 2020.

Comments: accepted by ICML 2020

arXiv:2006.11441 [pdf, other]

Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes

Authors: Mengdi Xu, Wenhao Ding, Jiacheng Zhu, Zuxin Liu, Baiming Chen, Ding Zhao

Abstract: Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. This paper proposes a continual online model-based reinforcement learning approach that does not require pre-training to solve task-agnostic problems with unknown task boundaries. We maintain a mixture of experts to handle nonstationarity, and represent each different type of dynamics with a Gaussian Process to efficiently leverage collected data and expressively model uncertainty. We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference. Our approach reliably handles the task distribution shift by generating new models for never-before-seen dynamics and reusing old models for previously seen dynamics. In experiments, our approach outperforms alternative methods in non-stationary tasks, including classic control with changing dynamics and decision making in different driving scenarios.

Submitted 30 November, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: 16 pages, 6 figures

arXiv:2006.06983 [pdf, other]

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

Authors: Chengxu Yang, Qipeng Wang, Mengwei Xu, Zhenpeng Chen, Kaigui Bian, Yunxin Liu, Xuanzhe Liu

Abstract: Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., making a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but takes heterogeneity into consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32x longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential factors for performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should consider heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.

Submitted 12 March, 2021; v1 submitted 12 June, 2020; originally announced June 2020.
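One impact the study measures is devices that fail to upload their model updates. As a hypothetical sketch (not the paper's heterogeneity-aware platform), the FedAvg-style aggregation below weights each client's model vector by its local sample count and simply drops clients whose upload failed; the function name and toy numbers are illustrative assumptions. It shows how a single failure can shift the global model toward the surviving participants, one way participant bias can arise.

```python
from typing import List, Optional, Sequence


def fedavg_aggregate(
    updates: Sequence[Optional[List[float]]], n_samples: Sequence[int]
) -> List[float]:
    """Weighted average of client model vectors, weighting by local sample count.

    A None entry models a device that failed to upload its update this round;
    it is excluded, so the surviving clients are reweighted among themselves.
    """
    survivors = [(u, n) for u, n in zip(updates, n_samples) if u is not None]
    if not survivors:
        raise ValueError("no client uploaded an update this round")
    total = sum(n for _, n in survivors)
    dim = len(survivors[0][0])
    return [sum(u[i] * n for u, n in survivors) / total for i in range(dim)]


updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sizes = [100, 300, 100]
print(fedavg_aggregate(updates, sizes))                         # [0.4, 0.8]
print(fedavg_aggregate([updates[0], None, updates[2]], sizes))  # [1.0, 0.5]
```

Here the failing client holds most of the data, so losing it moves the aggregate substantially even though only one of three devices dropped out.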

arXiv:2006.01340 [pdf, other]

Bayesian Inference with the l1-ball Prior: Solving Combinatorial Problems with Exact Zeros

Authors: Maoran Xu, Leo L. Duan

Abstract: The l1-regularization is very popular in high-dimensional statistics: it changes the combinatorial problem of choosing which subset of the parameters is zero into a simple continuous optimization. Using a continuous prior concentrated near zero, the Bayesian counterparts are successful in quantifying the uncertainty in variable selection problems; nevertheless, the lack of exact zeros makes them difficult to apply to broader problems such as change-point detection and rank selection. Inspired by the duality of the l1-regularization as a constraint onto an l1-ball, we propose a new prior by projecting a continuous distribution onto the l1-ball. This creates a positive probability on the ball boundary, which contains both continuous elements and exact zeros. Unlike the spike-and-slab prior, this l1-ball projection is continuous and differentiable almost surely, making the posterior estimation amenable to the Hamiltonian Monte Carlo algorithm. We examine its properties, such as the volume change due to the projection, the connection to the combinatorial prior, and the minimax concentration rate in the linear problem. We demonstrate the usefulness of exact zeros that simplify combinatorial problems, such as change-point detection in time series, dimension selection of mixture models and low-rank-plus-sparse change detection in medical images.

Submitted 20 February, 2023; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 44 pages, 15 figures
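The mechanism behind the exact zeros is that Euclidean projection onto the l1-ball soft-thresholds a vector, so a continuous draw lands exactly on zero in some coordinates with positive probability. A minimal sketch of that projection step only, using the sort-based algorithm of Duchi et al. (2008) as an assumed stand-in for whatever routine the authors use; the function name `project_l1_ball` and the radius are illustrative, and the full prior and its HMC sampler are not shown.

```python
import numpy as np


def project_l1_ball(v, r=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= r} (sort-based, Duchi et al. 2008)."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= r:
        return v.copy()  # already inside the ball: the projection is the identity
    u = np.sort(np.abs(v))[::-1]  # magnitudes in descending order
    cssv = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    # Largest (1-based) index rho whose entry would stay positive after thresholding.
    rho = np.nonzero(u - (cssv - r) / j > 0)[0][-1] + 1
    theta = (cssv[rho - 1] - r) / rho
    # Soft-threshold by theta: coordinates with |v_i| <= theta become exactly zero.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


rng = np.random.default_rng(0)
beta = rng.normal(size=8)  # a draw from a continuous distribution
proj = project_l1_ball(beta, r=1.0)
print(np.round(proj, 3), "exact zeros:", int(np.sum(proj == 0.0)))
```

For example, projecting (3, 1, -0.5) onto the ball of radius 2 gives threshold 1 and the result (2, 0, 0), with two exact zeros from a point that had none.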

arXiv:2005.08794 [pdf, other]

Inference on the History of a Randomly Growing Tree

Authors: Harry Crane, Min Xu

Abstract: The spread of infectious disease in a human community or the proliferation of fake news on social media can be modeled as a randomly growing tree-shaped graph. The history of the random growth process is often unobserved but contains important information such as the source of the infection. We consider the problem of statistical inference on aspects of the latent history using only a single snapshot of the final tree. Our approach is to apply random labels to the observed unlabeled tree and analyze the resulting distribution of the growth process, conditional on the final outcome. We show that this conditional distribution is tractable under a shape-exchangeability condition, which we introduce here, and that this condition is satisfied for many popular models for randomly growing trees such as uniform attachment, linear preferential attachment and uniform attachment on a $D$-regular tree. For inference of the root under shape-exchangeability, we propose O(n log n) time algorithms for constructing confidence sets with valid frequentist coverage as well as bounds on the expected size of the confidence sets. We also provide efficient sampling algorithms that extend our methods to a wide class of inference problems.

Submitted 13 January, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 36 pages; 7 figures; 5 tables

MSC Class: 90B15; 62M15
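Under uniform attachment every growth history consistent with the final tree is equally likely, since each history has probability 1/(n-1)!. With a uniform prior over which node is the root, P(root = v | tree) is therefore proportional to the number of histories starting at v, which equals n! / ∏ᵤ |Tᵤᵛ|, where Tᵤᵛ is the subtree of u when the tree is rooted at v (the quantity known as rumor centrality). The sketch below computes that distribution and a simple set of most probable nodes; it is a simplified illustration, not the paper's O(n log n) algorithm, and it is not claimed to reproduce the paper's frequentist coverage guarantee. The function names are hypothetical.

```python
import math
from collections import defaultdict


def root_posterior(edges):
    """Posterior over the root of a uniform-attachment tree, given its edges.

    P(root = v | tree) is proportional to n! / prod_u |subtree of u rooted at v|,
    the number of growth histories that start at v. This direct version is
    O(n^2) and is meant for illustration on small trees.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    nodes = sorted(adj)
    n = len(nodes)
    log_counts = {}
    for v in nodes:
        # Iterative DFS from v: every node is visited after its ancestors.
        parent, order, stack = {v: None}, [], [v]
        while stack:
            u = stack.pop()
            order.append(u)
            for w in adj[u]:
                if w != parent[u]:
                    parent[w] = u
                    stack.append(w)
        size = {}
        for u in reversed(order):  # children are sized before their parent
            size[u] = 1 + sum(size[w] for w in adj[u] if w != parent[u])
        log_counts[v] = math.lgamma(n + 1) - sum(math.log(s) for s in size.values())
    m = max(log_counts.values())
    weights = {v: math.exp(lc - m) for v, lc in log_counts.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def root_credible_set(posterior, eps=0.1):
    """Most probable nodes, added in order until their mass reaches 1 - eps."""
    chosen, mass = [], 0.0
    for v, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        chosen.append(v)
        mass += p
        if mass >= 1 - eps:
            break
    return chosen


post = root_posterior([(0, 1), (1, 2)])  # a path on three nodes
print(post)
print(root_credible_set(post, eps=0.3))
```

On the three-node path the middle node receives twice the mass of each endpoint: two histories start there (1, 0, 2 and 1, 2, 0), while each endpoint starts exactly one.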

arXiv:2005.06546 [pdf]

Triaging moderate COVID-19 and other viral pneumonias from routine blood tests

Authors: Forrest Sheng Bao, Youbiao He, Jie Liu, Yuanfang Chen, Qian Li, Christina R. Zhang, Lei Han, Baoli Zhu, Yaorong Ge, Shi Chen, Ming Xu, Liu Ouyang

Abstract: COVID-19 is sweeping the world with deadly consequences. Its contagious nature and clinical similarity to other pneumonias make separating subjects who have contracted COVID-19 from those with non-COVID-19 viral pneumonia both a priority and a challenge. However, COVID-19 testing has been greatly limited by the availability and cost of existing methods, even in developed countries like the US. Intrigued by the wide availability of routine blood tests, we propose to leverage them for COVID-19 testing using machine learning. Two proven-robust machine learning model families, random forests (RFs) and support vector machines (SVMs), are employed to tackle the challenge. Trained on blood data from 208 moderate COVID-19 subjects and 86 subjects with non-COVID-19 moderate viral pneumonia, an SVM-based classifier obtains the best result, with an accuracy of 84%, a sensitivity of 88%, a specificity of 80%, and a precision of 92%. The results are found to be explainable from both machine learning and medical perspectives. A privacy-protected web portal has been set up to help medical personnel in their practice, and the trained models are released for developers to build further applications. We hope our results can help the world fight this pandemic, and we welcome clinical verification of our approach on larger populations.

Submitted 13 May, 2020; originally announced May 2020.

ACM Class: I.5.4

arXiv:2005.05784 [pdf, other]

A Graph Gaussian Embedding Method for Predicting Alzheimer's Disease Progression with MEG Brain Networks

Authors: Mengjia Xu, David Lopez Sanz, Pilar Garces, Fernando Maestu, Quanzheng Li, Dimitrios Pantazis

Abstract: Characterizing the subtle changes of functional brain networks associated with the pathological cascade of Alzheimer's disease (AD) is important for early diagnosis and prediction of disease progression prior to clinical symptoms. We developed a new deep learning method, termed multiple graph Gaussian embedding model (MG2G), which can learn highly informative network features by mapping high-dimensional resting-state brain networks into a low-dimensional latent space. These latent distribution-based embeddings enable a quantitative characterization of subtle and heterogeneous brain connectivity patterns at different regions and can be used as input to traditional classifiers for various downstream graph analytic tasks, such as AD early-stage prediction and statistical evaluation of significant between-group alterations across brain regions. We used MG2G to detect the intrinsic latent dimensionality of MEG brain networks, predict the progression of patients with mild cognitive impairment (MCI) to AD, and identify brain regions with network alterations related to MCI.

Submitted 10 November, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

arXiv:2005.05441 [pdf, other]

Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments

Authors: Baiming Chen, Mengdi Xu, Zuxin Liu, Liang Li, Ding Zhao

Abstract: Action and observation delays are prevalent in real-world cyber-physical systems and may pose challenges for reinforcement learning design. The task is particularly arduous in multi-agent systems, where the delay of one agent could spread to other agents. To resolve this problem, this paper proposes a novel framework to deal with delays as well as the non-stationary training issue of multi-agent tasks with model-free deep reinforcement learning. We formally define the Delay-Aware Markov Game, which incorporates the delays of all agents in the environment. To solve Delay-Aware Markov Games, we apply centralized training and decentralized execution, which allows agents to use extra information to ease the non-stationarity issue of multi-agent systems during training, without the need for a centralized controller during execution. Experiments are conducted in multi-agent particle environments, including cooperative communication, cooperative navigation, and competitive experiments. We also test the proposed algorithm in traffic scenarios that require coordination of all autonomous vehicles to show the practical value of delay-awareness. Results show that the proposed delay-aware multi-agent reinforcement learning algorithm greatly alleviates the performance degradation introduced by delay. Code and demo videos are available at: https://github.com/baimingc/delay-aware-MARL.

Submitted 28 August, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

arXiv:2005.05440 [pdf, other]

Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Authors: Baiming Chen, Mengdi Xu, Liang Li, Ding Zhao

Abstract: Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of a delay-aware Markov Decision Process and proves that it can be transformed into a standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments with the Gym and MuJoCo platforms show that, compared with off-policy model-free reinforcement learning methods, the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay. Code is available at: https://github.com/baimingc/dambrl.

Submitted 11 May, 2020; originally announced May 2020.

Journal ref: Neurocomputing Volume 450, 25 August 2021, Pages 119-128
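The central construction in this abstract is that an MDP whose actions take effect a fixed number of steps late becomes a standard MDP once the state is augmented with the actions still in flight. A minimal sketch of that augmentation as a wrapper around a generic step function follows; the class `DelayedEnv`, the toy environment, and the choice to pre-fill the queue with a default action are assumptions for illustration, not the authors' released code.

```python
from collections import deque
from typing import Any, Callable, Deque, Tuple

# A step function maps (state, action) -> (next_state, reward).
StepFn = Callable[[Any, Any], Tuple[Any, float]]


class DelayedEnv:
    """Wrap an undelayed step function so each action takes effect `delay` steps later.

    The augmented state (current state, then the issued-but-unapplied actions,
    oldest first) is Markov again: it determines what happens next.
    """

    def __init__(self, step_fn: StepFn, init_state: Any, delay: int, default_action: Any):
        self.step_fn = step_fn
        self.state = init_state
        # Actions issued but not yet applied; pre-filled with a default action.
        self.pending: Deque[Any] = deque([default_action] * delay)

    def augmented_state(self) -> Tuple[Any, ...]:
        return (self.state, *self.pending)

    def step(self, action: Any) -> Tuple[Tuple[Any, ...], float]:
        self.pending.append(action)
        applied = self.pending.popleft()  # the action issued `delay` steps ago
        self.state, reward = self.step_fn(self.state, applied)
        return self.augmented_state(), reward


# Hypothetical 1-D toy environment: the action is added to the position.
env = DelayedEnv(lambda s, a: (s + a, float(-abs(s + a))), init_state=0, delay=2, default_action=0)
for a in [1, 1, -1]:
    print(env.step(a))
```

With a delay of 2, the first action (+1) only moves the position on the third call, which is why a policy that sees only the raw state cannot act optimally while one that sees the augmented state can.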

Showing 1–50 of 102 results for author: Xu, M