Search | arXiv e-print repository

Bridging multiple worlds: multi-marginal optimal transport for causal partial-identification problem

Abstract: Under the prevalent potential outcome model in causal inference, each unit is associated with multiple potential outcomes but at most one of which is observed, leading to many causal quantities being only partially identified. The inherent missing data issue echoes the multi-marginal optimal transport (MOT) problem, where marginal distributions are known, but how the marginals couple to form the joint distribution is unavailable. In this paper, we cast the causal partial identification problem in the framework of MOT with $K$ margins and $d$-dimensional outcomes and obtain the exact partial identified set. In order to estimate the partial identified set via MOT, statistically, we establish a convergence rate of the plug-in MOT estimator for general quadratic objective functions and prove it is minimax optimal for a quadratic objective function stemming from the variance minimization problem with arbitrary $K$ and $d \le 4$. Numerically, we demonstrate the efficacy of our method over several real-world datasets where our proposal consistently outperforms the baseline by a significant margin (over 70%). In addition, we provide efficient off-the-shelf implementations of MOT with general objective functions.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.07026

Selective Randomization Inference for Adaptive Experiments

Authors: Tobias Freidling, Qingyuan Zhao, Zijun Gao

Abstract: Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are not pre-specified, it has long been recognized that statistical inference for adaptive experiments is not straightforward. Most existing methods only apply to specific adaptive designs and rely on strong assumptions. In this work, we propose selective randomization inference as a general framework for analyzing adaptive experiments. In a nutshell, our approach applies conditional post-selection inference to randomization tests. By using directed acyclic graphs to describe the data generating process, we derive a selective randomization p-value that controls the selective type-I error without requiring independent and identically distributed data or any other modelling assumptions. We show how rejection sampling and Markov Chain Monte Carlo can be used to compute the selective randomization p-values and construct confidence intervals for a homogeneous treatment effect. To mitigate the risk of disconnected confidence intervals, we propose the use of hold-out units. Lastly, we demonstrate our method and compare it with other randomization tests using synthetic and real-world data.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.00424

Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution

Authors: Zhaoxing Gao

Abstract: Ridge regression is an indispensable tool in big data econometrics but suffers from bias issues affecting both statistical efficiency and scalability. We introduce an iterative strategy to correct the bias effectively when the dimension $p$ is less than the sample size $n$. For $p>n$, our method optimally reduces the bias to a level unachievable through linear transformations of the response. We employ a Ridge-Screening (RS) method to handle the remaining bias when $p>n$, creating a reduced model suitable for bias-correction. Under certain conditions, the selected model nests the true one, making RS a novel variable selection approach. We establish the asymptotic properties and valid inferences of our de-biased ridge estimators for both $p< n$ and $p>n$, where $p$ and $n$ may grow towards infinity, along with the number of iterations. Our method is validated using simulated and real-world data examples, providing a closed-form solution to bias challenges in ridge regression inferences.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 53 pages, 10 figures

arXiv:2401.16651

A constructive approach to selective risk control

Authors: Zijun Gao, Wenjie Hu, Qingyuan Zhao

Abstract: Many modern applications require the use of data to both select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini-Hochberg (BH) procedure for multiple hypothesis testing as the iterative limit of the Benjamini-Yekutieli (BY) procedure for constructing post-selection confidence intervals. Although several earlier authors have made noteworthy observations related to this, our discussion highlights that (1) the BH procedure is precisely the fixed-point iteration of the BY procedure; (2) the fact that the BH procedure controls the false discovery rate is almost an immediate corollary of the fact that the BY procedure controls the false coverage-statement rate. Building on this observation, we propose a constructive approach to control extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection), and show that many previous methods and results are special cases of this general framework. We further extend this approach to problems with multiple selective risks and demonstrate how new methods can be developed.
Our development leads to two surprising results about the BH procedure: (1) in the context of one-sided location testing, the BH procedure not only controls the false discovery rate at the null but also at other locations for free; (2) in the context of permutation tests, the BH procedure with exact permutation p-values can be well approximated by a procedure which only requires a total number of permutations that is almost linear in the total number of hypotheses.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 8 figures, 2 tables
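
Illustration (not taken from the paper): the abstract above describes the BH procedure as a fixed-point iteration. A minimal sketch of one standard way to see this is given below, assuming the textbook self-consistency view in which the rejection count R is updated as R <- #{i : p_i <= alpha * R / m}, starting from R = m. The function names and the example p-values are hypothetical, and the paper's own construction via the BY procedure may differ in detail.

```python
def bh_rejections(pvalues, alpha):
    """Number of rejections of the classical BH step-up rule."""
    m = len(pvalues)
    count = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * k / m:
            count = k  # remember the largest rank k that passes its threshold
    return count


def bh_fixed_point(pvalues, alpha):
    """Iterate R <- #{i : p_i <= alpha * R / m} from R = m until it stops changing."""
    m = len(pvalues)
    if m == 0:
        return 0
    r = m
    while True:
        r_next = sum(p <= alpha * r / m for p in pvalues)
        if r_next == r:
            return r
        r = r_next  # the count can only decrease, so the loop terminates


# Hypothetical p-values, used only to compare the two computations.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_rejections(pvals, 0.05), bh_fixed_point(pvals, 0.05))  # prints "2 2"
```

Because the update map is nondecreasing in R, the iteration started at R = m decreases to the largest fixed point, which coincides with the step-up rejection count.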

arXiv:2310.19167

Rare Event Probability Learning by Normalizing Flows

Authors: Zhengqi Gao, Dinghuai Zhang, Luca Daniel, Duane S. Boning

Abstract: A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow assisted importance sampling, termed NOFIS. NOFIS first learns a sequence of proposal distributions associated with predefined nested subset events by minimizing KL divergence losses. Next, it estimates the rare event probability by utilizing importance sampling in conjunction with the last proposal. The efficacy of our NOFIS method is substantiated through comprehensive qualitative visualizations, affirming the optimality of the learned proposal distribution, as well as a series of quantitative experiments encompassing $10$ distinct test cases, which highlight NOFIS's superiority over baseline approaches.

Submitted 29 October, 2023; originally announced October 2023.

Comments: 16 pages, 5 figures, 2 tables
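
Illustration (not taken from the paper): the abstract above relies on importance sampling with a well-chosen proposal. The sketch below shows only that final estimation step on a toy problem, estimating P(X > 4) for X ~ N(0, 1). It assumes a hand-picked shifted Gaussian proposal N(threshold, 1) in place of a learned normalizing flow, so it is not NOFIS itself; the function names are hypothetical.

```python
import math
import random


def rare_event_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold), X ~ N(0, 1), by importance sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the proposal N(threshold, 1), which puts mass on the rare region.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Density ratio N(0, 1) / N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def exact_tail(threshold):
    """Closed-form Gaussian tail probability, used here only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = rare_event_is(4.0, 20000)
print(estimate, exact_tail(4.0))
```

Plain Monte Carlo with the same 20,000 draws would usually see no event at all, since the target probability is about 3.2e-5; the reweighted proposal samples land in the rare region roughly half the time.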

arXiv:2310.17844

Adaptive operator learning for infinite-dimensional Bayesian inverse problems

Authors: Zhiwei Gao, Liang Yan, Tao Zhou

Abstract: The fundamental computational issues in Bayesian inverse problems (BIP) governed by partial differential equations (PDEs) stem from the requirement of repeated forward model evaluations. A popular strategy to reduce such costs is to replace expensive model simulations with computationally efficient approximations using operator learning, motivated by recent progress in deep learning. However, using the approximated model directly may introduce a modeling error, exacerbating the existing ill-posedness of inverse problems. Thus, balancing between accuracy and efficiency is essential for the effective implementation of such approaches. To this end, we develop an adaptive operator learning framework that can reduce modeling error gradually by forcing the surrogate to be accurate in local areas. This is accomplished by adaptively fine-tuning the pre-trained approximate model with training points chosen by a greedy algorithm during the posterior computational process. To validate our approach, we use DeepONet to construct the surrogate and unscented Kalman inversion (UKI) to approximate the BIP solution, respectively. Furthermore, we present a rigorous convergence guarantee in the linear case using the UKI framework. The approach is tested on a number of benchmarks, including the Darcy flow, the heat source inversion problem, and the reaction-diffusion problem. The numerical results show that our method can significantly reduce computational costs while maintaining inversion accuracy.

Submitted 4 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.12349

Developing 3D Virtual Safety Risk Terrain for UAS Operations in Complex Urban Environments

Authors: Zhenyu Gao, John-Paul Clarke, Javid Mardanov, Karen Marais

Abstract: Unmanned Aerial Systems (UAS), an integral part of the Advanced Air Mobility (AAM) vision, are capable of performing a wide spectrum of tasks in urban environments. The societal integration of UAS is a pivotal challenge, as these systems must operate harmoniously within the constraints imposed by regulations and societal concerns. In complex urban environments, UAS safety has been a perennial obstacle to their large-scale deployment. To mitigate UAS safety risk and facilitate risk-aware UAS operations planning, we propose a novel concept called "3D virtual risk terrain". This concept converts public risk constraints in an urban environment into 3D exclusion zones that UAS operations should avoid to adequately reduce risk to Entities of Value (EoV). To implement the 3D virtual risk terrain, we develop a conditional probability framework that comprehensively integrates most existing basic models for UAS ground risk. To demonstrate the concept, we build risk terrains on a Chicago downtown model and observe their characteristics under different conditions. We believe that the 3D virtual risk terrain has the potential to become a new routine tool for risk-aware UAS operations planning, urban airspace management, and policy development. The same idea can also be extended to other forms of societal impacts, such as noise, privacy, and perceived risk.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 33 pages, 19 figures

arXiv:2310.06357

Adaptive Storey's null proportion estimator

Authors: Zijun Gao

Abstract: False discovery rate (FDR) is a commonly used criterion in multiple testing and the Benjamini-Hochberg (BH) procedure is arguably the most popular approach with FDR guarantee. To improve power, the adaptive BH procedure has been proposed by incorporating various null proportion estimators, among which Storey's estimator has gained substantial popularity. The performance of Storey's estimator hinges on a critical hyper-parameter, where a pre-fixed configuration lacks power and existing data-driven hyper-parameters compromise the FDR control. In this work, we propose a novel class of adaptive hyper-parameters and establish the FDR control of the associated BH procedure using a martingale argument. Within this class of data-driven hyper-parameters, we present a specific configuration designed to maximize the number of rejections and characterize the convergence of this proposal to the optimal hyper-parameter under a commonly-used mixture model. We evaluate our adaptive Storey's null proportion estimator and the associated BH procedure on extensive simulated data and a motivating protein dataset. Our proposal exhibits significant power gains when dealing with a considerable proportion of weak non-nulls or a conservative null distribution.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 17 pages, 4 figures, 1 table
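
Illustration (not taken from the paper): the abstract above builds on Storey's null proportion estimator with a hyper-parameter, written here as `lam`. The sketch below shows the classical pre-fixed version only, assuming the common form pi0 = (1 + #{p_i > lam}) / (m * (1 - lam)) capped at 1, plugged into BH at level alpha / pi0. The paper's data-driven choice of the hyper-parameter is not reproduced, and all names and p-values are hypothetical.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the null proportion with a fixed lam in (0, 1)."""
    m = len(pvalues)
    tail = sum(p > lam for p in pvalues)  # p-values above lam are mostly nulls
    return min(1.0, (1 + tail) / (m * (1 - lam)))


def adaptive_bh(pvalues, alpha, lam=0.5):
    """Indices rejected by BH run at the inflated level alpha / pi0_hat."""
    m = len(pvalues)
    level = alpha / storey_pi0(pvalues, lam)
    k_star = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= level * k / m:
            k_star = k  # largest rank passing the step-up threshold
    if k_star == 0:
        return []
    cutoff = level * k_star / m
    return [i for i, p in enumerate(pvalues) if p <= cutoff]


# Hypothetical p-values: six small ones and two that look like nulls.
pvals = [0.001, 0.002, 0.003, 0.004, 0.02, 0.045, 0.6, 0.9]
print(storey_pi0(pvals), adaptive_bh(pvals, 0.05))
```

On these p-values the estimate is 0.75, so BH runs at level 0.05 / 0.75 and rejects six hypotheses, whereas plain BH at level 0.05 would reject five. Some published variants additionally restrict rejections to p-values at or below `lam`; that refinement is omitted here.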

arXiv:2309.02674

Denoising and Multilinear Dimension-Reduction of High-Dimensional Matrix-Variate Time Series via a Factor Model

Authors: Zhaoxing Gao, Ruey S. Tsay

Abstract: This paper proposes a new multilinear projection method for dimension-reduction in modeling high-dimensional matrix-variate time series. It assumes that a $p_1\times p_2$ matrix-variate time series consists of a dynamically dependent, lower-dimensional matrix-variate factor process and a $p_1\times p_2$ matrix white noise series. The covariance matrix of the vectorized white noises assumes a Kronecker structure such that the row and column covariances of the noise all have diverging/spiked eigenvalues to accommodate the case of low signal-to-noise ratio often encountered in applications, such as in finance and economics. We use an iterative projection procedure to reduce the dimensions and noise effects in estimating front and back loading matrices and to obtain faster convergence rates than those of the traditional methods available in the literature. Furthermore, we introduce a two-way projected Principal Component Analysis to mitigate the diverging noise effects, and implement a high-dimensional white-noise testing procedure to estimate the dimension of the factor matrix. Asymptotic properties of the proposed method are established as the dimensions and sample size go to infinity. Simulated and real examples are used to assess the performance of the proposed method. We also compared the proposed method with some existing ones in the literature concerning the forecasting ability of the identified factors and found that the proposed approach fares well in out-of-sample forecasting.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 57 Pages, 7 figures, 7 tables. arXiv admin note: text overlap with arXiv:2011.09029

arXiv:2307.07689

Supervised Dynamic PCA: Linear Dynamic Forecasting with Many Predictors

Authors: Zhaoxing Gao, Ruey S. Tsay

Abstract: This paper proposes a novel dynamic forecasting method using a new supervised Principal Component Analysis (PCA) when a large number of predictors are available. The new supervised PCA provides an effective way to bridge the gap between predictors and the target variable of interest by scaling and combining the predictors and their lagged values, resulting in effective dynamic forecasting. Unlike the traditional diffusion-index approach, which does not learn the relationships between the predictors and the target variable before conducting PCA, we first re-scale each predictor according to its significance in forecasting the targeted variable in a dynamic fashion, and a PCA is then applied to a re-scaled and additive panel, which establishes a connection between the predictability of the PCA factors and the target variable. Furthermore, we also propose to use penalized methods such as the LASSO approach to select the significant factors that have superior predictive power over the others. Theoretically, we show that our estimators are consistent and outperform the traditional methods in prediction under some mild conditions. We conduct extensive simulations to verify that the proposed method produces satisfactory forecasting results and outperforms most of the existing methods using the traditional PCA. A real example of predicting U.S. macroeconomic variables using a large number of predictors showcases that our method fares better than most of the existing ones in applications.
The proposed method thus provides a comprehensive and effective approach for dynamic forecasting in high-dimensional data analysis.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 58 pages, 7 figures

Journal ref: Journal of the American Statistical Association, 2024

arXiv:2306.15444

Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

Authors: Zhan Gao, Aryan Mokhtari, Alec Koppel

Abstract: Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of $O((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or all past curvature information to form the current Hessian inverse approximation. Limited-memory variants of quasi-Newton methods such as the celebrated L-BFGS alleviate this issue by leveraging a limited window of past curvature information to construct the Hessian inverse approximation. As a result, their per-iteration complexity and storage requirement are $O(\tau d)$, where $\tau \le d$ is the size of the window and $d$ is the problem dimension, reducing the $O(d^2)$ computational cost and memory requirement of standard quasi-Newton methods. However, to the best of our knowledge, there is no result showing a non-asymptotic superlinear convergence rate for any limited-memory quasi-Newton method. In this work, we close this gap by presenting a Limited-memory Greedy BFGS (LG-BFGS) method that can achieve an explicit non-asymptotic superlinear rate. We incorporate displacement aggregation, i.e., decorrelating projection, in post-processing gradient variations, together with a basis vector selection scheme on variable variations, which greedily maximizes a progress measure of the Hessian estimate to the true Hessian.
Their combination allows past curvature information to remain in a sparse subspace while yielding a valid representation of the full history. Interestingly, our established non-asymptotic superlinear convergence rate demonstrates an explicit trade-off between the convergence speed and memory requirement, which, to our knowledge, is the first of its kind. Numerical results corroborate our theoretical findings and demonstrate the effectiveness of our method.

Submitted 18 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.
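
Illustration (not taken from the paper): the abstract above notes that limited-memory methods such as L-BFGS build the Hessian inverse approximation from a window of past curvature pairs. The sketch below is the standard L-BFGS two-loop recursion, shown only to make that per-iteration cost concrete; it is not the LG-BFGS method, and the function names are hypothetical. Each loop touches every stored pair once with vectors of length d, so the cost scales with the window size times d.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(grad, s_list, y_list):
    """Return -H * grad, where H is the L-BFGS inverse-Hessian approximation.

    s_list holds past variable variations and y_list past gradient variations,
    ordered from oldest to newest; each pair is assumed to satisfy dot(s, y) > 0.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * dot(s, q)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    # Initial scaling taken from the most recent pair (identity if memory is empty).
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1]) if s_list else 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest pair, reusing the stored alphas.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Curvature pairs consistent with a quadratic whose Hessian is diag(2, 4).
s_pairs = [[1.0, 0.0], [0.0, 1.0]]
y_pairs = [[2.0, 0.0], [0.0, 4.0]]
print(lbfgs_direction([2.0, 4.0], s_pairs, y_pairs))  # prints "[-1.0, -1.0]"
```

In this toy case the two stored pairs span the whole space, so the recursion reproduces the exact Newton direction.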

arXiv:2306.13830

Improved Aircraft Environmental Impact Segmentation via Metric Learning

Authors: Zhenyu Gao, Dimitri N. Mavris

Abstract: Accurate modeling of aircraft environmental impact is pivotal to the design of operational procedures and policies to mitigate negative aviation environmental impact. Aircraft environmental impact segmentation is a process which clusters aircraft types that have similar environmental impact characteristics based on a set of aircraft features. This practice helps model a large population of aircraft types with insufficient aircraft noise and performance models and contributes to better understanding of aviation environmental impact. Through measuring the similarity between aircraft types, distance metric is the kernel of aircraft segmentation. Traditional ways of aircraft segmentation use plain distance metrics and assign equal weight to all features in an unsupervised clustering process. In this work, we utilize weakly-supervised metric learning and partial information on aircraft fuel burn, emissions, and noise to learn weighted distance metrics for aircraft environmental impact segmentation. We show in a comprehensive case study that the tailored distance metrics can indeed make aircraft segmentation better reflect the actual environmental impact of aircraft. The metric learning approach can help refine a number of similar data-driven analytical studies in aviation.

Submitted 10 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: 32 pages, 11 figures

arXiv:2306.10656

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healthcare, lifestyles, and personalities. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM learns more than 1,800 attributes efficiently. We numerically evaluate the performance of VHGM and its training techniques. As a proof-of-concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurements of healthcare attributes and hypothesis verifications of lifestyles.

Submitted 14 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 14 pages, 4 figures

arXiv:2304.12134

Determination of the effective cointegration rank in high-dimensional time-series predictive regressions

Authors: Puyi Fang, Zhaoxing Gao, Ruey S. Tsay

Abstract: This paper proposes a new approach to identifying the effective cointegration rank in high-dimensional unit-root (HDUR) time series from a prediction perspective using reduced-rank regression. For a HDUR process $\mathbf{x}_t\in \mathbb{R}^N$ and a stationary series $\mathbf{y}_t\in \mathbb{R}^p$ of interest, our goal is to predict future values of $\mathbf{y}_t$ using $\mathbf{x}_t$ and lagged values of $\mathbf{y}_t$. The proposed framework consists of a two-step estimation procedure. First, the Principal Component Analysis is used to identify all cointegrating vectors of $\mathbf{x}_t$. Second, the co-integrated stationary series are used as regressors, together with some lagged variables of $\mathbf{y}_t$, to predict $\mathbf{y}_t$. The estimated reduced rank is then defined as the effective cointegration rank of $\mathbf{x}_t$. Under the scenario that the autoregressive coefficient matrices are sparse (or of low-rank), we apply the Least Absolute Shrinkage and Selection Operator (or the reduced-rank techniques) to estimate the autoregressive coefficients when the dimension involved is high. Theoretical properties of the estimators are established under the assumptions that the dimensions $p$ and $N$ and the sample size $T \to \infty$. Both simulated and real examples are used to illustrate the proposed framework, and the empirical application suggests that the proposed procedure fares well in predicting stock returns.

Submitted 24 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.09723

A Review of Bayesian Methods in Electronic Design Automation

Authors: Zhengqi Gao, Duane S. Boning

Abstract: The utilization of Bayesian methods has been widely acknowledged as a viable solution for tackling various challenges in electronic integrated circuit (IC) design under stochastic process variation, including circuit performance modeling, yield/failure rate estimation, and circuit optimization. As the post-Moore era brings about new technologies (such as silicon photonics and quantum circuits), many of the associated issues there are similar to those encountered in electronic IC design and can be addressed using Bayesian methods. Motivated by this observation, we present a comprehensive review of Bayesian methods in electronic design automation (EDA). By doing so, we hope to equip researchers and designers with the ability to apply Bayesian methods in solving stochastic problems in electronic circuits and beyond.

Submitted 13 March, 2023; originally announced April 2023.

Comments: 24 pages, a draft version. We welcome comments and feedback, which can be sent to [email protected]

arXiv:2303.01552

Simultaneous Hypothesis Testing Using Internal Negative Controls with An Application to Proteomics

Authors: Zijun Gao, Qingyuan Zhao

Abstract: Negative control is a common technique in scientific investigations and broadly refers to the situation where a null effect ("negative result") is expected. Motivated by a real proteomic dataset, we will present three promising and closely connected methods of using negative controls to assist simultaneous hypothesis testing. The first method uses negative controls to construct a permutation p-value for every hypothesis under investigation, and we give several sufficient conditions for such p-values to be valid and positive regression dependent on the set (PRDS) of true nulls. The second method uses negative controls to construct an estimate of the false discovery rate (FDR), and we give a sufficient condition under which the step-up procedure based on this estimate controls the FDR. The third method, derived from an existing ad hoc algorithm for proteomic analysis, uses negative controls to construct a nonparametric estimator of the local false discovery rate. We conclude with some practical suggestions and connections to some closely related methods that have been proposed recently.

Submitted 19 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 41 pages, 10 figures, 3 tables

arXiv:2302.01529 [pdf, other]

Failure-informed adaptive sampling for PINNs, Part II: combining with re-sampling and subset simulation

Authors: Zhiwei Gao, Tao Tang, Liang Yan, Tao Zhou

Abstract: This is the second part of our series of works on failure-informed adaptive sampling for physics-informed neural networks (FI-PINNs). In our previous work \cite{gao2022failure}, we presented an adaptive sampling framework that uses the failure probability as the posterior error indicator, where the truncated Gaussian model was adopted for estimating the indicator. In this work, we present two novel extensions to FI-PINNs. The first extension consists in combining with a re-sampling technique, so that the new algorithm can maintain a constant training size. This is achieved through cosine annealing, which gradually transforms the sampling of collocation points from uniform to adaptive as training progresses. The second extension is to present the subset simulation algorithm as the posterior model (instead of the truncated Gaussian model) for estimating the error indicator, which can more effectively estimate the failure probability and generate new effective training points in the failure region. We investigate the performance of the new approach using several challenging problems, and numerical experiments demonstrate a significant improvement over the original algorithm.

Submitted 28 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2211.04918 [pdf, other]

Detection of Sparse Anomalies in High-Dimensional Network Telescope Signals

Authors: Rafail Kartsioukas, Rajat Tandon, Zheng Gao, Jelena Mirkovic, Michalis Kallitsis, Stilian Stoev

Abstract: Network operators and system administrators are increasingly overwhelmed with incessant cyber-security threats ranging from malicious network reconnaissance to attacks such as distributed denial of service and data breaches. A large number of these attacks could be prevented if the network operators were better equipped with threat intelligence information that would allow them to block or throttle nefarious scanning activities. Network telescopes or "darknets" offer a unique window into observing Internet-wide scanners and other malicious entities, and they could offer early warning signals to operators that would be critical for infrastructure protection and/or attack mitigation. A network telescope consists of unused or "dark" IP spaces that serve no users, and solely passively observes any Internet traffic destined to the "telescope sensor" in an attempt to record ubiquitous network scanners, malware that forage for vulnerable devices, and other dubious activities. Hence, monitoring network telescopes for timely detection of coordinated and heavy scanning activities is an important, albeit challenging, task. The challenges mainly arise due to the non-stationarity and the dynamic nature of Internet traffic and, more importantly, the fact that one needs to monitor high-dimensional signals (e.g., all TCP/UDP ports) to search for "sparse" anomalies. We propose statistical methods to address both challenges in an efficient and "online" manner; our work is validated both with synthetic data as well as real-world data from a large network telescope.

Submitted 22 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.00279 [pdf, other]

Failure-informed adaptive sampling for PINNs

Authors: Zhiwei Gao, Liang Yan, Tao Zhou

Abstract: Physics-informed neural networks (PINNs) have emerged as an effective technique for solving PDEs in a wide range of domains. It has been noticed, however, that the performance of PINNs can vary dramatically with different sampling procedures. For instance, a fixed set of (a priori chosen) training points may fail to capture the effective solution region (especially for problems with singularities). To overcome this issue, we present in this work an adaptive strategy, termed the failure-informed PINNs (FI-PINNs), which is inspired by the viewpoint of reliability analysis. The key idea is to define an effective failure probability based on the residual, and then, with the aim of placing more samples in the failure region, the FI-PINNs employs a failure-informed enrichment technique to adaptively add new collocation points to the training set, such that the numerical accuracy is dramatically improved. In short, similar to adaptive finite element methods, the proposed FI-PINNs adopts the failure probability as the posterior error indicator to generate new training points. We prove rigorous error bounds of FI-PINNs and illustrate its performance through several problems.

Submitted 15 January, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

arXiv:2111.00929 [pdf, other]

Bounds all around: training energy-based models with bidirectional bounds

Authors: Cong Geng, Jia Wang, Zhiyong Gao, Jes Frellsen, Søren Hauberg

Abstract: Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby providing grounding for best engineering practice. To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments significantly stabilize training and yield high-quality density estimation and sample generation.

Submitted 2 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: This paper has been accepted by NeurIPS 2021

arXiv:2110.09823 [pdf, other]

An Empirical Study: Extensive Deep Temporal Point Process

Authors: Haitao Lin, Cheng Tan, Lirong Wu, Zhangyang Gao, Stan. Z. Li

Abstract: The temporal point process, a stochastic process on a continuous time domain, is commonly used to model asynchronous event sequences with occurrence timestamps. Thanks to the strong expressivity of deep neural networks, they are emerging as a promising choice for capturing the patterns in asynchronous sequences, in the context of temporal point processes. In this paper, we first review recent research emphasis and difficulties in modeling asynchronous event sequences with deep temporal point processes, which can be summarized in four areas: encoding of the history sequence, formulation of the conditional intensity function, relational discovery of events, and learning approaches for optimization. We introduce most of the recently proposed models by dismantling them into the four parts, and conduct experiments by remodularizing the first three parts with the same learning strategy for a fair empirical evaluation. Besides, we extend the history encoders and conditional intensity function family, and propose a Granger causality discovery framework for exploiting the relations among multiple types of events. Because Granger causality can be represented by the Granger causality graph, discrete graph structure learning in the framework of Variational Inference is employed to reveal latent structures of the graph. Further experiments show that the proposed framework with latent graph discovery can both capture the relations and achieve improved fitting and predicting performance.

Submitted 21 December, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: 22 pages, 8 figures

arXiv:2108.01485 [pdf, other]

Fast Estimation Method for the Stability of Ensemble Feature Selectors

Authors: Rina Onda, Zhengyan Gao, Masaaki Kotera, Kenta Oono

Abstract: It is preferred that feature selectors be \textit{stable} for better interpretability and robust prediction. Ensembling is known to be effective for improving the stability of feature selectors. Since ensembling is time-consuming, it is desirable to reduce the computational cost to estimate the stability of the ensemble feature selectors. We propose a simulator of a feature selector, and apply it to a fast estimation of the stability of ensemble feature selectors. To the best of our knowledge, this is the first study that estimates the stability of ensemble feature selectors and reduces the computation time theoretically and empirically.

Submitted 3 August, 2021; originally announced August 2021.

Comments: 7 pages. Supplementary material 9 pages. Accepted in ICML2021 Workshop, Subset Selection in Machine Learning: From Theory to Practice (SubSetML) URL: https://sites.google.com/view/icml-2021-subsetml

arXiv:2107.12713 [pdf, other]

LinCDE: Conditional Density Estimation via Lindsey's Method

Authors: Zijun Gao, Trevor Hastie

Abstract: Conditional density estimation is a fundamental problem in statistics, with scientific and practical applications in biology, economics, finance and environmental studies, to name a few. In this paper, we propose a conditional density estimator based on gradient boosting and Lindsey's method (LinCDE). LinCDE admits flexible modeling of the density family and can capture distributional characteristics like modality and shape. In particular, when suitably parametrized, LinCDE will produce smooth and non-negative density estimates. Furthermore, like boosted regression trees, LinCDE does automatic feature selection. We demonstrate LinCDE's efficacy through extensive simulations and three real data examples.

Submitted 31 December, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: 50 pages, 20 figures

arXiv:2106.11793 [pdf, other]

Identifying intercity freight trip ends of heavy trucks from GPS data

Authors: Yitao Yang, Bin Jia, Xiao-Yong Yan, Jiangtao Li, Zhenzhen Yang, Ziyou Gao

Abstract: The intercity freight trips of heavy trucks are important data for transportation system planning and urban agglomeration management. In recent decades, the extraction of freight trips from GPS data has gradually become the main alternative to traditional surveys. Identifying the trip ends (origin and destination, OD) is the first task in trip extraction. In previous trip end identification methods, some key parameters, such as speed and time thresholds, have mostly been defined on the basis of empirical knowledge, which inevitably lacks universality. Here, we propose a data-driven trip end identification method. First, we define a speed threshold by analyzing the speed distribution of heavy trucks and identify all truck stops from raw GPS data. Second, we define minimum and maximum time thresholds by analyzing the distribution of the dwell times of heavy trucks at stop locations and classify truck stops into three types based on these time thresholds. Third, we use highway network GIS data and freight-related points-of-interest (POIs) data to identify valid trip ends from among the three types of truck stops. In this step, we detect POI boundaries to determine whether a heavy truck is stopping at a freight-related location. We further analyze the spatiotemporal characteristics of intercity freight trips of heavy trucks and discuss their potential applications in practice.

Submitted 22 June, 2021; originally announced June 2021.

arXiv:2103.14626 [pdf, other]

Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data

Authors: Zhaoxing Gao, Ruey S. Tsay

Abstract: This paper proposes a hierarchical approximate-factor approach to analyzing high-dimensional, large-scale heterogeneous time series data using distributed computing. The new method employs a multiple-fold dimension reduction procedure using Principal Component Analysis (PCA) and shows great promise for modeling large-scale data that cannot be stored or analyzed by a single machine. Each computer at the basic level performs a PCA to extract common factors among the time series assigned to it and transfers those factors to one and only one node of the second level. Each 2nd-level computer collects the common factors from its subordinates and performs another PCA to select the 2nd-level common factors. This process is repeated until the central server is reached, which collects common factors from its direct subordinates and performs a final PCA to select the global common factors. The noise terms of the 2nd-level approximate factor model are the unique common factors of the 1st-level clusters. We focus on the case of 2 levels in our theoretical derivations, but the idea can easily be generalized to any finite number of hierarchies. We discuss some clustering methods when the group memberships are unknown and introduce a new diffusion index approach to forecasting. We further extend the analysis to unit-root nonstationary time series. Asymptotic properties of the proposed method are derived for the diverging dimension of the data in each computing unit and the sample size $T$. We use both simulated data and real examples to assess the performance of the proposed method in finite samples, and compare our method with the commonly used ones in the literature concerning the forecastability of extracted factors.

Submitted 26 March, 2021; originally announced March 2021.

Comments: 48 pages, 10 figures

Journal ref: Journal of the American Statistical Association, 2022

arXiv:2103.04277 [pdf, other]

Estimating Heterogeneous Treatment Effects for General Responses

Authors: Zijun Gao, Trevor Hastie

Abstract: Heterogeneous treatment effect models allow us to compare treatments at subgroup and individual levels, and are of increasing popularity in applications like personalized medicine, advertising, and education. In this talk, we first survey different causal estimands used in practice, which focus on estimating the difference in conditional means. We then propose DINA, the difference in natural parameters, to quantify heterogeneous treatment effect in exponential families and the Cox model. For binary outcomes and survival times, DINA is both convenient and more practical for modeling the influence of covariates on the treatment effect. Second, we introduce a meta-algorithm for DINA, which allows practitioners to use powerful off-the-shelf machine learning tools for the estimation of nuisance functions, and which is also statistically robust to errors in inaccurate nuisance function estimation. We demonstrate the efficacy of our method combined with various machine learning base-learners on simulated and real datasets.

Submitted 27 January, 2022; v1 submitted 7 March, 2021; originally announced March 2021.

arXiv:2011.09029 [pdf, ps, other]

A Two-Way Transformed Factor Model for Matrix-Variate Time Series

Authors: Zhaoxing Gao, Ruey S. Tsay

Abstract: We propose a new framework for modeling high-dimensional matrix-variate time series by a two-way transformation, where the transformed data consist of a matrix-variate factor process, which is dynamically dependent, and three other blocks of white noises. Specifically, for a given $p_1\times p_2$ matrix-variate time series, we seek common nonsingular transformations to project the rows and columns onto another $p_1$ and $p_2$ directions according to the strength of the dynamic dependence of the series on the past values. Consequently, we treat the data as nonsingular linear row and column transformations of dynamically dependent common factors and white noise idiosyncratic components. We propose a common orthonormal projection method to estimate the front and back loading matrices of the matrix-variate factors. Under the setting that the largest eigenvalues of the covariance of the vectorized idiosyncratic term diverge for large $p_1$ and $p_2$, we introduce a two-way projected Principal Component Analysis (PCA) to estimate the associated loading matrices of the idiosyncratic terms to mitigate such diverging noise effects. A diagonal-path white noise testing procedure is proposed to estimate the order of the factor matrix. Asymptotic properties of the proposed method are established for both fixed and diverging dimensions as the sample size increases to infinity. We use simulated and real examples to assess the performance of the proposed method. We also compare our method with some existing ones in the literature and find that the proposed approach not only provides interpretable results but also performs well in out-of-sample forecasting.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 49 pages, 6 figures

Journal ref: Econometrics and Statistics 2021

arXiv:2009.11612 [pdf, other]

Clustering Based on Graph of Density Topology

Authors: Zhangyang Gao, Haitao Lin, Stan. Z Li

Abstract: Data clustering with uneven distribution in high-level noise is challenging. Currently, HDBSCAN is considered the SOTA algorithm for this problem. In this paper, we propose a novel clustering algorithm based on what we call graph of density topology (GDT). GDT jointly considers the local and global structures of data samples: first forming local clusters based on a density growing process with a strategy for proper noise handling as well as cluster boundary detection; and then estimating a GDT from the relationship between local clusters in terms of a connectivity measure, giving a global topological graph. The connectivity, measuring similarity between neighboring local clusters, is based on local clusters rather than individual points, ensuring its robustness to even very large noise. Evaluation results on both toy and real-world datasets show that GDT achieves the SOTA performance by far on almost all the popular datasets, and has a low time complexity of O(n log n). The code is available at https://github.com/gaozhangyang/DGC.git.

Submitted 24 September, 2020; originally announced September 2020.

arXiv:2009.05872 [pdf, ps, other]

Certified Robustness of Graph Classification against Topology Attack with Randomized Smoothing

Authors: Zhidong Gao, Rui Hu, Yanmin Gong

Abstract: Graph classification has practical applications in diverse fields. Recent studies show that graph-based machine learning models are especially vulnerable to adversarial perturbations due to the non-i.i.d. nature of graph data. By adding or deleting a small number of edges in the graph, adversaries could greatly change the graph label predicted by a graph classification model. In this work, we propose to build a smoothed graph classification model with a certified robustness guarantee. We have proven that the resulting graph classification model would output the same prediction for a graph under $l_0$ bounded adversarial perturbation. We also evaluate the effectiveness of our approach under a graph convolutional network (GCN) based multi-class graph classification model.

Submitted 12 September, 2020; originally announced September 2020.

Comments: Accepted to IEEE GLOBECOM 2020

arXiv:2006.06376 [pdf, other]

Wide and Deep Graph Neural Networks with Distributed Online Learning

Authors: Zhan Gao, Fernando Gama, Alejandro Ribeiro

Abstract: Graph neural networks (GNNs) learn representations from network data with naturally distributed architectures, rendering them well-suited candidates for decentralized learning. Oftentimes, this decentralized graph support changes with time due to link failures or topology variations. These changes create a mismatch between the graphs on which GNNs were trained and the ones on which they are tested. Online learning can be used to retrain GNNs at testing time, overcoming this issue. However, most online algorithms are centralized and work on convex problems (which GNNs rarely lead to). This paper proposes the Wide and Deep GNN (WD-GNN), a novel architecture that can be easily updated with distributed online learning mechanisms. The WD-GNN comprises two components: the wide part is a bank of linear graph filters and the deep part is a GNN. At training time, the joint architecture learns a nonlinear representation from data. At testing time, the deep part (nonlinear) is left unchanged, while the wide part is retrained online, leading to a convex problem. We derive convergence guarantees for this online retraining procedure and further propose a decentralized alternative. Experiments on the robot swarm control for flocking corroborate theory and show potential of the proposed architecture for distributed online learning.

Submitted 24 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2005.03496 [pdf, other]

doi 10.1016/j.ijforecast.2020.09.008

Modeling High-Dimensional Unit-Root Time Series

Authors: Zhaoxing Gao, Ruey S. Tsay

Abstract: This paper proposes a new procedure to build factor models for high-dimensional unit-root time series by postulating that a $p$-dimensional unit-root process is a nonsingular linear transformation of a set of unit-root processes, a set of stationary common factors, which are dynamically dependent, and some idiosyncratic white noise components. For the stationary components, we assume that the factor process captures the temporal-dependence and the idiosyncratic white noise series explains, jointly with the factors, the cross-sectional dependence. The estimation of nonsingular linear loading spaces is carried out in two steps. First, we use an eigenanalysis of a nonnegative definite matrix of the data to separate the unit-root processes from the stationary ones and a modified method to specify the number of unit roots. We then employ another eigenanalysis and a projected principal component analysis to identify the stationary common factors and the white noise series. We propose a new procedure to specify the number of white noise series and, hence, the number of stationary common factors, establish asymptotic properties of the proposed method for both fixed and diverging $p$ as the sample size $n$ increases, and use simulation and a real example to demonstrate the performance of the proposed method in finite samples. We also compare our method with some commonly used ones in the literature regarding the forecast ability of the extracted factors and find that the proposed method performs well in out-of-sample forecasting of a 508-dimensional PM$_{2.5}$ series in Taiwan.

Submitted 11 August, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: 45 pages, 11 figures. arXiv admin note: text overlap with arXiv:1808.07932

Journal ref: International Journal of Forecasting 2020

arXiv:2004.10657 [pdf, other]

doi 10.1145/3385412.3385997

Typilus: Neural Type Hints

Authors: Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, Zheng Gao

Abstract: Type inference over partial contexts in dynamically typed languages is challenging. In this work, we present a graph neural network model that predicts types by probabilistically reasoning over a program's structure, names, and patterns. The network uses deep similarity learning to learn a TypeSpace -- a continuous relaxation of the discrete space of types -- and how to embed the type properties of a symbol (i.e. identifier) into it. Importantly, our model can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones. We realise our approach in Typilus for Python that combines the TypeSpace with an optional type checker. We show that Typilus accurately predicts types. Typilus confidently predicts types for 70% of all annotatable symbols; when it predicts a type, that type optionally type checks 95% of the time. Typilus can also find incorrect type annotations; two important and popular open source libraries, fairseq and allennlp, accepted our pull requests that fixed the annotation errors Typilus discovered.

Submitted 6 April, 2020; originally announced April 2020.

Comments: Accepted to PLDI 2020

arXiv:2004.04618 [pdf, other]

doi 10.1109/JIOT.2019.2957778

Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization

Authors: You Li, Xin Hu, Yuan Zhuang, Zhouzheng Gao, Peng Zhang, Naser El-Sheimy

Abstract: Location is key to spatialize internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper proposes an approach to model a continuous wireless-localization process as a Markov decision process (MDP) and process it within a DRL framework. (2) To alleviate the challenge of obtaining rewards when using unlabeled data (e.g., daily-life crowdsourced data), this paper presents a reward-setting mechanism, which extracts robust landmark data from unlabeled wireless received signal strengths (RSS). (3) To ease requirements for model re-training when using DRL for localization, this paper uses RSS measurements together with agent location to construct DRL inputs. The proposed method was tested by using field testing data from multiple Bluetooth 5 smart ear tags in a pasture. Meanwhile, the experimental verification process reflected the advantages and challenges for using DRL in wireless localization.

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2003.10375 [pdf, other]

FTT-NAS: Discovering Fault-Tolerant Convolutional Neural Architecture

Authors: Xuefei Ning, Guangjun Ge, Wenshuo Li, Zhenhua Zhu, Yin Zheng, Xiaoming Chen, Zhen Gao, Yu Wang, Huazhong Yang

Abstract: With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto the devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of deploying NNs is now drawing much attention. In this paper, after the analysis of the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable to various faults in nowadays devices. Then we incorporate fault-tolerant training (FTT) in the search process to achieve better results, which is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures outperform other manually designed baseline architectures significantly, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net discovered under the feature fault model achieves an accuracy of 86.2% (vs. 68.1% achieved by MobileNet-V2), and W-FTT-Net discovered under the weight fault model achieves an accuracy of 69.6% (vs. 60.8% achieved by ResNet-20). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern have influences on the fault resilience capability of NN models.

Submitted 12 April, 2021; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: 24 pages; to appear in TODAES

arXiv:2003.06365 [pdf]

Application of Deep Q-Network in Portfolio Management

Authors: Ziming Gao, Yuan Gao, Yi Hu, Zhengyong Jiang, Jionglong Su

Abstract: Machine Learning algorithms and Neural Networks are widely applied to many different areas such as stock market prediction, face recognition and population analysis. This paper will introduce a strategy based on the classic Deep Reinforcement Learning algorithm, Deep Q-Network, for portfolio management in stock market. It is a type of deep neural network which is optimized by Q Learning. To make the DQN adapt to financial market, we first discretize the action space which is defined as the weight of portfolio in different assets so that portfolio management becomes a problem that Deep Q-Network can solve. Next, we combine the Convolutional Neural Network and dueling Q-net to enhance the recognition ability of the algorithm. Experimentally, we chose five low-relevant American stocks to test the model. The result demonstrates that the DQN based strategy outperforms the ten other traditional strategies. The profit of DQN algorithm is 30% more than the profit of other strategies. Moreover, the Sharpe ratio associated with Max Drawdown demonstrates that the risk of policy made with DQN is the lowest.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:2003.03881 [pdf, other]

Assessment of Heterogeneous Treatment Effect Estimation Accuracy via Matching

Authors: Zijun Gao, Trevor Hastie, Robert Tibshirani

Abstract: We study the assessment of the accuracy of heterogeneous treatment effect (HTE) estimation, where the HTE is not directly observable so standard computation of prediction errors is not applicable. To tackle the difficulty, we propose an assessment approach by constructing pseudo-observations of the HTE based on matching. Our contributions are three-fold: first, we introduce a novel matching distance derived from proximity scores in random forests; second, we formulate the matching problem as an average minimum-cost flow problem and provide an efficient algorithm; third, we propose a match-then-split principle for the assessment with cross-validation. We demonstrate the efficacy of the assessment approach on synthetic data and data generated from a real dataset.

Submitted 8 March, 2020; originally announced March 2020.

arXiv:2002.06471 [pdf, ps, other]

Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects

Authors: Zijun Gao, Yanjun Han

Abstract: A central goal of causal inference is to detect and estimate the treatment effects of a given treatment or intervention on an outcome variable of interest, where a member known as the heterogeneous treatment effect (HTE) is of growing popularity in recent practical applications such as the personalized medicine. In this paper, we model the HTE as a smooth nonparametric difference between two less smooth baseline functions, and determine the tight statistical limits of the nonparametric HTE estimation as a function of the covariate geometry. In particular, a two-stage nearest-neighbor-based estimator throwing away observations with poor matching quality is near minimax optimal. We also establish the tight dependence on the density ratio without the usual assumption that the covariate densities are bounded away from zero, where a key step is to employ a novel maximal inequality which could be of independent interest.

Submitted 24 October, 2020; v1 submitted 15 February, 2020; originally announced February 2020.

Comments: To appear at NeurIPS 2020 as a spotlight presentation

arXiv:2002.04829 [pdf, other]

Uniform Interpolation Constrained Geodesic Learning on Data Manifold

Authors: Cong Geng, Jia Wang, Li Chen, Wenbo Bao, Chu Chu, Zhiyong Gao

Abstract: In this paper, we propose a method to learn a minimizing geodesic within a data manifold. Along the learned geodesic, our method can generate high-quality interpolations between two given data samples. Specifically, we use an autoencoder network to map data samples into latent space and perform interpolation via an interpolation network. We add prior geometric information to regularize our autoencoder for the convexity of representations so that for any given interpolation approach, the generated interpolations remain within the distribution of the data manifold. Before the learning of a geodesic, a proper Riemannian metric should be defined. Therefore, we induce a Riemannian metric by the canonical metric in the Euclidean space which the data manifold is isometrically immersed in. Based on this defined Riemannian metric, we introduce a constant speed loss and a minimizing geodesic loss to regularize the interpolation network to generate uniform interpolation along the learned geodesic on the manifold. We provide a theoretical analysis of our model and use image translation as an example to demonstrate the effectiveness of our method.

Submitted 14 August, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: submitted to NIPS 2020

arXiv:2002.03382 [pdf, other]

Segmenting High-dimensional Matrix-valued Time Series via Sequential Transformations

Authors: Zhaoxing Gao

Abstract: Modeling matrix-valued time series is an interesting and important research topic. In this paper, we extend the method of Chang et al. (2017) to matrix-valued time series. For any given $p\times q$ matrix-valued time series, we look for linear transformations to segment the matrix into many small sub-matrices for which each of them are uncorrelated with the others both contemporaneously and serially, thus they can be analyzed separately, which will greatly reduce the number of parameters to be estimated in terms of modeling. To overcome the identification issue, we propose a two-step and more structured procedure to segment the rows and columns separately. When $\max(p,q)$ is large in relation to the sample size $n$, we assume the transformation matrices are sparse and use threshold estimators for the (auto)covariance matrices. We also propose a block-wisely thresholding method to separate the columns (or rows) of the transformed matrix-valued data. The asymptotic properties are established for both fixed and diverging $\max(p,q)$. Unlike principal component analysis (PCA) for independent data, we cannot guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation, which may be useful for forecasting. The proposed method is illustrated with both simulated and real data examples. We also propose a sequential transformation algorithm to segment higher-order tensor-valued time series.

Submitted 9 February, 2020; originally announced February 2020.

arXiv:2001.07072 [pdf]

Projection based Active Gaussian Process Regression for Pareto Front Modeling

Authors: Zhengqi Gao, Jun Tao, Yangfeng Su, Dian Zhou, Xuan Zeng

Abstract: Pareto Front (PF) modeling is essential in decision making problems across all domains such as economics, medicine or engineering. In Operation Research literature, this task has been addressed based on multi-objective optimization algorithms. However, without learning models for PF, these methods cannot examine whether a new provided point locates on PF or not. In this paper, we reconsider the task from Data Mining perspective. A novel projection based active Gaussian process regression (P-aGPR) method is proposed for efficient PF modeling. First, P-aGPR chooses a series of projection spaces with dimensionalities ranking from low to high. Next, in each projection space, a Gaussian process regression (GPR) model is trained to represent the constraint that PF should satisfy in that space. Moreover, in order to improve modeling efficacy and stability, an active learning framework has been developed by exploiting the uncertainty information obtained in the GPR models. Different from all existing methods, our proposed P-aGPR method can not only provide a generative PF model, but also fast examine whether a provided point locates on PF or not. The numerical results demonstrate that compared to state-of-the-art passive learning methods the proposed P-aGPR method can achieve higher modeling accuracy and stability.

Submitted 20 January, 2020; originally announced January 2020.

arXiv:1910.05701 [pdf, ps, other]

Phase Transitions in Genome-wide Association Studies and Categorical Variable Screenings

Authors: Zheng Gao

Abstract: Motivated by genome-wide association screening studies (GWAS), we study high-dimensional marginal screenings of categorical variables where test statistics have approximate chi-square distributions. We characterize four new phase transitions in high-dimensional chi-square models, and derive the signal sizes necessary and sufficient for statistical procedures to simultaneously control false discovery (in terms of family-wise error rate or false discovery rate) and missed detection (in terms of family-wise non-discovery rate or false non-discovery rate) in large dimensions. Remarkably, degrees of freedom in the chi-square distributions do not affect the boundaries in all four phase transitions. Several well-known procedures are shown to attain these boundaries. Two new phase transitions are also identified in the Gaussian location model under one-sided alternatives. We then elucidate on the nature of signal sizes in association tests by characterizing its relationship with marginal frequencies, odds ratio, and sample sizes in $2\times2$ contingency tables. This allows us to illustrate an interesting manifestation of the phase transition phenomena in genome-wide association studies (GWAS). We also show, perhaps surprisingly, that given total sample sizes, balanced designs in such association studies rarely deliver optimal power for detecting the effects of rare genetic variants.

Submitted 3 June, 2022; v1 submitted 13 October, 2019; originally announced October 2019.

Comments: 40 pages, 8 figures

MSC Class: 62G10; 62G20

arXiv:1910.03203 [pdf]

Random forest model identifies serve strength as a key predictor of tennis match outcome

Authors: Zijian Gao, Amanda Kowalczyk

Abstract: Tennis is a popular sport worldwide, boasting millions of fans and numerous national and international tournaments. Like many sports, tennis has benefitted from the popularity of rigorous record-keeping of game and player information, as well as the growth of machine learning methods for use in sports analytics. Of particular interest to bettors and betting companies alike is potential use of sports records to predict tennis match outcomes prior to match start. We compiled, cleaned, and used the largest database of tennis match information to date to predict match outcome using fairly simple machine learning methods. Using such methods allows for rapid fit and prediction times to readily incorporate new data and make real-time predictions. We were able to predict match outcomes with upwards of 80% accuracy, much greater than predictions using betting odds alone, and identify serve strength as a key predictor of match outcome. By combining prediction accuracies from three models, we were able to nearly recreate a probability distribution based on average betting odds from betting companies, which indicates that betting companies are using similar information to assign odds to matches. These results demonstrate the capability of relatively simple machine learning models to quite accurately predict tennis match outcomes.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 12 pages, 5 figures, 2 tables

arXiv:1909.03500 [pdf, other]

doi 10.1109/ICDE48307.2020.00078

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

Authors: Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, Tie-Yan Liu

Abstract: Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalance and low-quality datasets. Most of existing learning methods suffer from poor performance or low computation efficiency under such a scenario. To tackle this problem, we conduct deep investigations into the nature of class imbalance, which reveals that not only the disproportion between classes, but also other difficulties embedded in the nature of data, especially, noises and class overlapping, prevent us from learning effective classifiers. Taking those factors into consideration, we propose a novel framework for imbalance classification that aims to generate a strong ensemble by self-paced harmonizing data hardness via under-sampling. Extensive experiments have shown that this new framework, while being very computationally efficient, can lead to robust performance even under highly overlapping classes and extremely skewed distribution. Note that, our methods can be easily adapted to most of existing learning methods (e.g., C4.5, SVM, GBDT and Neural Network) to boost their performance on imbalanced data.

Submitted 17 October, 2020; v1 submitted 8 September, 2019; originally announced September 2019.

Comments: IEEE 36th International Conference on Data Engineering (ICDE 2020)

Journal ref: 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020: 841-852

arXiv:1908.08616 [pdf, other]

doi 10.3934/jimo.2021046

Quadratic Surface Support Vector Machine with L1 Norm Regularization

Authors: Ahmad Mousavi, Zheming Gao, Lanshan Han, Alvin Lim

Abstract: We propose $\ell_1$ norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish their desired theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of true sparsity pattern over (almost) quadratically separable data sets if the penalty parameter of $\ell_1$ norm is large enough. We also demonstrate their promising practical efficiency by conducting various numerical experiments on both synthetic and publicly available benchmark data sets.

Submitted 30 January, 2021; v1 submitted 22 August, 2019; originally announced August 2019.

arXiv:1907.13353 [pdf, other]

A Novel Multiple Classifier Generation and Combination Framework Based on Fuzzy Clustering and Individualized Ensemble Construction

Authors: Zhen Gao, Maryam Zand, Jianhua Ruan

Abstract: Multiple classifier system (MCS) has become a successful alternative for improving classification performance. However, studies have shown inconsistent results for different MCSs, and it is often difficult to predict which MCS algorithm works the best on a particular problem. We believe that the two crucial steps of MCS - base classifier generation and multiple classifier combination, need to be designed coordinately to produce robust results. In this work, we show that for different testing instances, better classifiers may be trained from different subdomains of training instances including, for example, neighboring instances of the testing instance, or even instances far away from the testing instance. To utilize this intuition, we propose Individualized Classifier Ensemble (ICE). ICE groups training data into overlapping clusters, builds a classifier for each cluster, and then associates each training instance to the top-performing models while taking into account model types and frequency. In testing, ICE finds the k most similar training instances for a testing instance, then predicts class label of the testing instance by averaging the prediction from models associated with these training instances. Evaluation results on 49 benchmarks show that ICE has a stable improvement on a significant proportion of datasets over existing MCS methods. ICE provides a novel choice of utilizing internal patterns among instances to improve classification, and can be easily combined with various classification models and applied to many application domains.

Submitted 31 July, 2019; originally announced July 2019.

arXiv:1907.06582 [pdf, other]

AMAD: Adversarial Multiscale Anomaly Detection on High-Dimensional and Time-Evolving Categorical Data

Authors: Zheng Gao, Lin Guo, Chi Ma, Xiao Ma, Kai Sun, Hang Xiang, Xiaoqiang Zhu, Hongsong Li, Xiaozhong Liu

Abstract: Anomaly detection is facing with emerging challenges in many important industry domains, such as cyber security and online recommendation and advertising. The recent trend in these areas calls for anomaly detection on time-evolving data with high-dimensional categorical features without labeled samples. Also, there is an increasing demand for identifying and monitoring irregular patterns at multiple resolutions. In this work, we propose a unified end-to-end approach to solve these challenges by combining the advantages of Adversarial Autoencoder and Recurrent Neural Network. The model learns data representations cross different scales with attention mechanisms, on which an enhanced two-resolution anomaly detector is developed for both instances and data blocks. Extensive experiments are performed over three types of datasets to demonstrate the efficacy of our method and its superiority over the state-of-art approaches.

Submitted 12 July, 2019; originally announced July 2019.

Comments: Accepted by 2019 KDD Workshop on Deep Learning Practice for High-Dimensional Sparse Data

arXiv:1906.09981 [pdf, other]

Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems

Authors: Zhan Gao, Mark Eisen, Alejandro Ribeiro

Abstract: Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, is able to transmit multiple radio frequency signals at high rates in free space optical networks. This paper investigates the optimal design of power allocation for Wavelength Division Multiplexing (WDM) transmission in RoFSO systems. The proposed problem is a weighted total capacity maximization problem with two constraints of total power limitation and eye safety concern. The model-based Stochastic Dual Gradient algorithm is presented first, which solves the problem exactly by exploiting the null duality gap. The model-free Primal-Dual Deep Learning algorithm is then developed to learn and optimize the power allocation policy with Deep Neural Network (DNN) parametrization, which can be utilized without any knowledge of system models. Numerical simulations are performed to exhibit significant performance of our algorithms compared to the average equal power allocation.

Submitted 21 June, 2019; originally announced June 2019.

arXiv:1904.03779 [pdf, ps, other]

Cluster Developing 1-Bit Matrix Completion

Authors: Chengkun Zhang, Junbin Gao, Stephen Lu

Abstract: Matrix completion has a long-time history of usage as the core technique of recommender systems. In particular, 1-bit matrix completion, which considers the prediction as a "Recommended" or "Not Recommended" question, has proved its significance and validity in the field. However, while customers and products aggregate into interacted clusters, state-of-the-art model-based 1-bit recommender systems do not take the consideration of grouping bias. To tackle the gap, this paper introduced Group-Specific 1-bit Matrix Completion (GS1MC) by first-time consolidating group-specific effects into 1-bit recommender systems under the low-rank latent variable framework. Additionally, to empower GS1MC even when grouping information is unobtainable, Cluster Developing Matrix Completion (CDMC) was proposed by integrating the sparse subspace clustering technique into GS1MC. Namely, CDMC allows clustering users/items and to leverage their group effects into matrix completion at the same time. Experiments on synthetic and real-world data show that GS1MC outperforms the current 1-bit matrix completion methods. Meanwhile, it is compelling that CDMC can successfully capture items' genre features only based on sparse binary user-item interactive data. Notably, GS1MC provides a new insight to incorporate and evaluate the efficacy of clustering methods while CDMC can be served as a new tool to explore unrevealed social behavior or market phenomenon.

Submitted 7 April, 2019; originally announced April 2019.

Comments: 16 Pages

arXiv:1904.01763 [pdf, other]

Batched Multi-armed Bandits Problem

Authors: Zijun Gao, Yanjun Han, Zhimei Ren, Zhengqing Zhou

Abstract: In this paper, we study the multi-armed bandit problem in the batched setting where the employed policy must split data into a small number of batches. While the minimax regret for the two-armed stochastic bandits has been completely characterized in \cite{perchet2016batched}, the effect of the number of arms on the regret for the multi-armed case is still open. Moreover, the question whether adaptively chosen batch sizes will help to reduce the regret also remains underexplored. In this paper, we propose the BaSE (batched successive elimination) policy to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even if the batch sizes are determined in an adaptive manner.

Submitted 26 October, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: To appear in NeurIPS 2019 as an oral presentation

arXiv:1810.03445 [pdf]

Building a language evolution tree based on word vector combination model

Authors: Zhu Gao, Yanhui Jiang, Junhui Gao

Abstract: In this paper, we try to explore the evolution of language through case calculations. First, we chose the novels of eleven British writers from 1400 to 2005 and found the corresponding works; Then, we use the natural language processing tool to construct the corresponding eleven corpora, and calculate the respective word vectors of 100 high-frequency words in eleven corpora; Next, for each corpus, we concatenate the 100 word vectors from beginning to end into one; Finally, we use the similarity comparison and hierarchical clustering method to generate the relationship tree between the combined eleven word vectors. This tree represents the relationship between eleven corpora. We found that in the tree generated by clustering, the distance between the corpus and the year corresponding to the corpus are basically the same. This means that we have discovered a specific language evolution tree. To verify the stability and versatility of this method, we add three other themes: Dickens's eight works, the 19th century poets' works, and art criticism of recent 60 years. For these four themes, we tested different parameters such as the time span of the corpus, the time interval between the corpora, the dimension of the word vector, and the number of high-frequency public words. The results show that this is fairly stable and versatile.

Submitted 4 October, 2018; originally announced October 2018.

Showing 1–50 of 56 results for author: Gao, Z